Required Qualifications:

  • Design and implement modern data engineering tools to manage petabytes of data through offline (batch) and online (streaming) data integrations.
  • Execute the full data engineering life cycle: problem formulation, data acquisition and assessment, data ingestion, feature selection and engineering, data model development and fine-tuning, performance measurement, and delivery of the consumption module (application, dashboard, or API).
  • Experience with data warehousing, data lakes, and analytics processes and methodologies.
  • Proficient in writing and optimizing SQL queries, procedures, and scripts.
  • Build highly optimized and scalable ETL data pipelines using batch and stream processing frameworks such as Spark (a minimal PySpark sketch follows this list).
  • Strong knowledge of data integration (ETL/ELT), data quality, and multi-dimensional data modeling.
  • Build monitoring and alerting dashboards to identify operational issues in data pipelines
  • Optimize storage and compute resources to meet on-demand infrastructure scaling needs, improve cluster utilization, and meet application SLAs.
  • Work on security compliance and contribute to data and service access controls
  • Implement best practices for disaster recovery and service reliability
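
As an illustration of the pipeline-building skills above, here is a minimal PySpark batch ETL sketch. The S3 paths, column names, and aggregation are hypothetical placeholders (and reading s3:// paths assumes an S3-enabled Spark build); this is a sketch of the technique, not a prescribed implementation.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Start a Spark session for the sketch.
    spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

    # Extract: read raw JSON events (placeholder path).
    raw = spark.read.json("s3://example-bucket/raw/events/")

    # Transform: deduplicate, drop rows missing a timestamp, derive a date column.
    clean = (
        raw.dropDuplicates(["event_id"])               # assumed unique key
           .filter(F.col("event_ts").isNotNull())
           .withColumn("event_date", F.to_date("event_ts"))
    )

    # Aggregate: daily counts per event type.
    daily_counts = clean.groupBy("event_date", "event_type").agg(
        F.count("*").alias("event_count")
    )

    # Load: write partitioned Parquet for downstream consumption (placeholder path).
    (daily_counts.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-bucket/curated/daily_event_counts/"))

    spark.stop()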

Mandatory skills:

  • Amazon Redshift, AWS Glue, AWS CloudTrail, Python, SQL, Amazon Kinesis Data Streams, Amazon S3 (Standard and Glacier storage classes), AWS Key Management Service, NoSQL
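
For illustration, a minimal Python sketch touching several of these services via boto3. The stream, bucket, and key names are hypothetical placeholders, and it assumes AWS credentials are already configured in the environment.

    import json
    import boto3

    kinesis = boto3.client("kinesis")
    s3 = boto3.client("s3")

    payload = json.dumps({"event_id": "abc-123", "event_type": "click"}).encode("utf-8")

    # Publish one record to a Kinesis data stream (placeholder stream name).
    kinesis.put_record(
        StreamName="example-event-stream",
        Data=payload,
        PartitionKey="abc-123",
    )

    # Archive the same payload to S3, encrypted at rest with AWS KMS; a bucket
    # lifecycle rule (not shown) could later transition objects to S3 Glacier.
    s3.put_object(
        Bucket="example-archive-bucket",
        Key="raw/events/abc-123.json",
        Body=payload,
        ServerSideEncryption="aws:kms",
    )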