- Design and implement modern data engineering tools to manage petabytes of data using offline/online data integrations.
- Execute the entire data engineering life cycle: problem formulation, data acquisition and assessment, data ingestion, feature selection and engineering, data model development and fine-tuning, and performance measurement, right up to delivery of the consumption module (application, dashboard, or API).
- Experience with data warehousing, data lakes, and analytic processes and methodologies.
- Proficient in writing and optimizing SQL queries and other procedures/scripts
- Build highly optimized and scalable data pipelines (ETL) using batch and stream processing frameworks (Spark)
- Strong knowledge of data integration (ETL/ELT), data quality and multi-dimensional
- Build monitoring and alerting dashboards to identify operational issues in data pipelines
- Optimize data and compute capacity to meet on-demand infrastructure scaling needs, improve cluster utilization, and meet application SLAs
- Work on security compliance and contribute to data and service access controls
- Implement best practices for disaster recovery and service reliability
- Amazon Redshift, AWS Glue, AWS CloudTrail, Python, SQL, Amazon Kinesis Data Streams, S3 Standard, Amazon S3 Glacier, AWS Key Management Service, NoSQL