Design and implement modern data engineering tools to manage petabytes of data using offline/online data integrations.
Execute the entire data engineering life cycle: problem formulation, data acquisition and assessment, data ingestion, feature selection and engineering, data model development and fine-tuning, performance measurement, and delivery of the consumption module (application, dashboard, or API)
Experience with Data Warehousing, Data Lake, Analytic processes, and methodologies.
Proficient in writing and optimizing SQL queries and other procedures/scripts
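As a minimal illustration of SQL optimization, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` to show how adding an index turns a full table scan into an index seek; the table and index names are hypothetical.

```python
import sqlite3

# Hypothetical table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(10_000)])

# Without an index, filtering on customer_id scans every row.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchall()
print(plan_before[0][3])  # a SCAN over the orders table

# An index on the filter column lets the engine seek directly to matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchall()
print(plan_after[0][3])  # a SEARCH using idx_orders_customer
```

The same discipline of reading the query plan before and after a change applies to warehouse engines such as Snowflake or BigQuery, though the tooling differs.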
Build highly optimized and scalable data pipelines (ETL) using batch and stream processing frameworks such as Spark
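A batch ETL pipeline of this kind follows an extract, transform, load pattern. The sketch below shows the three stages in plain Python; the source records, validation rule, and in-memory sink are all hypothetical, and in production Spark's DataFrame API would run the same stages in parallel across a cluster.

```python
from typing import Iterable

# Extract stage: hypothetical source records; in production these would
# come from files, a message queue, or a database.
def extract() -> Iterable[dict]:
    yield {"user": "a", "amount": "10.5"}
    yield {"user": "b", "amount": "bad"}   # malformed record
    yield {"user": "c", "amount": "7.25"}

# Transform stage: parse and validate, dropping records that fail.
# In Spark this would be a cast followed by a filter on nulls.
def transform(records: Iterable[dict]) -> Iterable[dict]:
    for rec in records:
        try:
            yield {**rec, "amount": float(rec["amount"])}
        except ValueError:
            continue

# Load stage: write to the sink; here just an in-memory list.
def load(records: Iterable[dict]) -> list:
    return list(records)

result = load(transform(extract()))
print(result)  # the malformed record is dropped
```

Keeping each stage a pure function over an iterable mirrors how distributed engines compose lazy transformations before a final action materializes the output.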
Strong knowledge of data integration (ETL/ELT), data quality, and multi-dimensional data modeling
Build monitoring and alerting dashboards to identify operational issues in data pipelines
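Alerting of this sort typically evaluates threshold rules over pipeline metrics. The sketch below is a minimal, self-contained version; the metric names and thresholds are illustrative, and real values would come from a metrics store such as Prometheus or CloudWatch.

```python
from dataclasses import dataclass

# Hypothetical per-run metrics for a pipeline.
@dataclass
class PipelineMetrics:
    records_in: int
    records_out: int
    lag_seconds: float

# Fire an alert when record loss or consumer lag exceeds a threshold.
# Thresholds are illustrative defaults, not recommendations.
def check_alerts(m: PipelineMetrics,
                 max_loss_ratio: float = 0.01,
                 max_lag: float = 300.0) -> list:
    alerts = []
    loss = 1 - m.records_out / m.records_in if m.records_in else 0.0
    if loss > max_loss_ratio:
        alerts.append(f"record loss {loss:.1%} exceeds {max_loss_ratio:.1%}")
    if m.lag_seconds > max_lag:
        alerts.append(f"consumer lag {m.lag_seconds:.0f}s exceeds {max_lag:.0f}s")
    return alerts

alerts = check_alerts(PipelineMetrics(records_in=1000, records_out=950, lag_seconds=600))
print(alerts)  # both rules fire for these sample metrics
```

In practice these checks would run on a schedule and route firing alerts to a dashboard or paging system rather than printing them.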
Optimize data and compute capacity resources to meet on-demand infrastructure scaling needs, improve cluster utilization and meet application SLAs
Work on security compliance and contribute to data and service access controls
Implement best practices for disaster recovery and service reliability