This role involves building scalable data pipelines on AWS and Databricks, integrating data from diverse sources, transforming and processing data across multiple formats, and tuning Databricks clusters for performance and cost efficiency. The data engineer will collaborate with stakeholders across the business to deliver data-driven solutions.
Requirements
- Design, implement, and manage scalable ETL/ELT pipelines using AWS services and Databricks.
- Ingest and process structured, semi-structured, and unstructured data into an AWS-based data lake or Databricks.
- Develop advanced data processing workflows using PySpark, Databricks SQL, or Scala (see the PySpark sketch after this list).
- Configure and optimize Databricks clusters, notebooks, and jobs for performance and cost efficiency.
- Design and implement solutions leveraging AWS-native services like S3, Glue, Redshift, EMR, Lambda, Kinesis, and Athena.
- Ensure data pipelines are secure, robust, and monitored using CloudWatch, Datadog, or equivalent tools (a minimal monitoring sketch follows below).
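For illustration only, a minimal PySpark sketch of the kind of ingest-and-transform workflow described above. The bucket paths, column names (`event_id`, `event_ts`), and schema are hypothetical placeholders, and the Delta output assumes a Databricks runtime where Delta Lake is available; this is a sketch, not a prescribed implementation.

```python
# Hypothetical PySpark job: ingest raw JSON events from S3, cleanse them,
# and write curated, partitioned Delta output. All paths and column names
# below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events_etl").getOrCreate()

# Read semi-structured JSON landed in the raw zone of the data lake.
raw = spark.read.json("s3://example-raw-zone/events/")

# Basic cleansing: de-duplicate on a business key and derive a partition column.
curated = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
)

# Write partitioned Delta output to the curated zone
# (Delta is available out of the box on Databricks clusters).
(curated.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("s3://example-curated-zone/events/"))
```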
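Likewise, a sketch of the monitoring side: publishing custom per-run pipeline metrics to CloudWatch with boto3 so alarms can flag failed runs. The namespace, metric names, and `report_run` helper are hypothetical, shown only to illustrate the pattern.

```python
# Hypothetical monitoring hook: push per-run metrics to CloudWatch.
# Namespace, metric names, and dimensions are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

def report_run(pipeline_name: str, rows_written: int, succeeded: bool) -> None:
    """Publish row-count and failure metrics for one pipeline run."""
    cloudwatch.put_metric_data(
        Namespace="DataPipelines",  # placeholder namespace
        MetricData=[
            {
                "MetricName": "RowsWritten",
                "Dimensions": [{"Name": "Pipeline", "Value": pipeline_name}],
                "Value": float(rows_written),
                "Unit": "Count",
            },
            {
                "MetricName": "RunFailed",
                "Dimensions": [{"Name": "Pipeline", "Value": pipeline_name}],
                "Value": 0.0 if succeeded else 1.0,
                "Unit": "Count",
            },
        ],
    )
```

A CloudWatch alarm on `RunFailed` then provides basic failure alerting without additional tooling.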