This AWS/Azure Data Engineer position focuses on designing, implementing, and managing scalable data pipelines. The role involves ingesting, transforming, and integrating data from diverse sources using AWS services and Databricks. The candidate will also be responsible for setting up Databricks clusters and ensuring data pipeline performance, security, and monitoring.
Requirements
- Design, implement, and manage scalable ETL/ELT pipelines using AWS services and Databricks.
- Ingest and process structured, semi-structured, and unstructured data from multiple sources into an AWS-based data lake or Databricks.
- Develop advanced data processing workflows using PySpark, Databricks SQL, or Scala.
- Configure and optimize Databricks clusters, notebooks, and jobs for performance and cost efficiency.
- Design and implement solutions leveraging AWS-native services like S3, Glue, Redshift, EMR, Lambda, Kinesis, and Athena.
- Work closely with Data Analysts, Data Scientists, and other Engineers to understand business requirements.
- Optimize data pipelines, storage, and queries for performance, scalability, and reliability.
- Ensure data pipelines are secure, robust, and monitored using CloudWatch, Datadog, or equivalent tools.
- Maintain clear and concise documentation for data pipelines, workflows, and architecture.
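To give candidates a concrete sense of the transformation work described above, here is a minimal sketch in plain Python; in practice this logic would typically run as a PySpark or Databricks SQL job, and the record shapes, field names, and dead-letter handling shown are illustrative assumptions, not a prescribed design:

```python
import json

# Hypothetical sample of semi-structured JSON lines, standing in for records
# ingested from a source such as S3 or Kinesis. One record is malformed on
# purpose to show dead-letter routing.
RAW_EVENTS = [
    '{"user_id": 1, "event": "click", "value": 2}',
    '{"user_id": 1, "event": "click", "value": 3}',
    '{"user_id": 2, "event": "view", "value": 5}',
    'not-json',
]

def transform(raw_lines):
    """Parse and validate JSON event lines, then aggregate values per user.

    Returns (totals, rejects): per-user sums and the raw lines that failed
    parsing or validation, which a real pipeline might write to a
    dead-letter location for monitoring.
    """
    totals, rejects = {}, []
    for line in raw_lines:
        try:
            rec = json.loads(line)
            uid = rec["user_id"]
        except (json.JSONDecodeError, KeyError):
            rejects.append(line)  # route bad records aside instead of failing
            continue
        totals[uid] = totals.get(uid, 0) + rec.get("value", 0)
    return totals, rejects
```

The same parse-validate-aggregate shape scales up directly in PySpark, where the loop becomes DataFrame operations and the dead-letter list becomes a separate output table or S3 prefix.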