We are looking for a Data Engineer to build reliable, scalable data pipelines and contribute to the core data ecosystem that powers analytics, AI/ML, and emerging Generative AI use cases.
Responsibilities
- Build and maintain batch and streaming data pipelines with a strong emphasis on reliability, performance, and cost efficiency.
- Develop SQL, Python, and Spark/PySpark transformations to support analytics, reporting, and ML workloads.
- Contribute to data model design and ensure datasets adhere to high standards of quality, structure, and governance.
- Support integrations with internal and external systems, ensuring accuracy and resilience of data flows.
- Build and maintain data flows that support GenAI workloads (e.g., embedding generation, vector pipelines, data preparation for LLM training and inference).
- Collaborate with ML/GenAI teams to enable high-quality training and inference datasets.
- Contribute to the development of retrieval pipelines, enrichment workflows, and AI-powered data quality checks.
- Work with Data Science, Analytics, Product, and Engineering teams to translate data requirements into reliable solutions.
- Participate in design reviews and provide input on scalable, maintainable engineering practices.
- Uphold strong data quality, testing, and documentation standards.
- Support deployments, troubleshooting, and operational stability of the pipelines you own.
Benefits
- Competitive Salary
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Relocation Assistance
- Tuition Reimbursement
- Visa Sponsorship