We're building the platform layer that powers real AI features for millions of users. Our mission is to connect Large Language Models (LLMs) with e-commerce intelligence and make them secure, observable, scalable, and cost-efficient in production.
Requirements
- Build and operate Infrastructure as Code (CDK/Terraform/CloudFormation) for reproducible environments.
- Provide platform primitives: model/endpoint gateway, prompt/agent registry, configuration & feature flags, rollout strategies (canary, A/B), rate limiting & quotas.
- Create and maintain RAG infrastructure (vector stores, indexing pipelines, retrieval services) with observability & SLOs.
- Define and monitor SLOs and implement latency/cost controls (caching, batching, routing).
- Integrate foundation model providers (Amazon Bedrock, OpenAI, Azure AI) and manage versioning/rollbacks.
- Enable evaluation gates together with ML engineers (MLEs): wire eval pipelines and quality signals (judges, test sets) into CI/CD.
- Drive security, compliance, and monitoring (logs, traces, metrics, alerts, incident response).
Benefits
- Paid time off
- 401(k) matching
- Retirement plan
- Visa sponsorship
- Relocation assistance
- Fitness and sports options
- Free lunch and snacks