We are a B2B WealthTech startup based in Abu Dhabi and backed by BNY Mellon and Lunate. We're building a team that owns production incident response, deep debugging, and permanent fixes across application, data, and deployment layers.
Requirements
- 7+ years in SRE / Production Engineering / Platform Engineering (reliability-focused)
- Strong Go skills (mandatory): ability to read, debug, and ship production fixes in Go codebases
- Proven experience debugging distributed systems in production (latency, error rates, timeouts, retries, cascading failures)
- Strong hands-on experience with Kubernetes in production environments
- Experience with Helm and GitOps workflows (FluxCD preferred; ArgoCD acceptable)
- Solid PostgreSQL troubleshooting experience (performance, incident patterns, migrations)
- Observability experience across metrics, logging, and tracing (Datadog, Grafana, Tempo, or Loki is a plus)
- Strong incident leadership: calm under pressure, clear communication, structured problem-solving
- Engineering hygiene: PR discipline, reviews, testing mindset, safe rollouts/rollbacks
- Comfortable with IAM/security fundamentals in real production systems: OAuth2/OIDC basics, RBAC/least privilege, and safe secrets handling
Benefits
- Generous Paid Time Off
- 401k Matching
- Retirement Plan