This is a remote full-time job for an HPC Solutions Architect to design and tune next-generation platforms for AI training, large simulations, and data-heavy workloads. The role involves working with NVIDIA's latest hardware and collaborating with customers to create efficient and scalable HPC environments.
Requirements
- A Bachelor's or Master's in Computer Science, Engineering, or a related field
- 3+ years building or running HPC or large GPU clusters
- Strong Linux background, Kubernetes, and container runtimes
- HPC networking and RDMA: InfiniBand, RoCE, NVLink/NVSwitch
- Experience with storage and I/O for big workloads: Ceph, Lustre, NFS
- Comfort with Terraform, Ansible, Helm, and GitOps-style workflows
Benefits
- 100% employer-paid medical, dental, and vision
- 4% 401(k) match with immediate vesting
- Company-paid short- and long-term disability and life insurance
- 20 weeks paid parental leave for primary caregivers
- 12 weeks paid parental leave for secondary caregivers