At TensorWave, we're leading the charge in AI compute, building a versatile cloud platform that's driving the next generation of AI innovation. We're looking for an AI Infrastructure Engineer to support that vision by developing and managing the compute infrastructure that underpins our AI cloud services.
Responsibilities
- Collaborate with a dynamic IT team to design, deploy, and maintain high-performance AI compute clusters supporting both AMD and NVIDIA GPU technologies.
- Lead initiatives to optimize cluster performance, resource utilization, and job scheduling to maximize efficiency across diverse AI workloads.
- Ensure system reliability, performance, and security for cloud services, implementing monitoring solutions and automated recovery systems.
- Troubleshoot and resolve complex infrastructure issues across Linux systems, networking, and distributed computing environments, providing expert guidance to maintain high service levels.
- Implement and maintain configuration management, deployment automation, and infrastructure-as-code practices.
Benefits
- Stock Options
- 100% paid Medical, Dental, and Vision insurance
- Life and Voluntary Supplemental Insurance
- Short-Term Disability Insurance
- Flexible Spending Account
- 401(k)
- Flexible PTO
- Paid Holidays
- Parental Leave
- Mental Health Benefits through Spring Health