We seek a highly skilled and experienced Lead Machine Learning Engineer with extensive expertise in multimodal generative AI models, cross-modal architectures, and multimodal fusion techniques. The ideal candidate will not only have a strong technical background spanning text, vision, audio, and video modalities, but also the drive to mentor, educate, and advocate for the adoption of new and emerging technologies.
Requirements
- 7+ years of experience in machine learning engineering, including at least 2 years focused on generative AI or multimodal systems
- Proven experience developing and deploying multimodal generative AI systems, with a deep understanding of architectures that bridge multiple modalities (text-to-image, image-to-text, text-to-video, audio-visual models, etc.)
- Strong expertise in vision models and architectures, including diffusion models, vision transformers, and multimodal embeddings
- Experience with large language models and their integration with visual and audio modalities
- Experience with multimodal retrieval systems and vector databases
- Hands-on experience with generative models across modalities including text generation, image synthesis, video generation, and audio/speech synthesis
- Demonstrated ability to lead and mentor a team of machine learning engineers and data scientists, fostering a culture of innovation and technical excellence
- Excellent communication and presentation skills, with the ability to articulate complex multimodal concepts clearly to both technical and non-technical audiences
- Professional experience developing Python libraries for machine learning applications, with a strong background in PyTorch, Hugging Face Transformers/Diffusers, and specialized libraries and model ecosystems (e.g., Stable Diffusion, OpenAI CLIP, timm, torchaudio, torchvision)
- Strong problem-solving skills and the ability to think critically and creatively about novel multimodal applications