We seek a highly skilled and experienced Lead Machine Learning Engineer with extensive expertise in multimodal generative AI models, cross-modal architectures, and multimodal fusion techniques. The ideal candidate will not only have a strong technical background spanning text, vision, audio, and video modalities, but also the drive to mentor, educate, and advocate for the adoption of new and emerging technologies.

Requirements

7+ years of experience in machine learning engineering, with at least 2 years focused on generative AI or multimodal systems
Proven experience developing and deploying multimodal generative AI systems, with a deep understanding of architectures that bridge multiple modalities (text-to-image, image-to-text, text-to-video, audio-visual models, etc.)
Strong expertise in vision models and architectures, including diffusion models, vision transformers, and multimodal embeddings
Experience with large language models and their integration with visual and audio modalities
Experience with multimodal retrieval systems and vector databases
Hands-on experience with generative models across modalities, including text generation, image synthesis, video generation, and audio/speech synthesis
Demonstrated ability to lead and mentor a team of machine learning engineers and data scientists, fostering a culture of innovation and technical excellence
Excellent communication and presentation skills, with the ability to articulate complex multimodal concepts clearly to both technical and non-technical audiences
Professional experience developing Python libraries for machine learning applications. Strong background in PyTorch, Hugging Face Transformers/Diffusers, and specialized libraries (e.g., Stable Diffusion, OpenAI CLIP, timm, torchaudio, torchvision)
Strong problem-solving skills and the ability to think critically and creatively about novel multimodal applications

C&T - Lead Engineer, AI/ML (India)

About Code and Theory