We're looking for a Researcher: Multimodal to conduct cutting-edge research at the intersection of machine learning, multimodal data, and generative modeling, advancing the state of AI across audio, text, vision, and other modalities.
Requirements
- Expertise in machine learning, multimodal learning, and generative modeling, with a strong research track record in top-tier conferences (e.g., CVPR, ICML, NeurIPS, ICCV)
- Proficiency in deep learning frameworks such as PyTorch or TensorFlow, with experience in handling diverse data modalities (e.g., audio, video, text)
- Strong understanding of state-of-the-art techniques for multimodal modeling, such as autoregressive and diffusion modeling, and deep understanding of architectural tradeoffs
Benefits
- Lunch, dinner, and snacks at the office
- Fully covered medical, dental, and vision insurance for employees
- 401(k)
- Relocation and immigration support
- Your own personal Yoshi