kaiko.ai is building a next-generation agentic clinical AI assistant that helps clinicians reason across patient data, guidelines, and diagnostics. The company is seeking a Junior Research Data ML Engineer to design and implement data-sourcing, synthetic-generation, and curation pipelines for its Multimodal Large Language Model (MLLM).
Requirements
- Strong programming skills in Python and familiarity with distributed frameworks such as Ray or Spark
- Experience contributing to ML research and associated data challenges, such as data cleaning, transformation and validation
- Exposure to synthetic-data generation workflows, or interest in working with LLM-related data pipelines
- Understanding of lakehouse paradigms (Delta, Iceberg) and columnar formats (Parquet, ORC)
- Experience with core data-processing primitives (hashing, deduplication, chunking, etc.) and their scalability and performance trade-offs
- Strong communication skills and the ability to present experimental results and technical concepts clearly and concisely
Benefits
- Attractive and competitive salary
- Good pension plan
- 25 vacation days per year
- Great offsites and team events
- EUR 1,000 learning and development budget
- Autonomy to work in whatever way works best for you
- Annual commuting subsidy