Machine Learning: Multimodal Foundation Models

thebotcompany · Engineering
📍 San Francisco · Full-time · 💰 USD 200K–350K/yr

About this role

The Bot Company

We're building a helpful robot for every home.

We're a small team of engineers, designers, and operators based in San Francisco. Our team comes from Tesla, Cruise, OpenAI, Google, Pixar, and many other great companies. In the past we've shipped to hundreds of millions of users and know what it takes to build amazing products and experiences.

Our team is deliberately lean to promote rapid decision making and do away with bureaucracy and hierarchy. Everyone is an IC and is empowered with massive scope, radical ownership, and direct responsibility. We work across the stack with a culture built for rapid iteration and fast execution.

What we look for in all candidates

All roles at The Bot Company demand extreme sharpness and the ability to move fast in high-intensity environments. Throughout the process, we expect candidates to demonstrate:

  • Exceptional mental acuity: you think quickly, learn instantly, and reason across unfamiliar domains.

  • Engineering curiosity: you naturally dig into how systems work, even outside your specialty.

  • High performance mindset: you move fast, handle ambiguity, and excel when the environment is demanding.

Machine Learning: Multimodal Foundation Models

We are building unified foundation models that natively reason across text, image, video, and kinematics to drive intelligent robotic policies.

You will work on large multimodal networks and own the entire stack, from data to training to deployment.

What You'll Do

  • Build Native Multimodal Policies: Develop architectures where vision, language, and other modalities share a unified representation.

  • Improve Cross-Modal Reasoning: Research and implement methods to ensure the model doesn't just "associate" modalities but actually reasons through them (e.g., grounding visual physics in kinematic constraints).

  • Own the Training Loop End-to-End: Design, run, debug, and iterate on large-scale training experiments, diagnosing failure modes, improving data mixtures, and tightening evaluation to drive measurable gains.

  • Ship and Iterate on Real Systems: Integrate models into real robotic stacks, build on-robot code to deploy your models, and optimize performance for edge inference.

Requirements

  • Very strong coding skills in Python, C++, or Rust.

  • Production MLLM Experience: Track record of training and deploying large-scale multimodal models.

  • Pretraining & RL Mastery: Deep intuition for LLM-style pretraining, post-training, and Reinforcement Learning at scale.

  • Infrastructure Fluency: Comfortable managing and optimizing large-scale experiments on massive GPU clusters.

Why Join

You’ll work with a small, elite team on challenges that require speed, intelligence, and deep engineering instinct. If you enjoy understanding systems at all levels, move fast, and think even faster, you’ll thrive here.

Frequently Asked Questions

What is the salary for the Machine Learning: Multimodal Foundation Models role at thebotcompany?
The listed salary for this Machine Learning: Multimodal Foundation Models position at thebotcompany is USD 200K–350K/yr. This is a full-time role.
Where is the Machine Learning: Multimodal Foundation Models position at thebotcompany located?
This Machine Learning: Multimodal Foundation Models role at thebotcompany is based in San Francisco. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.
Is the Machine Learning: Multimodal Foundation Models role at thebotcompany full-time or part-time?
This is listed as a full-time position. It is posted as a Machine Learning: Multimodal Foundation Models role in the Engineering department at thebotcompany.
Which team or department does the Machine Learning: Multimodal Foundation Models at thebotcompany belong to?
This Machine Learning: Multimodal Foundation Models position is part of the Engineering department at thebotcompany. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Machine Learning: Multimodal Foundation Models position at thebotcompany?
Click the "Apply Now" button on this page. You will be redirected to thebotcompany's official application portal, hosted on Ashby, where you can submit your application directly.
When was the Machine Learning: Multimodal Foundation Models job at thebotcompany posted?
This Machine Learning: Multimodal Foundation Models position at thebotcompany was posted on Feb 25, 2026. Apply as soon as possible — early applications are often reviewed first.