Research Scientist / Engineer - Video Generation Modeling

rhoda-ai · Research
Palo Alto · Full-time

About this role

At Rhoda AI, we're building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.

We're looking for Research Scientists and Research Engineers to push the frontier of large-scale pretraining for our video action model. Our approach formulates robot control as video prediction: we pretrain causal video generation models on web-scale video data, then adapt them to predict robot actions from real-world demonstrations. You'll work on the core architectures, training objectives, and scaling strategies that determine how well our models learn from internet-scale video. We hire across levels, from senior to staff, and welcome both research-track and engineering-track candidates.

What You'll Do

  • Design and train large-scale causal video generation models on web-scale video data

  • Develop and validate training objectives, model architectures, and data mixtures for video prediction at scale

  • Research scaling laws and data efficiency for web-scale video pretraining

  • Investigate what properties of web video transfer most effectively to robotic control and action prediction

  • Build systematic evaluations to measure video generation quality, long-horizon prediction fidelity, and downstream robot task performance

  • Run rigorous ablations and benchmarking to understand what drives model quality at scale

  • Collaborate closely with data & evaluation, post-training, and training systems teams to translate research ideas into working systems

  • Publish and present work at top-tier ML and robotics venues (especially valued for RS track)

What We're Looking For

  • Strong background in large-scale generative modeling, in either video generation (autoregressive video models, diffusion transformers, causal video architectures) or language model pretraining (LLMs, autoregressive transformers at scale)

  • Hands-on experience training large generative models from scratch at scale

  • Deep understanding of autoregressive modeling, causal architectures, and scaling behavior

  • Fluency with modern ML frameworks (PyTorch required; JAX a plus)

  • Ability to design experiments, interpret results, and iterate quickly

  • Strong research taste: ability to identify high-leverage questions and cut through noise

  • Comfort operating in a fast-moving, ambiguous startup environment

  • Staff-level candidates are expected to define technical direction and drive research strategy independently; senior/MTS candidates execute complex projects with strong fundamentals and growing scope

Nice to Have (But Not Required)

  • PhD in ML, CS, Robotics, or a related field, or equivalent research/industry experience

  • Strong publication record at NeurIPS, ICML, ICLR, CVPR, CoRL, etc. (especially valued for RS track)

  • Prior work specifically on video generation models (autoregressive video, diffusion transformers, world models, or causal video architectures)

  • Experience with large-scale autoregressive language model pretraining and scaling

  • Familiarity with web-scale video datasets and video data curation pipelines

  • Prior work connecting video generation to control, action prediction, or robotic learning

  • Familiarity with distributed training and multi-node infrastructure

Why This Role

  • Work on a fundamentally different approach to robot learning: web-scale video pretraining rather than robot-data-only VLA models

  • Your models give our robots the ability to understand and predict the visual world from internet-scale supervision

  • Direct collaboration with data, post-training, and deployment teams with no silos

  • High ownership and fast iteration in a small, elite team

Frequently Asked Questions

Is the salary disclosed for the Research Scientist / Engineer - Video Generation Modeling position at rhoda-ai?
The salary for this Research Scientist / Engineer - Video Generation Modeling role at rhoda-ai is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Where is the Research Scientist / Engineer - Video Generation Modeling position at rhoda-ai located?
This Research Scientist / Engineer - Video Generation Modeling role at rhoda-ai is based in Palo Alto. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.
Is the Research Scientist / Engineer - Video Generation Modeling role at rhoda-ai full-time or part-time?
This is listed as a full-time position. It is posted as a Research Scientist / Engineer - Video Generation Modeling role in the Research department at rhoda-ai.
Which team or department does the Research Scientist / Engineer - Video Generation Modeling at rhoda-ai belong to?
This Research Scientist / Engineer - Video Generation Modeling position is part of the Research department at rhoda-ai. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Research Scientist / Engineer - Video Generation Modeling position at rhoda-ai?
Click the "Apply Now" button on this page. You will be redirected to rhoda-ai's official application portal, hosted on Ashby, where you can submit your application directly.
When was the Research Scientist / Engineer - Video Generation Modeling job at rhoda-ai posted?
This Research Scientist / Engineer - Video Generation Modeling position at rhoda-ai was posted on May 18, 2026. Apply as soon as possible; early applications are often reviewed first.