Research Engineer - Training Platform

rhoda-ai · Research
Palo Alto · Full-time

About this role

At Rhoda AI, we're building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.

We're looking for a Research Engineer to build and maintain the training platform that powers our model development: experiment orchestration, job management, observability, and the tooling that lets researchers move from idea to result as fast as possible.

What You'll Do

  • Build and maintain training orchestration systems for large-scale distributed model training across GPU clusters

  • Develop experiment management tooling: job configuration, tracking, reproducibility, and artifact management

  • Build observability infrastructure for training runs: loss curves, compute utilization, gradient statistics, and anomaly detection

  • Optimize and automate the research iteration loop from experiment launch to results analysis

  • Manage job scheduling and cluster utilization for efficient use of GPU compute

  • Build internal tooling and interfaces that help researchers move faster

  • Collaborate with training systems, data infrastructure, and research teams to support their platform needs

What We're Looking For

  • Strong software engineering skills with experience in MLOps or ML platform engineering

  • Familiarity with distributed training frameworks (PyTorch DDP, FSDP, DeepSpeed, Megatron, or similar)

  • Experience building experiment tracking, reproducibility, and artifact management systems

  • Comfortable managing and operating GPU cluster environments (Slurm, Kubernetes, or similar)

  • Strong reliability engineering instincts: monitoring, alerting, and failure recovery

Nice to Have (But Not Required)

  • Experience with training orchestration tools (Slurm, Ray, Kubernetes, or similar schedulers)

  • Familiarity with experiment tracking tools (Weights & Biases, MLflow, or custom solutions)

  • Experience supporting large model training pipelines (LLMs, VLMs, or video models)

  • Understanding of parallelism strategies and how they affect training efficiency and debugging

  • Experience with cloud-based training infrastructure (AWS, GCP, or Azure)

Why This Role

  • Your platform is the daily tool every researcher and engineer uses to train models

  • Improvements to training velocity and reliability compound across every experiment the team runs

  • High visibility with direct feedback from researchers and ML engineers

  • Build systems that scale from today's models to future frontier training runs

Frequently Asked Questions

Is the salary disclosed for the Research Engineer - Training Platform position at rhoda-ai?
The salary for this Research Engineer - Training Platform role at rhoda-ai is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Where is the Research Engineer - Training Platform position at rhoda-ai located?
This Research Engineer - Training Platform role at rhoda-ai is based in Palo Alto. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.
Is the Research Engineer - Training Platform role at rhoda-ai full-time or part-time?
This is listed as a full-time position. It is posted as a Research Engineer - Training Platform role in the Research department at rhoda-ai.
Which team or department does the Research Engineer - Training Platform at rhoda-ai belong to?
This Research Engineer - Training Platform position is part of the Research department at rhoda-ai. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Research Engineer - Training Platform position at rhoda-ai?
Click the "Apply Now" button on this page. You will be redirected to rhoda-ai's official application portal, hosted on Ashby, where you can submit your application directly.
When was the Research Engineer - Training Platform job at rhoda-ai posted?
This Research Engineer - Training Platform position at rhoda-ai was posted on May 18, 2026. Apply as soon as possible; early applications are often reviewed first.