Machine Learning Infrastructure Engineer

mindrobotics · Software Engineering
📍 Palo Alto · Full-time

About this role

The Role

At Mind Robotics, we’re building generalized physical AI—robotic systems capable of dexterous, adaptive, and reasoning-intensive work in real-world industrial environments. Our ability to iterate quickly on large-scale models depends on world-class ML infrastructure.

We’re looking for a Machine Learning Infrastructure Engineer to build the core systems that enable fast, reliable, and scalable model training—powering everything from experimentation to production deployment.

Responsibilities

  • Design and implement scalable systems for training large ML models

  • Enable efficient workflows for data ingestion, training, and iteration

  • Develop and optimize distributed training systems across hundreds of GPUs

  • Implement strategies for parallelization, sharding, and efficient compute utilization

  • Improve training efficiency through techniques such as attention optimizations, kernel fusion, and memory management

  • Partner closely with modeling teams to accelerate iteration speed and reduce training costs

  • Build internal tools for experiment tracking, monitoring, and debugging

  • Implement systems for tracking training performance, failures, and resource utilization

  • Debug and resolve bottlenecks across the training stack

  • Provide lightweight infrastructure support for deploying and running models in production environments

  • Optimize inference performance and reliability where needed

  • Support core cloud infrastructure needs for training workloads (without heavy DevOps overhead)

  • Manage compute resources efficiently across training jobs

Qualifications

  • Strong experience building infrastructure for large-scale ML training

  • Deep understanding of how modern LLM/VLM systems are trained and scaled

  • Proven experience setting up and scaling distributed training across hundreds of GPUs

  • Strong understanding of parallelization strategies (data, model, and pipeline parallelism)

  • Strong proficiency in Python programming

  • Expert-level proficiency in PyTorch and/or JAX

  • Strong understanding of techniques like attention optimization, kernel fusion, and efficient memory usage

Nice to Have

  • Experience supporting inference systems in production

  • Familiarity with robotics or embodied AI workloads

  • Experience building tools for experiment management and researcher productivity

Frequently Asked Questions

Is the salary disclosed for the Machine Learning Infrastructure Engineer position at mindrobotics?
The salary for this Machine Learning Infrastructure Engineer role at mindrobotics is not publicly listed. Details of the compensation package may be available on mindrobotics's official careers page.
Where is the Machine Learning Infrastructure Engineer position at mindrobotics located?
This Machine Learning Infrastructure Engineer role at mindrobotics is based in Palo Alto and is listed as on-site or hybrid. Check the full job description or apply directly to confirm the exact work arrangement.
Is the Machine Learning Infrastructure Engineer role at mindrobotics full-time or part-time?
This is listed as a full-time position in the Software Engineering department at mindrobotics.
Which team or department does the Machine Learning Infrastructure Engineer at mindrobotics belong to?
This Machine Learning Infrastructure Engineer position is part of the Software Engineering department at mindrobotics. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Machine Learning Infrastructure Engineer position at mindrobotics?
Apply through mindrobotics's official application portal, hosted on Ashby, where you can submit your application directly.
When was the Machine Learning Infrastructure Engineer job at mindrobotics posted?
This Machine Learning Infrastructure Engineer position at mindrobotics was posted on Jan 26, 2026. Early applications are often reviewed first, so apply as soon as possible.