Machine Learning Engineer — Training Optimization

featherlessai · Research
Apply Now ↗
Remote (world) · Full-time

About the Role

We’re looking for an ML Engineer focused on training optimization to help us improve and scale our large-model training. You’ll work at the intersection of research and production, optimizing training pipelines for speed, stability, and cost, while collaborating closely with researchers pushing model architecture and capability forward.

This is a high-impact role with real ownership: your work directly affects how fast we can iterate, how large we can scale, and how efficiently we deploy new models.

What You’ll Do

  • Optimize large-scale model training pipelines (throughput, convergence, stability, and cost)

  • Improve distributed training strategies (data, model, and pipeline parallelism)

  • Tune optimizers, schedulers, batch sizing, and precision (bf16 / fp16 / fp8)

  • Reduce training time and compute cost via profiling, bottleneck analysis, and systems-level improvements

  • Collaborate with researchers on architecture-aware training strategies

  • Build and maintain robust training infrastructure (checkpointing, fault tolerance, reproducibility)

  • Evaluate and integrate new training techniques (e.g. gradient checkpointing, ZeRO, FSDP, custom kernels)

  • Own training performance metrics and continuously push them forward

What We’re Looking For

  • Strong experience training large neural networks (LLMs or similarly large models)

  • Hands-on experience with training optimization (not just model usage)

  • Solid understanding of:

    • Backpropagation, optimization algorithms, and training dynamics

    • Distributed systems for ML training

  • Experience with PyTorch (required)

  • Comfort working close to hardware (GPUs, memory, networking constraints)

  • Ability to move fluidly between research ideas and production-ready code

Nice to Have

  • Experience with large-scale distributed training (multi-node, multi-GPU)

  • Familiarity with DeepSpeed, FSDP, Megatron, or custom training stacks

  • Experience optimizing training on AMD or NVIDIA GPUs

  • Contributions to open-source ML infrastructure or research codebases

  • Exposure to non-Transformer architectures (RNNs, hybrid models, etc.)

Why Join Us

  • Real ownership at Series-A stage — your work shapes the company’s trajectory

  • Work on cutting-edge models and training systems at scale

  • Small, highly technical team with fast feedback loops

  • Strong emphasis on engineering quality and research rigor

  • Competitive compensation + meaningful equity

Frequently Asked Questions

Is the salary disclosed for this role?
No. The salary is not publicly listed. Click "Apply Now" to learn more about the compensation package on featherlessai's official careers page.
Is this role remote?
Yes. The position is listed as Remote (world), so you can work from home or from anywhere in the supported regions.
Is this role full-time or part-time?
It is listed as a full-time position.
Which team or department does this role belong to?
The Research department at featherlessai.
How do I apply?
Click the "Apply Now" button on this page. You will be redirected to featherlessai's official application portal, hosted on Ashby, where you can submit your application directly.
When was this job posted?
It was posted on Jan 22, 2026. Apply as soon as possible, since early applications are often reviewed first.