Machine Learning Engineer — Inference Optimization

featherlessai · Research
🌍 Remote (worldwide) · Full-time

About the Role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production—turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users.

This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

What You’ll Do

  • Optimize inference latency, throughput, and cost for large-scale ML models in production

  • Profile GPU/CPU inference pipelines and eliminate bottlenecks (memory, kernels, batching, IO)

  • Implement and tune techniques such as:

    • Reduced-precision inference and quantization (fp16, bf16, int8, fp8)

    • KV-cache optimization & reuse

    • Speculative decoding, batching, and streaming

    • Model pruning or architectural simplifications for inference

  • Collaborate with research engineers to productionize new model architectures

  • Build and maintain inference-serving systems (e.g., NVIDIA Triton Inference Server, custom runtimes, or fully bespoke stacks)

  • Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups

  • Improve system reliability, observability, and cost efficiency under real workloads

What We’re Looking For

  • Strong experience in ML inference optimization or high-performance ML systems

  • Solid understanding of deep learning internals (attention, memory layout, compute graphs)

  • Hands-on experience with PyTorch (or similar) and model deployment

  • Familiarity with GPU performance tuning (CUDA, ROCm, Triton kernels, or other kernel-level optimizations)

  • Experience scaling inference for real users (not just research benchmarks)

  • Comfortable working in fast-moving startup environments with ownership and ambiguity

Nice to Have

  • Experience with LLM or long-context model inference

  • Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton Inference Server)

  • Experience optimizing across different hardware vendors

  • Open-source contributions in ML systems or inference tooling

  • Background in distributed systems or low-latency services

Why Join Us

  • Real ownership over performance-critical systems

  • Direct impact on product reliability and unit economics

  • Close collaboration with research, infra, and product

  • Competitive compensation + meaningful equity at Series A

  • A team that cares about engineering quality, not hype

Frequently Asked Questions

Is the salary disclosed for the Machine Learning Engineer — Inference Optimization position at featherlessai?
The salary for this Machine Learning Engineer — Inference Optimization role at featherlessai is not publicly listed. Visit featherlessai's official careers page to learn more about the compensation package.
Is the Machine Learning Engineer — Inference Optimization job at featherlessai remote?
Yes, this Machine Learning Engineer — Inference Optimization position at featherlessai is fully remote and open to candidates worldwide. You can work from home or from anywhere in the supported regions.
Is the Machine Learning Engineer — Inference Optimization role at featherlessai full-time or part-time?
This is listed as a full-time position. It is posted as a Machine Learning Engineer — Inference Optimization role in the Research department at featherlessai.
Which team or department does the Machine Learning Engineer — Inference Optimization at featherlessai belong to?
This Machine Learning Engineer — Inference Optimization position is part of the Research department at featherlessai. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Machine Learning Engineer — Inference Optimization position at featherlessai?
Apply through featherlessai's official application portal, hosted on Ashby, where you can submit your application directly.
When was the Machine Learning Engineer — Inference Optimization job at featherlessai posted?
This Machine Learning Engineer — Inference Optimization position at featherlessai was posted on Jan 22, 2026. Apply as soon as possible, since early applications are often reviewed first.