AI Researcher — Inference Optimization

featherlessai · Research
🌍 Remote · 📍 Remote (world) · Full-time

About this role

Role Overview

We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. You will work at the intersection of model architecture, systems engineering, and hardware-aware optimization, improving latency, throughput, and cost efficiency across real-world production environments.

Key Responsibilities

  • Research and develop techniques to optimize inference performance for large neural networks.

  • Improve latency, throughput, memory efficiency, and cost per inference.

  • Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications).

  • Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, prefill vs. decode optimization).

  • Benchmark inference workloads across hardware accelerators.

  • Collaborate with engineering teams to deploy optimized inference pipelines.

  • Translate research insights into production-ready improvements.

Required Qualifications

  • Strong background in machine learning, deep learning, or AI systems.

  • Hands-on experience optimizing inference for large-scale models.

  • Proficiency in Python and modern ML frameworks (e.g., PyTorch).

  • Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).

  • Ability to design experiments and communicate results clearly.

Preferred / Nice-to-Have Qualifications

  • Experience deploying production inference systems at scale.

  • Familiarity with distributed and multi-GPU inference.

  • Experience contributing to open-source ML or inference frameworks.

  • Authorship or co-authorship of peer-reviewed research papers in machine learning, systems, or related fields.

  • Experience working close to hardware (CUDA, ROCm, profiling tools).

What Success Looks Like

  • Measurable gains in latency, throughput, and cost efficiency.

  • Optimized inference systems running reliably in production.

  • Research ideas successfully translated into deployable systems.

  • Clear benchmarks and documentation that inform product decisions.

Relevant Research Areas (Bonus)

  • Long-context inference optimization

  • Speculative decoding

  • KV-cache compression and paging

  • Efficient decoding strategies

  • Hardware-aware inference design

Frequently Asked Questions

Is the salary disclosed for the AI Researcher — Inference Optimization position at featherlessai?
The salary for this AI Researcher — Inference Optimization role at featherlessai is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Is the AI Researcher — Inference Optimization job at featherlessai remote?
Yes, this AI Researcher — Inference Optimization position at featherlessai is fully remote and listed as Remote (world). You can work from home or anywhere in the supported regions.
Is the AI Researcher — Inference Optimization role at featherlessai full-time or part-time?
This is listed as a full-time position. It is posted as an AI Researcher — Inference Optimization role in the Research department at featherlessai.
Which team or department does the AI Researcher — Inference Optimization at featherlessai belong to?
This AI Researcher — Inference Optimization position is part of the Research department at featherlessai. See the full job description for more information about the team structure and responsibilities.
How do I apply for the AI Researcher — Inference Optimization position at featherlessai?
Click the "Apply Now" button on this page. You will be redirected to featherlessai's official application portal, hosted on Ashby, where you can submit your application directly.
When was the AI Researcher — Inference Optimization job at featherlessai posted?
This AI Researcher — Inference Optimization position at featherlessai was posted on Jan 23, 2026. Apply as soon as possible — early applications are often reviewed first.