Member of Technical Staff, Inference

Inferact · Research & Engineering
Apply Now ↗
📍 San Francisco · Full-time · 💰 USD 200K–400K/yr

About Inferact

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware—a position that took years to build.

About the Role

We're looking for an inference runtime engineer to push the boundaries of what's possible in LLM and diffusion model serving. Models grow larger. Architectures shift: mixture-of-experts, multimodal, agentic. Every breakthrough demands innovation in the inference engine itself. You'll work at the core of vLLM, optimizing how models execute across diverse hardware and architectures. Your work will directly shape how the world runs AI inference.

Skills and Qualifications

Minimum qualifications:

  • Bachelor's degree or equivalent experience in computer science, engineering, or a similar field.

  • Deep understanding of transformer architectures and their variants.

  • Strong programming skills in Python with experience in PyTorch internals.

  • Experience with LLM inference systems (vLLM, TensorRT-LLM, SGLang, TGI).

  • Ability to read and implement model architectures and inference techniques from research papers.

  • Demonstrated ability to contribute performant, maintainable code and to debug complex ML codebases.

Preferred qualifications:

  • Deep understanding of KV-cache memory management, prefix caching, and hybrid model serving.

  • Familiarity with RL frameworks and algorithms for LLMs.

  • Experience with multimodal inference (audio/image/video/text).

  • Contributions to open-source ML or system infrastructure projects.

Bonus points if you have:

  • Implemented core features in vLLM or other inference engine projects.

  • Contributed to vLLM integrations (verl, OpenRLHF, Unsloth, LlamaFactory, etc.).

  • Written widely-shared technical blogs or side projects on vLLM or LLM inference.

Logistics

  • Location: This role is based in San Francisco, California. We will consider remote work within the US for exceptional candidates.

  • Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $200,000 - $400,000 USD + equity.

  • Visa sponsorship: We sponsor visas on a case-by-case basis.

  • Benefits: Inferact offers generous health, dental, and vision benefits, as well as a 401(k) company match.

Frequently Asked Questions

What is the salary for the Member of Technical Staff, Inference role at Inferact?
The listed salary for this position is USD 200K–400K/yr. This is a full-time role.

Where is the Member of Technical Staff, Inference position at Inferact located?
This role is based in San Francisco. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.

Is the Member of Technical Staff, Inference role at Inferact full-time or part-time?
This is listed as a full-time position in the Research & Engineering department at Inferact.

Which team or department does the Member of Technical Staff, Inference at Inferact belong to?
This position is part of the Research & Engineering department at Inferact. See the full job description for more information about the team structure and responsibilities.

How do I apply for the Member of Technical Staff, Inference position at Inferact?
Click the "Apply Now" button on this page. You will be redirected to Inferact's official application portal, hosted on Ashby, where you can submit your application directly.

When was the Member of Technical Staff, Inference job at Inferact posted?
This position was posted on Jan 22, 2026. Apply as soon as possible; early applications are often reviewed first.