Research Engineer, Model Inference & Serving - Paris

hcompany · RESEARCH
Remote · Hybrid Paris · Full-time

About this role

About H: H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential. H is hiring the world's best AI talent, seeking those who are dedicated as much to building safely and responsibly as to advancing disruptive agentic capabilities. We promote a mindset of openness, learning, and collaboration, where everyone has something to contribute.

About the Team: The Inference team builds and operates the systems that serve H's foundational models in production. We focus on multimodal inference and serving for Computer Use Agents, optimizing across both the inference engine layer (e.g., vLLM, SGLang) and the model serving layer (e.g., disaggregated inference, intelligent routing). Agentic inference brings constraints around context length, multimodality, and tool calls, which we address by co-designing with the Models team on training-time choices and with the agent teams on how models are deployed. We operate at the intersection of research and production, translating cutting-edge inference techniques into the systems that power H's next generation of agents. We are looking for strong engineers excited about inference to join the team and help shape the systems behind superintelligent AI.

Key Responsibilities:

  • Build and operate the inference stack that serves H's multimodal agentic models

  • Improve latency, throughput, and cost of model serving across the stack

  • Research and implement inference techniques tailored to agent workloads

  • Co-design with the Models team on training-time decisions that affect inference

  • Collaborate with cross-functional teams to integrate inference into agentic AI products

  • Evaluate inference, serving, and hardware platforms, and communicate findings to stakeholders

  • Stay current with advancements in inference, model serving, and accelerator technology

Requirements:

  • Technical skills:

    • Strong software engineering track record

    • Proficient in Python and at least one systems language (Rust, C++, or Go)

    • Hands-on experience with deep learning frameworks (PyTorch, JAX), preferably in an industry setting

    • Solid distributed systems fundamentals

    • Experience working in a modern cloud environment and with production ML infrastructure (Kubernetes, etc.)

    • Working knowledge of modern ML, including transformers and multimodal architectures

  • Research skills:

    • Research engagement: an advanced degree with research output, or publications at top-tier AI or systems venues (e.g., NeurIPS, ICML, MLSys, OSDI), research internships, or substantive open-source contributions

  • Soft skills:

    • Excellent communication and presentation skills

    • Strong collaboration and teamwork skills

    • Passion for inference and AI

  • Preferred qualifications:

    • Startup experience

    • Hands-on experience with inference frameworks (vLLM, SGLang, TensorRT-LLM)

    • Writing or modifying GPU kernels (CUDA, Triton, etc.)

    • Edge or on-device inference experience (llama.cpp, MLX, ONNX Runtime, etc.)

    • Experience with quantization, speculative decoding, disaggregated inference or KV-cache compression

    • Experience with multimodal models and/or agentic systems

Location:

  • Paris or London.

  • This role is hybrid, and you are expected to be in the office 3 days a week on average.

  • Please expect some travel between offices on a reasonable cadence (e.g., every 4-6 weeks).

What We Offer:

  • Join the exciting journey of shaping the future of AI

  • Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment

  • Enjoy a competitive salary

  • Unlock opportunities for professional growth, continuous learning, and career development

If you want to change the status quo in AI, join us.

Frequently Asked Questions

Is the salary disclosed for the Research Engineer, Model Inference & Serving - Paris position at hcompany?
The salary for this Research Engineer, Model Inference & Serving - Paris role at hcompany is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Is the Research Engineer, Model Inference & Serving - Paris job at hcompany remote?
This Research Engineer, Model Inference & Serving - Paris position at hcompany is tagged as both remote and hybrid in Paris. According to the job description, the role is hybrid, and you are expected to be in the office 3 days a week on average.
Is the Research Engineer, Model Inference & Serving - Paris role at hcompany full-time or part-time?
This is listed as a full-time position. It is posted as a Research Engineer, Model Inference & Serving - Paris role in the RESEARCH department at hcompany.
Which team or department does the Research Engineer, Model Inference & Serving - Paris at hcompany belong to?
This Research Engineer, Model Inference & Serving - Paris position is part of the RESEARCH department at hcompany. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Research Engineer, Model Inference & Serving - Paris position at hcompany?
Click the "Apply Now" button on this page. You will be redirected to hcompany's official application portal, hosted on Ashby, where you can submit your application directly.
When was the Research Engineer, Model Inference & Serving - Paris job at hcompany posted?
This Research Engineer, Model Inference & Serving - Paris position at hcompany was posted on Apr 14, 2026. Apply as soon as possible; early applications are often reviewed first.