Inference Performance & Deployment - Member of Technical Staff

Callosum · Intelligent Systems Engineering

📍 London · Full-time · 💰 GBP 101K–192K/yr · 🗓 Posted May 20, 2026

About Us

The last era of AI scaled on a single bet: bigger models, more identical chips, more data. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. Real-world problems are heterogeneous: no single model or chip can solve them alone. The next era of AI requires heterogeneity at the infrastructure level - diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability that move the Pareto frontier of what is possible. That's what we are building.

Callosum is the Intelligent Systems Company. We started by questioning what actually creates intelligence. We believe there is no single answer, but rather a system-level solution. We co-evolve models, workflows, and silicon together to show that intelligence does not come from a single component, but emerges from diverse, co-optimised mechanisms that work together and are aware of each other. Heterogeneity will define the next era of compute, and it is a principle that holds in biological, neuronal, and economic systems alike.

In early 2026 we launched with results showing orders-of-magnitude improvements in performance, and this is only the beginning. Agentic AI is the future of how intelligence is deployed: multi-step, long-horizon, and operating in changing environments. These systems are inherently heterogeneous, and can only be as powerful as the infrastructure that runs them.

We are engineers and scientists based in London, working together across the full depth of the stack. We are curious, intellectually honest, and building what doesn't exist yet. If you thrive on uncharted territory and are energised by the scale of the challenge, we'd love to hear from you.

About the Role

Callosum believes that orders-of-magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute as a unified, co-evolving system that extends beyond GPUs.

This role owns the bridge between Callosum's internal engineering and the real world. You design the tooling and methodologies that ground our technology in real-world performance and behaviour, sitting at the integration point of every engineering function. You will be the first to run our heterogeneous infrastructure in production-equivalent conditions, systematically characterising performance, identifying bottlenecks, and driving decisions on production-readiness. Your work ensures that every layer of the stack is guided by empirical evidence rather than assumption.

What You’ll Build

Run experiments self-hosting models on cloud instances or on-prem across providers and hardware configurations, systematically characterising performance envelopes
Develop and maintain deployment patterns that are reproducible, measurable, and optimised for latency, throughput, and cost
Work on the orchestration and routing software that sits above the inference engine to improve caching, request scheduling, batching, and resource allocation
Act as the integration point for the other roles: consume new accelerator support, engine features, and infrastructure upgrades, then provide high-quality feedback on bottlenecks and essential capabilities to guide stack optimisations
Build and maintain benchmarking harnesses, regression suites, and performance dashboards that give the team a shared view of system health and progress

What Sets You Apart

Experience deploying and benchmarking large model inference in production or production-equivalent environments
Familiarity with multi-node GPU deployments and associated networking/communication stacks
Strong end-to-end performance characterisation skills: able to isolate whether a bottleneck is in the network, the runtime, the memory subsystem, or the model itself
Familiarity with serving frameworks like Dynamo, Triton Inference Server, or similar orchestration layers
Clear communication skills - able to translate performance data into actionable, prioritised feedback for the teams building the underlying systems
A demonstrably disciplined and systematic approach to deployment: reproducibility, measurement methodology, and controlled comparisons

What We Offer

Competitive salary, determined by skills and experience
Equity & Ownership
Private healthcare
Visa sponsorship and relocation benefits, so that we can hire the best in the world
We work in person at our London office. You'll have the tools, space and setup to do your best work, and if you have specific needs, just tell us

We're committed to building an inclusive workplace where everyone feels welcome, and believe in equal opportunities for all.
