Member of Technical Staff, Evals

magic.dev · Engineering

📍 San Francisco · Full-time · 💰 USD 200K–550K/yr · 🗓 Posted Jun 1, 2026

About this role

Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time compute to achieve this goal.

About the role

Evals builds the internal platform that teams across Magic use to evaluate the performance of internal and external models. The team supports pre-training, post-training, data, inference, and product, and sits on the critical path of many of the company's most important decisions.

As a Member of Technical Staff on Evals, you will build both the platform and the evaluations themselves. You'll develop infrastructure for large-scale evaluations, data ablations, and dataset quality analysis, while designing and validating the methodologies used to measure model performance.

Sweating the details matters on this team. Many benchmarks, papers, and open-source evaluation frameworks contain subtle bugs or flawed assumptions that lead to misleading conclusions. We care deeply about correctness, reproducibility, and measurement quality.

Evals are essential to the success of the company. By building trustworthy evaluation systems, you will help Magic make better research decisions, build better datasets, and ship better products.

What you'll work on

Build and maintain the internal evals platform used across Magic
Design, implement, and validate eval tasks for pre-training, post-training, reinforcement learning, inference, and product systems
Develop infrastructure for running large-scale evaluations
Build systems to measure dataset quality and identify opportunities to improve training data
Improve evaluation correctness, reproducibility, and reliability
Audit and improve upon public benchmarks, evaluation methodologies, and open-source implementations
Partner with research, data, inference, and product teams to define metrics that accurately reflect model quality
Build tooling and frameworks that enable teams across Magic to make decisions based on trustworthy measurements

What we're looking for

Experience building production systems, internal platforms, or developer infrastructure
Experience working with machine learning systems, evaluation frameworks, data infrastructure, or research tooling
Track record of owning technical projects end-to-end
Skepticism toward results that cannot be reproduced, validated, or explained
Ability to reason critically about benchmarks, metrics, and experimental methodology
Experience designing, implementing, or operating systems that run at scale
Comfort navigating ambiguity and determining whether a measurement actually captures the behavior it claims to measure
Excitement about helping researchers and engineers make better decisions through trustworthy measurements

Compensation, benefits, and perks (US)

Annual salary between $200K and $550K, depending on experience
Equity is a significant part of total compensation, in addition to salary
401(k) plan with 6% salary matching
Generous health, dental, and vision insurance for you and your dependents
Unlimited paid time off
Visa sponsorship and relocation support for candidates moving to San Francisco
A small, fast-moving, highly collaborative team working on frontier AI systems

Magic strives to be the place where high-potential individuals can do their best work. We value quick learning and grit just as much as skill and experience.

Our culture

Integrity. Words and actions should be aligned
Hands-on. At Magic, everyone is building
Teamwork. We move as one team, not N individuals
Focus. Safely deploy AGI. Everything else is noise
Quality. Magic should feel like magic

Frequently Asked Questions

What is the salary for the Member of Technical Staff, Evals role at magic.dev?

The listed salary for this Member of Technical Staff, Evals position at magic.dev is USD 200K–550K/yr. This is a full-time role.

Where is the Member of Technical Staff, Evals position at magic.dev located?

This Member of Technical Staff, Evals role at magic.dev is based in San Francisco. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.

Is the Member of Technical Staff, Evals role at magic.dev full-time or part-time?

This is listed as a full-time position. It is posted as a Member of Technical Staff, Evals role in the Engineering department at magic.dev.

Which team or department does the Member of Technical Staff, Evals at magic.dev belong to?

This Member of Technical Staff, Evals position is part of the Engineering department at magic.dev. See the full job description for more information about the team structure and responsibilities.

How do I apply for the Member of Technical Staff, Evals position at magic.dev?

Click the "Apply Now" button on this page. You will be redirected to magic.dev's official application portal, hosted on Ashby, where you can submit your application directly.

When was the Member of Technical Staff, Evals job at magic.dev posted?

This Member of Technical Staff, Evals position at magic.dev was posted on Jun 1, 2026. Apply as soon as possible, since early applications are often reviewed first.
