Software Engineer, Distributed Data Systems

exa · Engineering
Apply Now ↗
📍 San Francisco, California · Full-time · 💰 USD 180K–350K/yr

About this role

Exa is an applied AI lab building a search engine unlike anything the world has ever seen. We build massive-scale infrastructure to crawl the entire web, train state-of-the-art embedding models to process it, and design high-performance vector databases to retrieve over it. We now power search for Cursor, Cognition, HubSpot, and over 400,000 developers, and have raised $350M from Lightspeed, Benchmark, and a16z.

 

Our ultimate goal is to build perfect search over all the world's information, far beyond Google. If you want to build massive-scale ML systems that will define the way the new AI world consumes information, this is the place for you.

 

As a Data Engineer, you'll architect and build the data infrastructure that powers everything we do—from crawling billions of pages to training our embedding models to serving real-time search. You'll have enormous autonomy in designing systems that scale to hundreds of petabytes. If you've ever wanted to build data pipelines at a scale that most companies only dream about, this is your chance.

 

Who You Are

  • Deep understanding of lakehouse architectures (Delta Lake, Iceberg, Hudi) and when to use them

  • Experience building and operating large-scale distributed data processing pipelines

  • Hands-on experience with streaming data systems (Kafka, Flink, or similar)

  • Familiarity with Ray, Spark, or ClickHouse at production scale

  • An obsessive focus on reliability and building systems that don't page you at 3am

Bonus

  • Experience with Lance or other vector-native storage formats

  • Background in GPU-accelerated data processing (RAPIDS, cuDF)

 

What You Could Do

  • Design a lakehouse architecture that handles 100+ PB of web crawl data

  • Build streaming pipelines that process billions of documents per day for real-time indexing

  • Architect the data layer for our embedding training infrastructure on Ray

  • Scale our ClickHouse deployment to handle analytical queries across petabytes of search logs

Logistics

  • Location: This is an in-person opportunity in San Francisco.

  • Visas: We're happy to sponsor international candidates (e.g., STEM OPT, OPT, H1B, O1, E3). While we cannot guarantee your visa, we have historically been successful in sponsoring candidates from all over the world. If you receive an offer, our team will work hard to get you a visa.

  • Benefits: We offer premium healthcare benefits (medical, dental, vision), fertility benefits, 16 weeks of fully paid parental leave for all new parents, and a monthly wellness stipend to all of our employees.

Frequently Asked Questions

What is the salary for the Software Engineer, Distributed Data Systems role at exa?
The listed salary for this Software Engineer, Distributed Data Systems position at exa is USD 180K–350K/yr. This is a full-time role.
Where is the Software Engineer, Distributed Data Systems position at exa located?
This Software Engineer, Distributed Data Systems role at exa is based in San Francisco, California. The job description lists it as an in-person opportunity. Check the full job description or apply directly to confirm the work arrangement.
Is the Software Engineer, Distributed Data Systems role at exa full-time or part-time?
This is listed as a full-time position. It is posted as a Software Engineer, Distributed Data Systems role in the Engineering department at exa.
Which team or department does the Software Engineer, Distributed Data Systems at exa belong to?
This Software Engineer, Distributed Data Systems position is part of the Engineering department at exa. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Software Engineer, Distributed Data Systems position at exa?
Click the "Apply Now" button on this page. You will be redirected to exa's official application portal hosted on ashby where you can submit your application directly.
When was the Software Engineer, Distributed Data Systems job at exa posted?
This Software Engineer, Distributed Data Systems position at exa was posted on Dec 19, 2025. Apply as soon as possible — early applications are often reviewed first.