Synthetic Data Engineer (AI Data/Training)

hyphenconnect · Engineering
Apply Now ↗
📍 San Francisco Bay Area, USA

About this role

We are seeking a talented and innovative Synthetic Data Engineer. In this role, you will design and implement domain-specific synthetic data generation pipelines and maintain the quality of the data that feeds our training loops. Your work will directly shape data processing and model training across the organization.

Responsibilities:

  • Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting.
  • Implement automated quality scoring and de-duplication systems.
  • Manage data pipelines that feed directly into supervised fine-tuning (SFT) and direct preference optimization (DPO) training loops.

Qualifications:

  • Proven experience building large-scale data pipelines (Airflow, Spark, Ray).
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation and bias mitigation.

Frequently Asked Questions

Is the salary disclosed for the Synthetic Data Engineer (AI Data/Training) position at hyphenconnect?
The salary for this Synthetic Data Engineer (AI Data/Training) role at hyphenconnect is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Where is the Synthetic Data Engineer (AI Data/Training) position at hyphenconnect located?
This Synthetic Data Engineer (AI Data/Training) role at hyphenconnect is based in San Francisco Bay Area, USA. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.
Which team or department does the Synthetic Data Engineer (AI Data/Training) at hyphenconnect belong to?
This Synthetic Data Engineer (AI Data/Training) position is part of the Engineering department at hyphenconnect. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Synthetic Data Engineer (AI Data/Training) position at hyphenconnect?
Click the "Apply Now" button on this page. You will be redirected to hyphenconnect's official application portal, hosted on Greenhouse, where you can submit your application directly.
When was the Synthetic Data Engineer (AI Data/Training) job at hyphenconnect posted?
This Synthetic Data Engineer (AI Data/Training) position at hyphenconnect was posted on Apr 24, 2026. Apply as soon as possible; early applications are often reviewed first.