ML Research Engineer, TTS

Cantina · Engineering
📍 Europe · Full-time

About this role

About Cantina

Cantina is a new social platform founded by Sean Parker with the most advanced AI character creator. Our bots are lifelike, social creatures that can interact wherever people are online—across voice, video, and text. Create yourself, imagine someone new, or choose from thousands of characters to share infinitely scalable, personalized content and seamless group chat.

If you’re excited about how AI can shape creativity and social interaction, come help us build what’s next.

About the Role:

We’re looking for a Research / ML Engineer to join our Speech Team to build state-of-the-art speech systems end-to-end—from data specs through production inference. You’ll drive the model ↔ data ↔ eval flywheel for TTS and adjacent tasks (voice cloning, controllable TTS, voice conversion and more), partnering closely with research, data, and infra to ship fast, reliable, and cost-aware models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems.

What You’ll Do:

  • Model Building: Architect, implement, pre-train, fine-tune, and post-train/align (e.g., with GRPO/DPO) large-scale speech models.

  • Project Leadership: Independently lead small research projects while collaborating on larger team initiatives.

  • Experimental Design: Design, run, and analyze scientific experiments to advance our understanding of the models.

  • Tool Development: Develop and improve dev tooling to enhance team productivity.

  • Full-Stack Contribution: Contribute to the entire stack, from low-level optimizations to high-level model design.

  • Data Ownership: Define data requirements and collaborate on acquisition, curation, augmentation, labeling quality, and synthetic data strategies.

  • Rigorous Evaluation: Design automated objective/subjective evaluations—listening tests, SV/WER/ASR-based metrics, robustness & bias checks, and red-team studies.

  • Pipeline Delivery: Harden the training → evaluation → inference pipeline; profile latency, memory, and cost; and meet production SLAs with robust monitoring and rollback.

  • GPU Scaling: Partner with infrastructure to run distributed training/inference on cloud fleets and productionize models with reliability and observability.

  • Safety & Responsibility: Contribute to safety/consent guardrails and to misuse/abuse mitigation for responsible speech technology.

What You’ll Bring:

  • Exceptional research/development experience with large-scale audio models (>3B parameters and >500k hours of data).

  • Exceptional understanding of, and hands-on experience with, transformer architectures and/or diffusion models (incl. distillation and streaming) and/or audio language modelling.

  • Strong experience with multi-node and multi-GPU distributed model training.

  • Strong software engineering skills with a proven track record of building complex systems.

  • Strong PyTorch and performance-engineering skills (profiling, CUDA/Triton/C++ as needed), and the ability to write reliable, production-quality code.

  • Experience shipping large-scale speech/audio models to production.

  • Background in working with large-scale ML data.

  • Ability to iterate on data and triangulate quality using subjective and objective signals.

  • Notable publications and/or open source contributions in speech/audio/ML.

  • Experience with voice cloning, speech control, and voice generation.

Preferred Experience:

  • Experience shipping large-scale speech/audio models (TTS/VC/ASR) to production.

  • Experience working on large-scale ML systems.

  • Experience with audio language modelling and transformer architectures.

  • Experience with voice cloning, speech control, and voice generation.

  • Background in processing large-scale ML data.

  • Publications or notable open-source contributions in speech/audio/ML.

Frequently Asked Questions

Is the salary disclosed for the ML Research Engineer, TTS position at Cantina?
The salary for this ML Research Engineer, TTS role at Cantina is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Where is the ML Research Engineer, TTS position at Cantina located?
This ML Research Engineer, TTS role at Cantina is based in Europe. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.
Is the ML Research Engineer, TTS role at Cantina full-time or part-time?
This is listed as a full-time position. It is posted as an ML Research Engineer, TTS role in the Engineering department at Cantina.
Which team or department does the ML Research Engineer, TTS at Cantina belong to?
This ML Research Engineer, TTS position is part of the Engineering department at Cantina. See the full job description for more information about the team structure and responsibilities.
How do I apply for the ML Research Engineer, TTS position at Cantina?
Click the "Apply Now" button on this page. You will be redirected to Cantina's official application portal, hosted on Ashby, where you can submit your application directly.
When was the ML Research Engineer, TTS job at Cantina posted?
This ML Research Engineer, TTS position at Cantina was posted on Apr 29, 2026. Apply as soon as possible; early applications are often reviewed first.