Research Data Engineer

smallest · Core AI Research
📍 Bengaluru · Full-time

Research Data Engineer (India) — Smallest.ai

About the Role

This is not a typical data engineering role. You won’t be building dashboards. You won’t be maintaining pipelines no one touches.

You will take messy, noisy, real-world data and turn it into something models can learn from. Think of it as running a gold mine: you take dust and turn it into gold.

We work on speech, language, and real-time systems across 50+ languages.
The difference between a good model and a great one is almost always data quality + data systems. That’s where you come in.

What You’ll Work On

  • Data Pipelines (Real-time + Batch)

    • Build high-throughput pipelines for audio, text, and multimodal data

    • Streaming + offline processing at scale

  • Data Quality & Curation

    • Cleaning, filtering, deduplication, normalization (numbers, emails, code-mix, etc.)

    • Designing heuristics + ML-based data filtering systems

  • Multilingual Data Systems

    • Handling 50+ languages, accents, and code-mixed inputs

    • Language-aware normalization and segmentation

  • Training Data Engine

    • Build pipelines that continuously generate better training data from production

    • Active learning loops, data selection, sampling strategies

  • Evaluation & Benchmarking Pipelines

    • Create scalable eval datasets across languages and domains

    • Automate quality tracking for ASR, TTS, and conversational systems

  • Data Infra for Research

    • Work closely with the research team to unblock experiments fast

    • Build systems that reduce iteration time from weeks → hours

What This Role Is NOT

  • Not a dashboard/reporting role

  • Not a “move data from A to B” role

  • Not a maintenance-heavy legacy pipeline role

What We’re Looking For

  • Strong fundamentals in data structures, systems, and pipelines

  • Experience with large-scale data processing (audio/text preferred)

  • Comfortable with messy, unstructured, real-world data

  • Strong coding skills (Python required; systems experience is a plus)

  • Understanding of ML/data pipelines (training, eval, data curation)

Bonus (Not Mandatory)

  • Experience with speech/audio data (ASR/TTS)

  • Familiarity with multilingual datasets

  • Experience with streaming systems (Kafka, etc.)

  • Exposure to data-centric AI / data quality frameworks

How We Work

  • Speed over perfection

  • Production over papers

  • Systems that scale, not scripts that barely work

  • Tight loop between data → model → eval → improvement

Who This Is For

  • You enjoy working with raw, chaotic data

  • You care about data quality more than tooling hype

  • You like building systems that directly impact model performance

  • You get excited by turning unusable data into competitive advantage

Why Join Us

We’re building real-time, multilingual voice AI systems.

At this level, models are only as good as the data behind them.

If you want to work on the layer that actually moves the needle, this is it.

Frequently Asked Questions

Is the salary disclosed for the Research Data Engineer position at smallest?
The salary for this Research Data Engineer role at smallest is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Where is the Research Data Engineer position at smallest located?
This Research Data Engineer role at smallest is based in Bengaluru. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.
Is the Research Data Engineer role at smallest full-time or part-time?
This is listed as a full-time position. It is posted as a Research Data Engineer role in the Core AI Research department at smallest.
Which team or department does the Research Data Engineer at smallest belong to?
This Research Data Engineer position is part of the Core AI Research department at smallest. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Research Data Engineer position at smallest?
Click the "Apply Now" button on this page. You will be redirected to smallest's official application portal, hosted on Ashby, where you can submit your application directly.
When was the Research Data Engineer job at smallest posted?
This Research Data Engineer position at smallest was posted on Apr 7, 2026. Apply as soon as possible — early applications are often reviewed first.