Senior AI Data Pipeline Engineer

42dot · ENGINEERING
Remote · Pangyo (Software Dream Center), South Korea · Full-time

About this role

We are looking for the best

At 42dot, our AI Data Pipeline Engineers architect and scale global data pipelines that ingest and process data from worldwide sources. You will design and operate high-throughput systems to reliably deliver petabyte-scale data to our large-scale GPU infrastructure, powering mission-critical AI workloads.

Responsibilities

  • Design and build high-performance, scalable data pipelines to support diverse AI and Machine Learning initiatives across the organization.

  • Architect and implement multi-region data infrastructure to ensure global data availability and seamless synchronization.

  • Develop flexible pipeline architectures that allow for complex branching and logic isolation to support multiple concurrent AI projects.

  • Optimize large-scale data processing workloads using Databricks and Spark to maximize throughput and minimize processing costs.

  • Maintain and evolve the containerized data environment on Kubernetes, ensuring robust and reliable execution of data workloads.

  • Collaborate with AI researchers and platform teams to streamline the flow of high-quality data into training and evaluation pipelines.

Qualifications

  • Extensive professional experience in building and operating production-grade data pipelines for massive-scale AI/ML datasets.

  • Strong proficiency in distributed processing frameworks, particularly Apache Spark and the Databricks ecosystem.

  • Deep hands-on experience with workflow orchestration tools like Apache Airflow for managing complex dependency graphs.

  • Solid understanding of Kubernetes and containerization for deploying and scaling data processing components.

  • Proficiency in distributed messaging systems such as Apache Kafka for high-throughput data ingestion and event-driven architectures.

  • Expert-level programming skills in Python for system-level optimizations.

  • Strong knowledge of cloud-native services and best practices for building secure and scalable data infrastructure.

  • Logical approach to problem-solving with the persistence to identify and resolve root causes in complex, large-scale systems.

  • Strong communication skills to effectively collaborate with cross-functional teams and external partners.

Preferred Qualifications

  • Experience in architecting global, multi-region data pipelines and solving challenges related to cross-border data transfer and latency.

  • Practical experience or a strong interest in implementing distributed computing frameworks like Ray for AI workloads.

  • Experience in building real-time or near-real-time pipelines using Spark Streaming or Flink.

  • Familiarity with Infrastructure as Code (IaC) tools such as Terraform to manage complex data environments.

  • Understanding of the end-to-end ML lifecycle (MLOps) and how data infrastructure supports model experimentation and deployment.

Interview Process

  • Resume Screening - Coding Test - Virtual Interview (approximately 1 hour) - Onsite or Virtual Interview (approximately 3 hours) - Final Offer

  • Please note that the interview process may vary depending on the position and is subject to change based on scheduling and other circumstances.

  • Interview schedules and results will be communicated individually via the email address provided in your application.

Additional Information

  • Please upload all required documents in PDF format.

  • Veterans and applicants eligible for employment protection will receive preferential consideration in accordance with applicable laws and regulations.

  • In compliance with the Act on Employment Promotion and Vocational Rehabilitation for Persons with Disabilities, registered individuals with disabilities will receive preferential consideration.

  • 42dot does not accept unsolicited resumes from search firms. We will not pay any fees for resumes submitted without prior agreement.

  • A 3-month probationary period may apply.

※ Please make sure to review the information below before applying.

Frequently Asked Questions

Is the salary disclosed for the Senior AI Data Pipeline Engineer position at 42dot?
The salary for this Senior AI Data Pipeline Engineer role at 42dot is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Is the Senior AI Data Pipeline Engineer job at 42dot remote?
Yes, this Senior AI Data Pipeline Engineer position at 42dot is remote, with team members based in Pangyo (Software Dream Center), South Korea. You can work from home or anywhere in the supported regions.
Is the Senior AI Data Pipeline Engineer role at 42dot full-time or part-time?
This is listed as a full-time position. It is posted as a Senior AI Data Pipeline Engineer role in the ENGINEERING department at 42dot.
Which team or department does the Senior AI Data Pipeline Engineer at 42dot belong to?
This Senior AI Data Pipeline Engineer position is part of the ENGINEERING department at 42dot. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Senior AI Data Pipeline Engineer position at 42dot?
Click the "Apply Now" button on this page. You will be redirected to 42dot's official application portal, hosted on Ashby, where you can submit your application directly.
When was the Senior AI Data Pipeline Engineer job at 42dot posted?
This Senior AI Data Pipeline Engineer position at 42dot was posted on Feb 9, 2026. Apply as soon as possible; early applications are often reviewed first.