Senior AI Data Pipeline Engineer
About this role
We are looking for the best
At 42dot, AI Data Pipeline Engineers architect and scale global data pipelines that ingest and process data from worldwide sources. You will design and operate high-throughput systems to reliably deliver petabyte-scale data to our large-scale GPU infrastructure, powering mission-critical AI workloads.
Responsibilities
Design and build high-performance, scalable data pipelines to support diverse AI and Machine Learning initiatives across the organization.
Architect and implement multi-region data infrastructure to ensure global data availability and seamless synchronization.
Develop flexible pipeline architectures that allow for complex branching and logic isolation to support multiple concurrent AI projects.
Optimize large-scale data processing workloads using Databricks and Spark to maximize throughput and minimize processing costs.
Maintain and evolve the containerized data environment on Kubernetes, ensuring robust and reliable execution of data workloads.
Collaborate with AI researchers and platform teams to streamline the flow of high-quality data into training and evaluation pipelines.
Qualifications
Extensive professional experience in building and operating production-grade data pipelines for massive-scale AI/ML datasets.
Strong proficiency in distributed processing frameworks, particularly Apache Spark and the Databricks ecosystem.
Deep hands-on experience with workflow orchestration tools like Apache Airflow for managing complex dependency graphs.
Solid understanding of Kubernetes and containerization for deploying and scaling data processing components.
Proficiency in distributed messaging systems such as Apache Kafka for high-throughput data ingestion and event-driven architectures.
Expert-level programming skills in Python for system-level optimizations.
Strong knowledge of cloud-native services and best practices for building secure and scalable data infrastructure.
Logical approach to problem-solving with the persistence to identify and resolve root causes in complex, large-scale systems.
Strong communication skills to effectively collaborate with cross-functional teams and external partners.
Preferred Qualifications
Experience in architecting global, multi-region data pipelines and solving challenges related to cross-border data transfer and latency.
Practical experience or a strong interest in implementing distributed computing frameworks like Ray for AI workloads.
Experience in building real-time or near-real-time pipelines using Spark Streaming or Flink.
Familiarity with Infrastructure as Code (IaC) tools such as Terraform to manage complex data environments.
Understanding of the end-to-end ML lifecycle (MLOps) and how data infrastructure supports model experimentation and deployment.
Interview Process
Resume Screening - Coding Test - Virtual Interview (approximately 1 hour) - Onsite or Virtual Interview (approximately 3 hours) - Final Offer
Please note that the interview process may vary depending on the position and is subject to change based on scheduling and other circumstances.
Interview schedules and results will be communicated individually via the email address provided in your application.
Additional Information
Please upload all required documents in PDF format.
Veterans and applicants eligible for employment protection will receive preferential consideration in accordance with applicable laws and regulations.
In compliance with the Act on Employment Promotion and Vocational Rehabilitation for Persons with Disabilities, registered individuals with disabilities will receive preferential consideration.
42dot does not accept unsolicited resumes from search firms. We will not pay any fees for resumes submitted without prior agreement.
A 3-month probationary period may apply.
※ Please make sure to review the information below before applying.
Learn more about how we work at 42dot: 42dot Way
Explore 42dot's unique Employee Engagement Program