LLM Engineer (LLM Evaluation)

42dot · ENGINEERING
๐ŸŒ Remote๐Ÿ“ Pangyo (Software Dream Center), South KoreaFullTime

About this role

We are looking for the best

About Us

42dot์€ ์†Œํ”„ํŠธ์›จ์–ด์™€ AI๋กœ ๋ชจ๋นŒ๋ฆฌํ‹ฐ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋…ธ๋ ฅํ•˜๋Š” ๋ชจ๋นŒ๋ฆฌํ‹ฐ AI ๊ธฐ์—…์ž…๋‹ˆ๋‹ค. ํ˜„๋Œ€์ž๋™์ฐจ๊ทธ๋ฃน ๊ธ€๋กœ๋ฒŒ ์†Œํ”„ํŠธ์›จ์–ด ์„ผํ„ฐ๋กœ์„œ, 42dot์€ ์†Œํ”„ํŠธ์›จ์–ด ์ •์˜ ์ฐจ๋Ÿ‰ ๊ฐœ๋ฐœ์„ ์„ ๋„ํ•˜๋ฉฐ ๋ฏธ๋ž˜ ๋ชจ๋นŒ๋ฆฌํ‹ฐ๋ฅผ ๊ฐœ์ฒ™ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

LLM Engineer (LLM Evaluation)๋Š” ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(LLM)์˜ ์„ฑ๋Šฅ์„ ์‹ ๋ขฐ์„ฑ ์žˆ๊ฒŒ ํ‰๊ฐ€ํ•˜๊ณ , ํ‰๊ฐ€ ๊ฒฐ๊ณผ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ชจ๋ธ ํ’ˆ์งˆ์„ ์ง€์†์ ์œผ๋กœ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ๋Š” ํ‰๊ฐ€ ์ฒด๊ณ„์™€ ํ”Œ๋žซํผ์„ ๊ตฌ์ถ•ํ•ฉ๋‹ˆ๋‹ค.

๋น ๋ฅด๊ฒŒ ๋ณ€ํ™”ํ•˜๋Š” LLM ํ™˜๊ฒฝ ์†์—์„œ benchmark dataset, evaluation protocol, automation pipeline์„ ์„ค๊ณ„ํ•˜์—ฌ ๋ชจ๋ธ์˜ ํ’ˆ์งˆ๊ณผ ์•ˆ์ •์„ฑ์„ ์ง€์†์ ์œผ๋กœ ํ–ฅ์ƒ์‹œํ‚ค๊ณ , ์‹ค์„œ๋น„์Šค ์ˆ˜์ค€์˜ ๊ฒ€์ฆ ์ฒด๊ณ„๋ฅผ ์šด์˜ํ•˜๋Š” ๋ฐ ๊ธฐ์—ฌํ•ฉ๋‹ˆ๋‹ค.

๋˜ํ•œ Kubernetes ๊ธฐ๋ฐ˜ ํ™˜๊ฒฝ์—์„œ Argo Workflows ๋ฐ MLflow๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ชจ๋ธ ํ‰๊ฐ€โ€“์‹คํ—˜ ๊ด€๋ฆฌโ€“๋ฐฐํฌ ๊ฒ€์ฆ๊นŒ์ง€ ์ด์–ด์ง€๋Š” end-to-end evaluation workflow๋ฅผ ๊ตฌ์ถ•ํ•˜๊ณ , ๋ฐ˜๋ณต ๊ฐ€๋Šฅํ•˜๊ณ  ์žฌํ˜„์„ฑ ์žˆ๋Š” ํ‰๊ฐ€ ํ™˜๊ฒฝ์„ ๊ณ ๋„ํ™”ํ•ฉ๋‹ˆ๋‹ค.

Responsibilities

  • LLM Evaluation & Benchmark Design

    • Build benchmark datasets for LLM performance evaluation and design evaluation metrics (human-based and LLM-based)

    • Establish evaluation protocols for fair model comparison and ensure reproducibility

  • Evaluation Automation and Workflow Integration

    • Build an automated evaluation environment on Argo Workflows and MLflow and integrate it with ML pipelines

    • Design automatic detection of, and alerting on, performance regressions when models are deployed

  • Model Quality Validation and Operational Improvement

    • Validate the quality and stability of large-scale models through repeatable evaluation workflows

    • Run a continuous model-quality improvement process driven by evaluation results

Qualifications

  • LLM ํ•™์Šต ๋˜๋Š” ํ‰๊ฐ€ ๊ด€๋ จ ๋ถ„์•ผ 3๋…„ ์ด์ƒ ๊ฒฝ๋ ฅ

  • Deep Learning ๋˜๋Š” NLP ๊ด€๋ จ ์—ฐ๊ตฌ ๋ฐ ๊ฐœ๋ฐœ ๊ฒฝํ—˜

  • LLM evaluation framework ์‚ฌ์šฉ ๊ฒฝํ—˜ (lm-eval, HELM, OpenAI Evals ๋“ฑ)

  • Python ๊ธฐ๋ฐ˜ ์„œ๋น„์Šค ๊ฐœ๋ฐœ ๊ฒฝํ—˜ (async/๋น„๋™๊ธฐ ์ฒ˜๋ฆฌ ํฌํ•จ)

  • ์‹คํ—˜ ๊ด€๋ฆฌ ๋ฐ reproducibility์— ๋Œ€ํ•œ ์ดํ•ด

  • ๋ชจ๋ธ ํ‰๊ฐ€ ๋ฐ validation workflow ์„ค๊ณ„ ๊ฒฝํ—˜

  • ๋™๋ฃŒ์™€์˜ ์›ํ™œํ•œ ํ˜‘์—… ๋Šฅ๋ ฅ

Preferred Qualifications

  • Kubernetes ๋ฐ ์ปจํ…Œ์ด๋„ˆ ๊ธฐ๋ฐ˜ ํ™˜๊ฒฝ ๊ฐœ๋ฐœ ๊ฒฝํ—˜

  • ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ๋˜๋Š” pipeline ์„ค๊ณ„ ๊ฒฝํ—˜

  • GPU ๊ธฐ๋ฐ˜ ๋ถ„์‚ฐ inference ๋˜๋Š” ๋Œ€๊ทœ๋ชจ ๋ชจ๋ธ ํ‰๊ฐ€ ๊ฒฝํ—˜

  • Datadog, Prometheus ๋“ฑ์„ ํ™œ์šฉํ•œ ๋ชจ๋‹ˆํ„ฐ๋ง ๊ตฌ์ถ• ๊ฒฝํ—˜

  • MLflow, Argo Workflows ๊ธฐ๋ฐ˜ ML workflow ์šด์˜ ๊ฒฝํ—˜

  • GPU ํด๋Ÿฌ์Šคํ„ฐ ๊ธฐ๋ฐ˜ evaluation pipeline ์„ค๊ณ„ ๋ฐ ์šด์˜ ๊ฒฝํ—˜

  • LLM ํ’ˆ์งˆ ํ‰๊ฐ€ ์ž๋™ํ™” ๋ฐ ์šด์˜ ๊ฒฝํ—˜

Interview Process

  • Document screening - Coding test - Video interview (about 1 hour) - In-person or video interview (about 3 hours) - Final offer

  • The hiring process may differ by role and may change depending on scheduling and circumstances.

  • Interview schedules and results will be sent individually to the email address you registered in your application.

Additional Information

  • When submitting your resume, please leave out information that employers may not request under Korea's Fair Hiring Procedure Act, such as your resident registration number, family relationships, marital status, salary, photo, physical characteristics, and region of origin.

  • Please upload all files as PDFs of 30MB or less. (If you have trouble uploading your resume, email it to recruit@42dot.ai together with the URL of the position you are applying for.)

  • After the interview process, a reference check may be conducted with the applicant's consent.

  • Persons eligible for national veterans' benefits and persons eligible for employment protection receive preference in accordance with applicable laws.

  • Holders of a disability registration certificate receive preference in accordance with the Act on the Employment Promotion and Vocational Rehabilitation of Persons with Disabilities.

  • 42dot does not accept resumes from search firms it has not engaged and does not pay fees for unsolicited resumes.

  • A three-month probationary period may apply.

โ€ป ์ง€์› ์ „ ์•„๋ž˜ ๋‚ด์šฉ์„ ๊ผญ ํ™•์ธํ•ด ์ฃผ์„ธ์š”.

Frequently Asked Questions

Is the salary disclosed for the LLM Engineer (LLM Evaluation) position at 42dot?
The salary for this LLM Engineer (LLM Evaluation) role at 42dot is not publicly listed. See 42dot's official careers page for details on the compensation package.
Is the LLM Engineer (LLM Evaluation) job at 42dot remote?
Yes, this LLM Engineer (LLM Evaluation) position at 42dot is remote, with team members based in Pangyo (Software Dream Center), South Korea. You can work from home or anywhere in the supported regions.
Is the LLM Engineer (LLM Evaluation) role at 42dot full-time or part-time?
This is listed as a full-time position. It is posted as an LLM Engineer (LLM Evaluation) role in the ENGINEERING department at 42dot.
Which team or department does the LLM Engineer (LLM Evaluation) at 42dot belong to?
This LLM Engineer (LLM Evaluation) position is part of the ENGINEERING department at 42dot. See the full job description for more information about the team structure and responsibilities.
How do I apply for the LLM Engineer (LLM Evaluation) position at 42dot?
Apply through 42dot's official application portal, hosted on Ashby, where you can submit your application directly.
When was the LLM Engineer (LLM Evaluation) job at 42dot posted?
This LLM Engineer (LLM Evaluation) position at 42dot was posted on May 21, 2026. Apply as soon as possible; early applications are often reviewed first.