Software Engineer, Site Reliability Engineer

furiosa-ai · Software
Apply Now ↗
🌍 Remote · 📍 Seoul HQ · Full-time

About the Role

As a Site Reliability Engineer, you will apply software engineering to improve the reliability, scalability, security, and operability of FuriosaAI’s production infrastructure and customer-facing services. You will work across bare-metal Kubernetes clusters, cloud control planes, networking, observability systems, deployment pipelines, and API services running on Furiosa NPUs.

We are looking for an engineer who can reason about production systems end-to-end, identify reliability risks across service and infrastructure boundaries, build the observability foundation required to understand them, and drive improvements through code, configuration, automation, and architectural changes.

In this role, your mission is defined by three primary pillars:

  • Reliability Architecture: Improve production systems so that failures are isolated, detected quickly, and recovered from safely, and services degrade gracefully.

  • Observability & SLOs: Build the metrics, logs, traces, dashboards, alerts, and service-level indicators required to understand user-facing reliability.

  • Production Engineering: Reduce operational toil through automation, self-service workflows, safer rollouts, and hands-on engineering contributions.

Responsibilities

  • Define and evolve reliability goals for production systems through SLIs, SLOs, error budgets, and meaningful operational metrics.

  • Design and build observability foundations that make system behavior, user impact, performance bottlenecks, and failure modes measurable and actionable.

  • Analyze production systems end-to-end, identify reliability risks across software, infrastructure, and networking boundaries, and drive architectural improvements.

  • Improve change safety and failure recovery through better rollout strategies, capacity planning, load validation, graceful degradation, and incident learning loops.

  • Reduce operational toil by building automation, internal tooling, and self-service workflows that make production systems easier to operate and harder to misuse.

Minimum Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

  • Strong programming skills in one or more general-purpose languages such as Rust, Python, or Go.

  • Solid understanding of operating systems, computer networks, and cloud-native or container-based environments.

  • Ability to analyze technical problems and communicate clearly with engineering teams.

Preferred Qualifications

  • Experience improving reliability of production systems using SLOs, observability, incident analysis, rollout safety, and error-budget-driven decision making.

  • Experience designing or operating distributed systems where failures, overload, latency, and capacity limits must be explicitly managed.

  • Experience building automation, internal tooling, or self-service workflows that reduce operational toil and improve engineering productivity.

  • Experience working across software, infrastructure, networking, and security boundaries to diagnose problems and drive architectural improvements.

Contact

  • recruit@furiosa.ai

Frequently Asked Questions

Is the salary disclosed for the Software Engineer, Site Reliability Engineer position at furiosa-ai?
The salary for this Software Engineer, Site Reliability Engineer role at furiosa-ai is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Is the Software Engineer, Site Reliability Engineer job at furiosa-ai remote?
Yes, this Software Engineer, Site Reliability Engineer position at furiosa-ai is remote, with team members based in Seoul HQ. You can work from home or anywhere in the supported regions.
Is the Software Engineer, Site Reliability Engineer role at furiosa-ai full-time or part-time?
This is listed as a full-time position. It is posted as a Software Engineer, Site Reliability Engineer role in the Software department at furiosa-ai.
Which team or department does the Software Engineer, Site Reliability Engineer at furiosa-ai belong to?
This Software Engineer, Site Reliability Engineer position is part of the Software department at furiosa-ai. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Software Engineer, Site Reliability Engineer position at furiosa-ai?
Click the "Apply Now" button on this page. You will be redirected to furiosa-ai's official application portal, hosted on Ashby, where you can submit your application directly.
When was the Software Engineer, Site Reliability Engineer job at furiosa-ai posted?
This Software Engineer, Site Reliability Engineer position at furiosa-ai was posted on May 29, 2026. Apply as soon as possible — early applications are often reviewed first.