Staff Engineer, Site Reliability

LearnUpon · Engineering
📍 Dublin

About this role

Work Mode: Flex 1+ days per week in our Dublin office

Department: Engineering

 

About the Company 

LearnUpon partners with over 1,600 organisations globally to unlock the potential of employees, customers & members through learning that’s easy, scalable and focused on results.

Read more about life at LearnUpon here

 

About the Team

Our Engineering organization is dedicated to building robust, scalable infrastructure that handles world-scale platform demands. As part of the Site Reliability Engineering (SRE) team, we focus on system architecture, absolute performance, and technical innovation. Operating with high ownership and technical expertise, we are responsible for the scale-out of the LearnUpon infrastructure, championing internal self-service tooling, and embedding a culture of observability and shared operational responsibility across all engineering squads.

 

About the Opportunity

As a Staff Site Reliability Engineer, you will be a principal technical leader and a key catalyst for our infrastructure's evolution. In this role, you will take ownership of our core platform resilience, driving the strategy to build out an advanced, cost-effective observability function spanning metrics, logs, and transaction tracking. This opportunity requires a strategic thinker who can design cross-team SLO/SLI frameworks, navigate complex distributed system requirements, and mentor talent to ensure LearnUpon scales efficiently to support our ambitious global goals.

In addition, you’ll be responsible for:

  • Infrastructure Optimization: Identify opportunities to improve and scale our infrastructure for performance, observability, maintainability, and cost, by creating innovative solutions.
  • Observability Function Strategy: Lead our efforts to build an observability function that incorporates application metrics, application transaction tracking, and event log management.
  • Resilience & Scaling: Drive the processes to maintain resilient, scalable, and cost-effective infrastructure while working with other Engineering teams to provide solutions that meet their ongoing requirements.
  • Tooling & Self-Service: Build tools focused on measuring, monitoring, and alerting, with an eye towards self-service in order to promote Engineers’ ownership of observability.
  • Operational Agility & Support: React quickly to changing customer and business needs and actively participate in the team's on-call rota.
  • Team Up-Leveling: Mentor junior talent and effectively communicate complex technical ideas to both technical and non-technical peers.

 

Skills & Experience 

Must-Haves

  • 7+ years of experience in a software or Ops role.
  • 5+ years of cloud engineering experience, with at least 2 years of experience with AWS.
  • Experience deploying microservice environments using containerisation technologies such as Kubernetes and Docker.
  • Experience designing and implementing observability tech stacks, championing their benefits to Engineering teams, and managing the associated cost analysis of metrics gathering, effort, and tooling.
  • Ability to architect SLO/SLI implementations that balance the needs of different teams.
  • Experience building and supporting large-scale distributed systems that back a consumer app or website with associated requirements of performance, security, and disaster recovery.
  • Experience implementing IaC (e.g., CloudFormation, Terraform), automation tooling (e.g., Puppet, Ansible), and CI/CD (e.g., Jenkins, Travis CI, GitLab).
  • Experience using AI tools to streamline tasks and improve efficiencies.

 

Nice-to-Haves

  • Experience with database scaling would be a strong plus.
  • Certification in AWS, any PaaS, and/or related technologies.

*If you don’t tick every box but believe this role is a mutually good fit, please don’t hesitate to apply. We’d love to hear from you.

 

Why choose LearnUpon?

From comprehensive rewards and generous time off to meaningful investment in your growth and development, LearnUpon gives you the support, trust, and opportunity to do the most impactful work of your career.

Learn more here

 

Hiring Process

  • Qualified applicants may be invited to an initial screening call with a member of our TA Team.
  • Successful candidates will be invited to a series of practical interviews.
  • Finally, candidates will have an interview with our CTO.
  • Successful candidates will be contacted with an offer to join our team.

 

Note: At LearnUpon, we utilise AI to enhance the speed and quality of our screening and assessment practices, but our hiring decisions are always human. 

 

If you need any accommodations during the hiring process, please reach out to us at peopleops@learnupon.com.

 

LearnUpon is an Equal Opportunities Employer. 

We do not discriminate on the basis of gender, marital status, family status, age, disability, sexual orientation, race, religion, membership of the Traveller community, or any other legally protected status.

 

Check out our Careers site and Instagram to learn more about working at LearnUpon.

 

By submitting your application, you agree to LearnUpon's Privacy Policy






Frequently Asked Questions

Is the salary disclosed for the Staff Engineer, Site Reliability position at LearnUpon?
The salary for this Staff Engineer, Site Reliability role at LearnUpon is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Where is the Staff Engineer, Site Reliability position at LearnUpon located?
This Staff Engineer, Site Reliability role at LearnUpon is based in Dublin. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.
Which team or department does the Staff Engineer, Site Reliability at LearnUpon belong to?
This Staff Engineer, Site Reliability position is part of the Engineering department at LearnUpon. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Staff Engineer, Site Reliability position at LearnUpon?
Click the "Apply Now" button on this page. You will be redirected to LearnUpon's official application portal, hosted on Greenhouse, where you can submit your application directly.
When was the Staff Engineer, Site Reliability job at LearnUpon posted?
This Staff Engineer, Site Reliability position at LearnUpon was posted on Aug 27, 2025. Apply as soon as possible, as early applications are often reviewed first.