Site Reliability Engineer (4024)

GBG · Technology and Operations

Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia · Full time · Posted May 29, 2026

About this role

Enabling safe and rewarding digital lives for genuine people, everywhere

We make it our mission to ensure more genuine people have digital access to opportunities, and businesses have access to more genuine people. Our technology draws on diverse and reliable data to create a single point of truth for identity and address verification.

With over 30 years of experience behind us, our team and technology are focused on enabling safe and rewarding digital lives for everyone. Regardless of age, location or background, genuine people everywhere should be able to digitally prove who they are and where they live.

About the team and role

Global Fraud Solutions (GFS)

The team provides decision support solutions to address business objectives in risk prevention and fraud detection. We deliver software solutions and offer client support using our expertise and a client-focused approach.

Site Reliability Engineer

The SRE will build and operate the reliability, observability, and operational-excellence infrastructure underpinning the GFS managed fraud detection platforms. You will work across deployment pipelines, cloud infrastructure, monitoring, and incident management, ensuring that GBG can meet its high-availability SLAs for banking and fintech customers who depend on real-time fraud detection at scale.

What you will do

Design and operate the SRE practice for Managed offerings, including on-call processes, SLA frameworks, incident response playbooks, and post-incident review (PIR) processes.
Build and maintain observability infrastructure: centralised logging with correlation IDs, metrics dashboards, distributed tracing, and alerting for the Predator/Instinct platform stack.
Define and track service level objectives (SLOs) and error budgets for real-time transaction processing pipelines, targeting high throughput (TPS) and low round-trip latency.
Manage cloud infrastructure provisioning and configuration using IaC tooling (Terraform, Helm), supporting both AWS/Azure cloud deployments and on-premises customer environments.
Implement and maintain CI/CD pipelines for GFS solutions (e.g. Jenkins).
Work with Engineering teams to ensure security and compliance readiness for Managed services, including PCI DSS, ISO 27001, SOC 1/2/3 and PDPA/GDPR, in close coordination with InfoSec teams.
Drive platform resilience improvements: high availability, auto-scaling, disaster recovery, backup/restore procedures, and chaos engineering practices.
Manage secrets, certificate rotation, identity/access controls (OAuth/RBAC), and vulnerability management for the hosted environment.
Support the performance-testing methodology and establish performance baselines for our products.
Contribute to the Architecture Review Committee (ARC) with SRE and operational perspectives on technology choices.
Collaborate with engineering squads to embed reliability and DevSecOps practices across the SDLC.

Skills we’re looking for

Minimum 5 years of solid hands-on experience in a Site Reliability, Platform Engineering, or DevOps role, ideally supporting mission-critical real-time processing systems in banking, payments, or fintech.
Strong proficiency with cloud platforms (AWS preferred; Azure/GCP acceptable) including networking, compute, storage, and managed services.
Deep expertise with containerisation and orchestration: Docker, Kubernetes (EKS/AKS/GKE), Helm, and associated tooling.
Infrastructure as Code experience: Terraform (required), and familiarity with Ansible or Pulumi.
Observability stack proficiency: Prometheus, Grafana, ELK/OpenSearch, Jaeger/Zipkin, or equivalent enterprise-grade tooling.
CI/CD pipeline design and management: GitHub Actions, Jenkins, ArgoCD, or equivalent.
Experience with security and compliance frameworks applicable to hosted financial services: PCI DSS, ISO 27001, SOC 1/2/3, GDPR/PDPA.
Familiarity with database reliability practices for SQL Server, PostgreSQL, and Oracle — including replication, read replicas, and failover.
Working knowledge of secrets management (HashiCorp Vault, AWS Secrets Manager) and zero-trust identity principles.
Experience supporting real-time streaming or event-driven architectures (Kafka, RisingWave, or similar) in production environments.
Scripting and automation proficiency: Python, Bash, or Go for operational tooling.
Strong sense of operational ownership and accountability; comfortable being on-call and driving incidents to resolution.
Excellent communication skills: able to produce clear incident reports, runbooks, and architecture documentation for both technical and executive audiences.
Proactive mindset: identifies reliability risks before they become incidents and champions a culture of blameless post-mortems.
Collaborative and effective when working with software engineers, product managers, and InfoSec teams.
Continuous-improvement orientation: always looking to reduce toil, automate repetitive tasks, and improve platform resilience.
Flexible and adaptable: able to support a globally distributed product with customers across multiple time zones.

To find out more

As an equal opportunity employer, we are dedicated to creating a diverse and inclusive workplace where everyone feels valued and empowered. Please inform your GBG Talent Attraction Partner if you require any reasonable adjustments to the interview process.

To chat to the Talent Attraction team and find out more about our benefits and why we’re a great place to work, drop an email to behired@gbgplc.com and we’ll be in touch. You can also find out more about careers at GBG and check out our current opportunities at gbgplc.com/careers.
