Infra Support Engineer
About this role
Infra Support Engineer – GMI Global Infrastructure Team
Preferred Locations:
- Taiwan
- Malaysia
Responsibilities:
- Provide first and second-line technical support to customers for AI Infrastructure, including GPU/CPU nodes, networking, storage, orchestration, and platform services. Support is delivered via ticketing systems, emails, Slack, or other messaging platforms.
- Support GPU cluster delivery, including system provisioning, image deployment, network validation, BIOS/firmware updates, and GPU driver/runtime installation.
- Monitor system health and service-level indicators using alerts and dashboards; respond to alerts as part of a scheduled 24x7 shift rotation.
- Triage incidents by gathering context, verifying scope and impact, and following standard operating procedures and runbooks to perform immediate mitigations.
- Escalate incidents to global SRE engineers with clear, concise incident notes and relevant logs/traces.
- Maintain incident logs, update status pages, and communicate timely updates to stakeholders during incidents.
- Perform routine operational tasks such as log checks, health checks, capacity checks, and simple automated fixes.
- Participate in postmortems and contribute actionable follow-ups to reduce recurrence of incidents.
- Help maintain and improve standard operating procedures (SOPs), run periodic runbook validation, and document new procedures.
- Work collaboratively with developers and SRE teams to improve system reliability.
Qualifications:
- Bachelor’s degree in Computer Science or a related field.
- Over 2 years of experience in IT operations, server administration, SRE, DevOps, or technical support.
- Hands-on Linux experience, including shell, kernel, and log management.
- Basic networking knowledge, including TCP/IP, DNS, HTTP, and VLANs.
- Familiarity with monitoring, alerting, and logging tools such as Prometheus, Grafana, and Alertmanager.
- Experience with NVIDIA GPU infrastructure and Kubernetes.
- Comfortable collecting diagnostics, reading logs, and interpreting traces.
- Strong troubleshooting mindset and ability to follow runbooks under pressure.
- Excellent written and verbal communication skills for customer-facing incident handling.
- Willingness to work shifts and participate in on-call rotations.
- Bilingual in English and Chinese.