Senior SRE Engineer

kronosresearch · Core Technology
📍 Taiwan

About this role

Responsibilities

Linux Systems & Automation (Core)

- Manage large-scale Linux environments: troubleshooting and root-cause analysis
- Write maintainable, hand-off-ready Bash / Ansible / Python automation
- On-call for infrastructure, CI/CD, and production service incidents

HPC Cluster & Storage

- Operate HPC clusters (Slurm) along with usage analytics, auditing, and monitoring tools
- Maintain and plan storage for compute environments (Lustre, NAS)

Cloud & Hybrid Infrastructure

- Manage multi-cloud environments (AWS, Alibaba Cloud, GCP) with Terraform / AWS CDK
- Build and operate Docker (ECS) / Kubernetes (EKS) environments and their deployment workflows

CI/CD & Developer Experience

- Operate self-hosted GitLab server and Runner fleet
- Operate CI/CD systems and design deployment pipelines for research and other projects

GenAI / Internal Platform

- Build internal AI platforms (LangChain / LangGraph / Bedrock, Elasticsearch RAG)
- Develop MCP servers, chatbots, AI agents, and similar services

Requirements

- **5+ years** of hands-on Linux systems administration and infrastructure operations experience
- Solid Linux internals knowledge (process / memory / filesystem / networking / systemd / cgroup); able to localize issues even without complete logs
- Strong Bash / Shell scripting skills: able to write maintainable scripts that others can pick up
- Programming ability for data processing, CLI tools, and API services; Python proficiency preferred
- Solid storage fundamentals with hands-on experience: RAID levels and rebuild trade-offs, filesystem selection, snapshot and backup planning; NAS / shared storage (NFS / SMB) operations experience
- Experience with at least one major public cloud (AWS / GCP / Alibaba Cloud) and IaC tooling (Terraform / CDK / Ansible)
- Familiar with containerization and orchestration (Docker, Kubernetes)
- CI/CD pipeline design and operations experience (GitLab CI / Jenkins / Airflow)
- Able to own a cross-service subsystem end-to-end: design, implementation, documentation, handoff
- **Strong autonomy**: can drive a problem from discovery through root-cause investigation and decision-making to delivery with minimal supervision; able to make judgment calls with incomplete information and proactively communicate progress, risks, and rationale
- **Self-directed**: doesn't wait for tickets; identifies problems worth solving and prioritizes them independently

Nice to Have

- HPC scheduler experience (Slurm / PBS / LSF)
- Parallel filesystem operations experience (Lustre / GPFS / BeeGFS)
- Advanced Linux performance analysis (perf, eBPF, ftrace) and kernel parameter tuning
- DB operations experience (MySQL, ClickHouse)
- Low-latency network tuning and cross-datacenter link optimization
- LLM application development (LangChain, RAG, Agent, MCP)
- Self-managed Kubernetes experience (Kubespray, kubeadm)
- GPU server operations (single-node): NVIDIA driver / CUDA toolkit version management, `nvidia-smi` / DCGM monitoring, nvidia-container-toolkit integration, troubleshooting XID / ECC errors and thermal throttling
- Experience or familiarity with integrating GPU resources into Slurm: GRES configuration, cgroup-based GPU isolation, user/job-level resource limits

Frequently Asked Questions

Is the salary disclosed for the Senior SRE Engineer position at kronosresearch?
The salary for this Senior SRE Engineer role at kronosresearch is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Where is the Senior SRE Engineer position at kronosresearch located?
This Senior SRE Engineer role at kronosresearch is based in Taiwan. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.
Which team or department does the Senior SRE Engineer at kronosresearch belong to?
This Senior SRE Engineer position is part of the Core Technology department at kronosresearch. See the full job description for more information about the team structure and responsibilities.
How do I apply for the Senior SRE Engineer position at kronosresearch?
Click the "Apply Now" button on this page. You will be redirected to kronosresearch's official application portal hosted on greenhouse where you can submit your application directly.
When was the Senior SRE Engineer job at kronosresearch posted?
This Senior SRE Engineer position at kronosresearch was posted on Jun 5, 2026. Apply as soon as possible; early applications are often reviewed first.