Incident Response Analyst II

astreya · Astreya Asia Pacific Pte Ltd
Full time · Astreya Asia Pacific Pte Ltd

About this role

Key Responsibilities

1. Real-Time Infrastructure Monitoring

  • Perform 24x7 monitoring of critical facility systems across global data centers, including:
    • Electrical power systems
    • Mechanical systems
    • HVAC and cooling infrastructure
    • Fire detection and suppression systems
    • Water systems and supporting infrastructure
  • Continuously monitor EPMS, BMS, DCIM, and centralized monitoring platforms.
  • Detect abnormal operating conditions and alarms.
  • Acknowledge and investigate alarms promptly.
  • Track incidents and issues through to closure.
  • Identify monitoring gaps and recommend improvements to monitoring coverage.


2. Incident Response and Coordination

  • Provide first-level incident triage and technical assessment.
  • Respond to facility alarms and operational events in real time.
  • Execute escalation procedures according to defined protocols.
  • Coordinate with internal teams, site personnel, vendors, and regional stakeholders to ensure timely issue resolution.
  • Support major incident management activities for events such as:
    • Utility power failures
    • UPS and generator events
    • Cooling/HVAC failures
    • Fire alarm activations
    • Water leakage events
    • Security and environmental alerts
  • Maintain end-to-end ownership of incidents until resolution.


3. Ticket Management and Change Coordination

  • Create, update, and manage event tickets within established SLA targets.
  • Process work orders and monitor completion quality.
  • Track maintenance activities and change requests.
  • Support change management processes and ensure operational compliance.
  • Maintain accurate records of facility maintenance activities and change windows.


4. Compliance and Operational Governance

  • Monitor and follow up on preventive maintenance activities and routine operational changes.
  • Review technical documentation submitted by vendors and service providers, including:
    • Method of Procedure (MOP)
    • Risk Assessment (RA)
    • Standard Operating Procedure (SOP)
  • Ensure maintenance activities comply with operational standards and freeze-period requirements.
  • Support risk management and operational audit activities.


5. Monitoring Platform and Data Administration

  • Maintain monitoring platform master data and infrastructure records.
  • Ensure the accuracy, completeness, and timeliness of asset and alarm information.
  • Support platform optimization and continuous improvement initiatives.
  • Maintain facility logs, event records, and operational documentation.


6. Reporting and Data Analysis

  • Analyze facility operational data and identify trends or recurring issues.
  • Prepare operational reports and performance summaries.
  • Provide recommendations to improve reliability and operational efficiency.
  • Maintain records required for audit, compliance, and management reporting.


7. Operational Support and Continuous Improvement

  • Participate in after-hours support and emergency escalations.
  • Provide remote support for overseas data center operations when required.
  • Support centralized cross-regional operations and collaboration.
  • Contribute to process improvements and monitoring platform enhancements.
  • Perform other duties as assigned to support business continuity and operational excellence.


Minimum Qualifications

  • Associate Degree, Diploma, or higher in Engineering, Information Technology, Facilities Management, or related disciplines.
  • Minimum 2 years of experience in data center operations, facility monitoring, NOC, command center, or mission-critical environments.
  • Working knowledge of:
    • Electrical systems
    • Mechanical systems
    • HVAC and cooling infrastructure
    • Fire detection and suppression systems
    • Building Management Systems (BMS)
    • Electrical Power Monitoring Systems (EPMS)
    • DCIM or centralized monitoring platforms
  • Experience working with incident management and escalation procedures.
  • Strong communication and coordination skills.
  • Ability to work in a 24x7 rotating shift environment.
  • Ability to manage multiple priorities in high-pressure situations.
  • Fluent in English.
  • Chinese language proficiency (reading, writing, and verbal communication) is preferred, as the role involves handling Chinese-language alarm messages, documentation, and communications.


Preferred Qualifications

  • Experience in:
    • Network Operations Center (NOC)
    • Facility Operations Center (FOC)
    • Data Center Operations
    • Critical Environment Operations
    • Mission Critical Facilities
  • Experience supporting global or cross-regional operations.
  • Familiarity with structured incident, change, and problem management processes.
  • Understanding of data center capacity management (space, power, cooling).
  • Experience working with CMMS, DCIM, EPMS, BMS, or ticketing platforms.
  • Ability to perform root cause analysis and drive issue resolution.


Desired Competencies

  • Strong sense of ownership and urgency.
  • Excellent communication and stakeholder management skills.
  • Detail-oriented with strong documentation practices.
  • Analytical and problem-solving mindset.
  • Ability to learn quickly and adapt to changing operational environments.
  • Team-oriented with a proactive and customer-focused attitude.


Preferred Certifications

Candidates with the following certifications will have an advantage:

  • CDCP – Certified Data Centre Professional
  • CDCS – Certified Data Centre Specialist
  • FSM – Facilities Systems Management
  • Uptime Institute ATD
  • ITIL Foundation
  • DCCA or DCT certifications
  • Electrical or Mechanical engineering certifications


Shift Requirements

  • Must be willing to work a 24x7 rotating shift schedule.
  • Work weekends and public holidays, and participate in on-call duty rotations when required.
  • Support emergency response activities and major incidents.


Key Performance Indicators (KPIs)

The successful candidate is expected to consistently achieve:

  • 100% shift attendance and handover compliance.
  • 24x7 continuous monitoring coverage.
  • Alarm acknowledgement within 1 minute.
  • Notification generation within 2 minutes.
  • Event ticket creation within 10 minutes.
  • Compliance with escalation and incident management SLAs.
  • Zero service-impacting human errors.
  • Accurate documentation and reporting.
  • Continuous improvement contributions to operational processes and monitoring platforms.

Frequently Asked Questions

Where is the Incident Response Analyst II position at astreya located?
This role is based in Singapore. The position is listed as on-site or hybrid; check the full job description or apply directly to confirm the work arrangement.