SRE Lead

Chubb Business Services Malaysia Sdn Bhd

Kuala Lumpur, Selangor, Malaysia
Full-time, Regular
Posted Oct 19, 2025

About the role

Our Platforms Team is at the forefront of innovation, creating technology solutions that empower multiple business lines across the organization. We are looking for a senior SRE to support our applications deployed across the globe.

Requirements

  • Strong knowledge of Linux/Unix systems and networking.
  • Proficiency in programming and automation technologies such as Python, PowerShell, Java, .NET, and Ansible.
  • Experience with cloud platforms (e.g., Azure, AWS).
  • Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
  • Expertise in monitoring and observability tools (e.g., AppDynamics, Application Insights, Dynatrace, Grafana, ELK stack).
  • Understanding of CI/CD pipelines and automation frameworks.
  • Strong problem-solving skills and the ability to perform root cause analysis.
  • Excellent communication and collaboration skills.
  • Experience with distributed systems and microservices architecture.
  • Knowledge of database systems (SQL and NoSQL).
  • Familiarity with incident management frameworks and practices (e.g., ITIL, SRE best practices).
  • Certifications in cloud technologies or DevOps tools.
  • Analytical mindset with a focus on reliability and scalability.
  • Passion for automation and reducing manual work.
  • Ability to work under pressure and handle critical incidents effectively.
  • Commitment to continuous learning and staying updated on industry trends.

Responsibilities

  • Monitor system performance and proactively address bottlenecks or issues.
  • Implement strategies to improve system uptime and reduce downtime.
  • Develop and maintain automation tools for deployment, monitoring, and incident response.
  • Create scripts and workflows to reduce manual intervention and improve efficiency.
  • Respond to system outages and incidents, performing root cause analysis and implementing fixes.
  • Develop and maintain runbooks and documentation for incident response.
  • Set up and maintain monitoring tools to track system health and performance.
  • Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
  • Work closely with development teams to ensure systems are designed with reliability in mind.
  • Collaborate with operations teams to improve deployment processes and system management.
  • Analyze system usage and plan for future capacity needs.
  • Implement solutions to handle traffic spikes and ensure scalability.
  • Identify areas for improvement in system architecture and processes.
  • Advocate for best practices in reliability engineering and DevOps.

Job Details

Salary Range

Salary not disclosed
