Senior Reliability Engineer

Microsoft

Redmond, Washington, U.S.
Full-Time
Posted Oct 19, 2025
3 days / week in-office

About the role

The Firmware Deployment team within Microsoft’s Silicon Cloud Hardware Infrastructure Engineering (SCHIE) organization is responsible for building and operating world-class software and data-driven services that support Azure’s hardware infrastructure development.

Responsibilities

  • Build and apply specialized knowledge across multiple aspects of production (monitoring, release engineering, testing, live site excellence, buildout, performance optimization, capacity management).
  • Analyze large-scale telemetry and operational data to uncover insights and drive data-informed decisions.
  • Apply proven principles and practices such as safe deployment, reliability testing, elimination of single points of failure, disaster recovery, SLO-based monitoring, throttling, infrastructure management automation, post-mortem excellence, and adoption of common systems.
  • Respond to alerts and incidents.
  • Build and follow playbooks to drive root cause analysis and reviews.
  • Partner with hardware and firmware teams to understand system behavior and identify opportunities for predictive analytics.
  • Participate in an on-call rotation, be available during non-standard business hours, and contribute to service reliability and incident resolution.

Requirements

  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.
  • 3+ years of experience in software engineering or operations for large-scale distributed systems.
  • Ability to support a 24x7 data center environment, including participation in an on-call rotation and availability during non-standard business hours (evenings, nights, weekends, or holidays) as operational needs require.
  • Proficiency in one or more programming languages (C#, Python, Go, or similar).
  • Understanding of cloud infrastructure (Azure preferred), networking, and system design.
  • Familiarity with monitoring tools, incident management frameworks, and DevOps practices.

Benefits

  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About the Company

Our mission is to enable safe, reliable, and intelligent deployment of firmware payloads across the Azure fleet, ensuring system health and operational quality at scale.

Job Details

Salary Range

Salary not disclosed
