Service Engineer

Microsoft

Hyderabad, Telangana, India

Full-Time

Posted Oct 16, 2025

Up to 50% work from home

About the role

Are you passionate about cloud computing, obsessed with customer experience, and driven to resolve complex issues under pressure? Do you thrive in high-stakes, live environments and want to play a pivotal role in ensuring the reliability of Microsoft’s cloud platform? If so, the Azure Customer Experience (CXP) team has the opportunity for you.

Responsibilities

Collaborate closely with Engineering/PM to ensure the availability and performance of the live site and the satisfaction of our customers
Manage high-severity incidents (SEV0/SEV1/SEV2) across Azure services, serving as the single point of accountability to ensure rapid detection, triage, resolution, and customer communication
Act as the central authority during live site incidents, driving real-time decision-making and coordination across Engineering, Support, PM, Communications, and Field teams
Provide calm, decisive leadership in crisis situations, escalating as needed to senior leadership
Promote a customer-first culture by prioritizing availability, reliability, and platform trust in every response
Contribute to analyzing customer-impacting signals from telemetry, support cases, and feedback to identify root causes, drive incident reviews (RCAs/PIRs), and implement preventative service improvements
Contribute to Azure platform improvements by incorporating learnings from live site events and customer feedback, ensuring improved reliability, observability, and supportability
Collaborate closely with Engineering and Product teams to influence and implement service resiliency enhancements, auto-remediation tools, and customer-centric mitigation strategies
Identify and advocate for customer self-service capabilities, improved documentation, and scalable solutions that empower customers to resolve common issues independently
Contribute to the development and adoption of incident response playbooks, mitigation levers, and operational frameworks aligned to real-world support scenarios and strategic customer needs
Contribute to the design of next-generation architecture for cloud infrastructure services with a focus on reliability and strategic customer support outcomes
Build and maintain cross-functional partnerships, ensuring alignment across engineering, business, and support organizations
Be data-driven and results-focused, using metrics to evaluate incident response effectiveness and platform health
Apply an engineering mindset to operational challenges, balancing agility, scalability, and technical quality in collaboration with peers
Demonstrate strong collaboration and results-focused execution under pressure while working closely with other teams

Requirements

5+ years’ proven expertise in mission-critical cloud operations, high-severity incident response, SRE, or large-scale systems engineering on hyperscale platforms like Azure, AWS, or GCP
Exceptional command-and-control communication skills: able to drive clarity and direction with customers, internal Microsoft stakeholders, and third-party vendors during ambiguity and chaos
Deep understanding of cloud architecture patterns, microservices, and containerization
Demonstrated ability to make decisions quickly, under pressure, and with limited data—without compromising long-term reliability
Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Datadog, Splunk, New Relic)
Ability to contribute to implementing observability frameworks that proactively detect performance bottlenecks
Strong knowledge of CI/CD pipelines, container orchestration (Kubernetes, Docker), and infrastructure as code (Terraform, ARM, Bicep)
Familiarity with AI/ML frameworks and cloud AI services
Experience implementing AI-driven monitoring, alerting, and remediation systems
Fluency in one or more automation languages (PowerShell, Python, CLI, etc.)
Understanding of ITIL or other incident management frameworks is a must
Understanding of high availability, disaster recovery, business continuity, and performance tuning
Demonstrates strategic thinking, quantitative and analytical skills, team leadership, and collaboration
Excellent problem resolution, judgment, negotiation, and decision-making skills
Strong knowledge of the Windows platform or Linux and of developer tools, with the ability to diagnose and debug user code
Effectively manage and prioritize multiple tasks in accordance with high-level objectives/projects
Excellent written and verbal communication skills in English, especially in high-pressure scenarios
Ability to communicate with a variety of audiences, including high-profile customers, executive management, and engineering teams
Experience with Azure, AWS, or GCP core services and their interdependence
Bachelor’s or master’s degree in Computer Science, Information Technology, or equivalent experience

Benefits

Industry-leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs

About the Company

Microsoft Azure is one of the most exciting and strategic products at Microsoft—powering mission-critical workloads for enterprises, governments, and startups around the world. Azure delivers on-demand, hyper-scale infrastructure and platforms via Microsoft's global data centers, enabling customers to build, host, and scale their applications with confidence.

Job Details

Salary Range

Salary not disclosed

