Site Reliability Engineer 1 - Job at Microsoft in Redmond, WA

Site Reliability Engineer 1

Redmond, WA

Posted 10 days ago

Job Description

Overview

Come join us in shaping the future of education! Would you like to be part of the team that is building the next-gen apps and services to revolutionize the learning experience? Want to join one of the most customer-focused organizations? Do you want to build an inclusive team culture? If so, then we might have just the right opportunity for you!

The Education engineering team serves a mission-critical vertical and acts as an incubation ground for M365 initiatives. We are creating unique experiences for students, teachers, administrators, and parents, and our goal is to empower every learner in the world to achieve more.

We are looking for a Site Reliability Engineer (SRE) with the right mix of systems engineering, software development, online services experience, and passion for quality to envision, design, and deliver our highly scalable services that aim to serve millions of teachers and students across the globe.

Our engineering culture is data driven, dynamic, and inclusive. Team members are encouraged to explore innovative ideas, establish hypotheses, and implement them iteratively to learn and adapt quickly. We are passionate about pursuing service agility and deploy frequently to production. Our services are distributed RESTful APIs deployed in Azure and built on top of the Office 365 Substrate layer and SharePoint. Our UI experiences are built on top of ReactJs.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

Develops code, scripts, systems, or platforms that automate moderately complex but repetitive operations processes (e.g., monitoring, alerting, deploying products and updates, debugging) at scale; reviews existing automation code and scripts to evaluate reusability, extendibility, and scalability within an organization.

Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of systems, platforms, or products operating at scale.

Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting complex issues, and deploying appropriate fixes to resolve root cause(s); alerts product teams, owners, and leadership to issues with major customer/business impact and escalates resolution of the overly complex, ambiguous, and impactful issues to include other engineering teams and/or subject matter experts as needed.

Shares details related to incidents and their resolution through post-mortem reports and during regular review meetings with more experienced engineers and members of product engineering teams.

Embodies our Culture and Values.

Job Summary

Company

Microsoft

Start Date

As soon as possible

Employment Term and Type

Regular, Full Time

Required Experience

Open