Senior Service Engineering Manager - CTJ - Poly
Atlanta, GA 
Posted 10 days ago
Job Description
Overview
Microsoft has an exciting opportunity to join the Silver Infrastructure and Operations team in supporting our Secure Work Area operations. Our team manages the infrastructure and day-to-day operations that enable Azure engineers to work in isolated and highly regulated environments.

Do you enjoy solving complex issues, calmly triaging multiple critical events, and communicating clearly and professionally? Do you have a passion for living a Coach, Model, and Care approach to managing people so they can be their best, authentic selves? Then we welcome you to learn more about this opportunity and share how you can contribute to the successful delivery of our mission-critical services.

We are looking for a Senior Service Engineering Manager who understands the systems, processes, and applications used across Windows, Azure, Linux, and Apple operating systems. We look for ways to automate processes and create tools that allow our team to scale in support of our growing facilities. We are responsible for meeting security compliance requirements, meeting service level agreements for escalations, and partnering with other engineering groups to architect solutions that keep our mission-critical services highly available. This role also requires close interaction with other engineering teams, program managers, and contractors in supporting the operations of our clouds.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities

People Management
Managers deliver success through empowerment and accountability by modeling, coaching, and caring.
Model - Live our culture; Embody our values; Practice our leadership principles.
Coach - Define team objectives and outcomes; Enable success across boundaries; Help the team adapt and learn.
Care - Attract and retain great people; Know each individual's capabilities and aspirations; Invest in the growth of others.

Technical Knowledge and Expertise
Develops end-to-end expertise in service and/or system design, interactions between technology layers and components, functions of infrastructure, and dependencies at scale. Develops the team's end-to-end technical expertise, regularly identifying skill gaps and raising the collective bar on the team's skill set in alignment with industry standards. Takes ownership of service design by driving efforts within an organization to identify, define, recommend, and build optimal configurations of technology solutions with consideration for cost management. Adjusts configurations and defines infrastructures to improve the availability, reliability, efficiency, observability, and/or performance of supported products and services. Leverages technical expertise to identify, design, deliver, and operate solutions across organizations. Drives reviews with the engineering teams that develop and/or manage services, identifying opportunities for operational efficiencies and sharing learnings and recommendations across engineering teams working on related services within the organization.

Guides teams to stay current in knowledge and expertise as the technology landscape evolves, maintaining awareness of industry norms. Uses this knowledge to drive the adoption of new solutions across engineering teams working with related products within an organization.
Makes expertise available to others through sharing, coaching, conferences, and other means to drive improvements across teams.

Operational Excellence
Manages teams of engineers to implement reliable, scalable, and high-performance solutions across teams. Contributes to design documents. Owns implementation and rollback plans. Maintains quality checklists and related documentation, unblocking as needed.

Holds the team accountable for creating, monitoring, and acting on telemetry data, and provides guidance on telemetry analytics to better identify patterns that reveal errors and unexpected problems affecting system availability, reliability, performance, and/or efficiency. Manages the development of scripts and/or automation across a team and leverages an understanding of solutions to define, develop, measure, track, change, and improve the quality of telemetry pipelines that support automated monitoring and incident response.

Holds the team accountable for participating in on-call rotations and manages teams of Service Engineers responding to incidents to identify the level of impact, troubleshoot issues, and deploy appropriate fixes that resolve root causes and prevent incident recurrence across related products. Ensures that Service Engineers within the organization have the technical knowledge and resources required to respond to incidents and make difficult decisions based on business impact. Ensures relevant engineering teams, stakeholders, and leaders are alerted to customer-impacting issues. Ensures major issues are escalated to other teams as needed. Ensures postmortems are conducted and that key details about incidents and their resolution are shared through postmortem reports and regular review meetings.
Provides clarity during incidents, helps determine impact and define the scope of severity, and facilitates development of incident response and resolution guidance.

Holds the team accountable for understanding and following prescriptive guidance for security, privacy, and compliance standards in alignment with direction from the business and technical experts. Develops the team's compliance awareness by conducting training and disseminating relevant information. Guides the team to identify patterns of violations and implement automation to prevent them. Works with security, privacy, and compliance teams to identify and address relevant security, privacy, and compliance issues across teams.

Collaboration and Knowledge Sharing
Drives collaboration across teams by promoting the open exchange of information, resolving issues within and beyond the immediate team, managing conflict and teamwork challenges, and removing barriers so teams can quickly shift priorities without losing productivity. Identifies and includes all stakeholders in decisions and represents the organization with partners, customers, and external stakeholders, maintaining active engagement so issues can be resolved and mutual objectives met. Ensures information is systematically and clearly communicated across teams.

Facilitates sharing of insights and best practices that can be applied to improve development and operations across related sets of systems, platforms, and/or products. Continues to develop their understanding of insights and best practices through interactions with more experienced Service Engineers, members of product engineering teams, and other resources (e.g., conferences, brown bags, wikis, documentation).
Mentors and coaches other engineers to help them identify and propose relevant solutions.

Specialty Responsibilities
Holds teams accountable for managing crisis situations, including leveraging advanced technical expertise, judgment, and decision making to coordinate multiple work streams and resources, drive the mitigation plan, and resolve the crisis by engaging necessary teams and escalating to appropriate stakeholders. Applies diagnostic expertise. Provides guidance to other engineers working to mitigate and resolve issues. Communicates customer impact and other relevant information to key stakeholders, leadership, and customers. Guides projects and programs to improve crisis response by creating standard practices for consistent response across engineering teams. Fosters increased stability. Reduces noise by adjusting telemetry and alarming. Influences key engineering stakeholders to adopt new standards and practices that broadly improve crisis and problem management. Ensures crisis incident managers are trained and equipped with the necessary resources.

Contributes to developing processes and standards to address complex security issues, and provides guidance to others as needed. Leads the team to identify, prioritize, and target solutions to complex security issues that may negatively impact customers and partners. Creates and drives adoption of relevant mitigation strategies and holds the team accountable for applying best practices and guidelines to address security issues. Facilitates communication of and adherence to security policies and procedures.

Other
Embody our culture and values

 

Job Summary
Company
Start Date
As soon as possible
Employment Term and Type
Regular, Full Time
Required Experience
Open