Head of Site Reliability Management

3 months ago


Waterloo, Canada Airbus Full time

**Job Summary**:
NAVBLUE, an Airbus Company, is currently seeking a Head of SRM to join our growing team. This position is responsible for providing leadership to the Platform Reliability Team. The Platform Reliability Team consists of Platform Administrators and Platform Architects who collectively provide tools and services which enable Customer Experience and Product Development teams to support their services in a standardized manner.

The team supports Product Development by developing infrastructure, automating the manual actions needed to run and deploy a service, monitoring platform and service reliability, and providing feedback on development architecture. The team supports Customer Experience by providing tools to monitor services, correct in-field issues, and collect the data needed to perform root cause analysis.

**Responsibilities**:
This position requires a strong leader with deep expertise who can collaborate with the HO SRM Technology to set the vision and direction for a platform that delivers world-class operational excellence, and who can support and develop a team focused on services that scale with high reliability.

The team’s responsibility spans both legacy systems and new cloud deployments, with a focus on cloud (AWS) infrastructure for all new development.
- Application Monitoring Support
  - Deploy the needed synthetic monitoring and alerting tools and configurations
  - Support the development and customer experience teams by providing input on the priority of concerns and collecting the information needed to correct issues
- Platform Design & Evangelism
  - Develop processes, procedures & infrastructure to ensure platform setup is implemented correctly and configuration is backed up and reproducible
  - Share and evangelize the platform details across all NAVBLUE teams
  - Establish infrastructure standards (accounts, network config, firewalls, load balancers, TLS)
- Cloud Computing Leadership
  - Understand the costs associated with the platform well enough to recommend short- or long-term platform changes that will improve profitability for the company
- People Leadership
  - Support, develop and mentor direct reports
  - Take responsibility for training and education of new staff and recurrent training of existing staff
  - Complete any needed HR duties for the management of staff
- Reports, audits and documentation
  - Establish and maintain best practices, documentation and procedures
  - Develop and track management dashboards and reports
  - Establish and track metrics and KPIs
  - Establish and track ISO910x-level processes and adherence
  - Serve as the point of contact responsible for support audits, with support from the HO SRM Technology
- Budget management
  - FinOps
- Relationship management
  - Manage supplier and vendor relationships
  - Review and provide feedback on contracts as they relate to the hosting perimeter
  - Manage the RFP responses for the hosting perimeter
  - Represent hosting services to external clients as necessary

**Experience**:

- 5+ years management experience in a team environment, capable of building, developing and motivating a team to ensure engagement, retention, career development and execution
- Solid understanding of computer networks and security
- 5+ years of experience managing production environments within an organization, capable of understanding its impact on stakeholders to help drive necessary improvements to support them
- Experience with problem and incident management; capable of building processes & procedures that enable support organizations to identify and correct problems as they arise

**Knowledge, Skills, Demonstrated Capabilities & Competencies**:

- High level of organization, sufficient to enable the Team to effectively execute tasks
- Excellent problem-solving skills
- Experience in the aviation industry an asset
- Ability to drive for results
- Self-starter who can drive future platform changes
- Experience communicating effectively with vendors of services
- Deals well with ambiguous/undefined problems; ability to think abstractly

**Communication Skills**:

- Effective interpersonal skills
- Ability to communicate needs to senior management, explaining business value
- Ability to communicate corporate goals to staff to ensure clarity of focus, scope and deliverables, and to sustain motivation
- Excellent communication skills (both written & verbal)
- Ability to effectively articulate technical challenges and solutions based on the audience

**Technical Systems Proficiency**:

- Cloud platform experience
- AWS platform experience an asset

**Travel Required**:

- 5% Domestic and International

**Qualifications**:

- Bachelor’s degree in a technical discipline or equivalent experience

**Perks**:
Located in the heart of University of Waterloo’s David Johnston Research + Technology Park, NAVBLUE is close to shops, restaurants, gyms, daycare, and many other amenities, and only 10 minutes away from Hwy 85. The modern design


