System Reliability Engineering Lead

3 days ago


Old Toronto, Canada CGI Full time

Position Description:

We are Canada's largest independent information technology services firm, and after 40 years, we're still growing! Innovation, technology, and service delivery are our focus. Our goal is to ensure our clients remain ahead of the competition. We provide a full spectrum of managed services, from IT and business process outsourcing to systems integration and consulting, that are transforming our clients’ operations and helping them to succeed.

Do you enjoy working with a highly motivated and talented team to deliver mission-critical developer tooling? We are currently expanding our System Reliability Engineering team, which helps one of our key clients deploy, manage, troubleshoot, and enhance their developer tooling platform, servicing over developers.

Your future duties and responsibilities:

  1. As a System Reliability Engineer, you will be responsible for designing, implementing, and supporting a variety of developer productivity tools that include Ansible Tower, GitLab, Artifactory, and SonarQube.
  2. The technology stack used to manage the platform includes Ansible, Terraform, Python, Prometheus, Splunk, and ELK.
  3. You will build automation solutions to provision and validate infrastructure and help debug and resolve problems.
  4. You will help to improve operational performance by focusing on user experience, effectively assessing and managing risk, and minimizing the impact of failures.

Required qualifications to be successful in this role:

  • Keeping all components of the developer productivity platform up and running
  • Working closely with internal partners and platform users to ensure that all services meet security, SLA, and performance requirements
  • Writing, updating, and using documentation, including runbooks and playbooks
  • Automating infrastructure deployment, testing, application failover, failure mitigation, user self-service functions, and more
  • Debugging complex problems across the entire stack
  • Participating in various meetings with the Operations and Delivery teams
  • Leading daily/weekly meetings to discuss the overall health of the systems
  • Leading Root Cause Analysis calls
  • Proposing and implementing monitoring improvements, optimizations, and automation opportunities
  • Taking part in PI (Program Increment) Planning sessions

Key Skills and Attributes

  • 10 years of experience with software engineering, software development, or system operations
  • Experience working with Linux, including writing shell scripts and an understanding of Linux internals and performance tuning
  • Strong understanding of networking principles
  • Experience debugging large scale complex systems in production
  • Experience in building, implementing, and supporting highly available production systems
  • Experience automating infrastructure and deployments using Terraform, Ansible, and Python or equivalent technologies
  • Understanding of DevOps engineering, CI/CD, and software deployment
  • Working knowledge of developer tooling such as Artifactory, GitLab, SonarQube, and Ansible Tower
  • Experience with various monitoring and observability tools
  • Experience deploying and managing workloads on one of the major public cloud platforms or on private clouds such as OpenStack
  • Experience deploying and managing workloads on one of the major container management platforms, such as Kubernetes, OpenShift, PCF, or Rancher
  • A curiosity about how complex socio-technical systems operate and what happens during failure

It’s not expected that any single candidate would have experience across all these areas – we are looking for someone who is strong in a few areas and has interest and curiosity in others.

Skills:

DevOps Engineering, GitHub, OpenShift, Linux
