Senior Site Reliability Engineer - Fleet

2 weeks ago


Toronto, Ontario, Canada | Cisco Systems, Inc. | Full time

Cisco Meraki, a division of Cisco Networking, is a cloud-managed IT company and leader in cloud-controlled Wi-Fi, routing, and security. Our intuitive platform enables organizations of all sizes to deliver customer and employee experiences at scale. To provide best-in-class technologies to our customers, we've created an unrivaled company culture for our employees. One where diverse backgrounds, perspectives, and experiences shape our work and fuel our evolution. One that is collaborative, flexible, and inclusive and provides employees with the autonomy to develop technology that's accessible and secure for everyone.

We are seeking a Senior Site Reliability Engineer (SRE) to join our dynamic SRE Fleet team, which is responsible for ensuring the stability, scalability, and efficiency of our infrastructure. You will play a critical role in maintaining and improving a fleet of more than 2,000 machines across a global cloud environment. This role is highly collaborative, involving close interaction with engineering and SRE teams in the UK and San Francisco to scale and optimize our infrastructure.

Responsibilities
  • Develop and maintain automation code for cloud maintenance processes using Ansible and Ruby.
  • Debug and resolve complex failure scenarios across large-scale systems, ensuring high availability and reliability.
  • Design, implement, and optimize GitLab CI pipelines to streamline deployment and testing workflows.
  • Collaborate with engineering teams to identify and address performance bottlenecks and scaling challenges.
  • Proactively troubleshoot issues across the fleet, using a deep understanding of Linux systems and networking.
  • Contribute to the creation of robust unit tests and infrastructure testing suites with RSpec.
  • Participate in collaborative projects to improve infrastructure efficiency, scalability, and observability.
  • Work cross-functionally with teams in different time zones, fostering a culture of shared ownership and reliability.
  • Develop and maintain automated tools for collecting infrastructure data to support compliance requirements.
  • Streamline compliance processes by reducing manual overhead through automation.
You are an ideal candidate if you have:
  • 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role in large-scale cloud environments.
  • Strong expertise in:
      • Ansible for infrastructure automation.
      • Ruby programming and testing frameworks such as RSpec.
      • Linux systems administration and troubleshooting.
      • CI/CD pipelines, particularly GitLab CI.
  • Demonstrated experience troubleshooting and debugging in complex distributed systems.
  • Experience managing and optimizing fleets of thousands of machines.
  • Excellent collaboration skills and the ability to work effectively across teams in multiple time zones.
  • Passion for automation, scalability, and infrastructure as code.
  • Familiarity with cloud providers (AWS, GCP, or similar).
  • Knowledge of monitoring and observability tools.
  • Experience with disaster recovery and high availability strategies.

At Cisco Meraki, we're challenging the status quo with the power of diversity, inclusion, and collaboration. When we connect different perspectives, we can imagine new possibilities, inspire innovation, and release the full potential of our people. We're building an employee experience that includes appreciation, belonging, growth, and purpose for everyone.
