Senior Site Reliability Engineer

3 weeks ago


Toronto, Ontario, Canada Lightspeed Full time

Welcome to Lightspeed

Are you exploring new career avenues? You may find an exciting opportunity here.

We are seeking a Senior Site Reliability Engineer to enhance our operations at Lightspeed. Our team is dedicated to developing software solutions that empower merchants to expand their business effectively. In this role, you will be instrumental in addressing essential areas such as cloud infrastructure, reliability, incident management, data analytics, and operational efficiency. Your expertise will support our development teams by providing the infrastructure and tools necessary for scalable growth. You will play a key role in building and maintaining multi-region infrastructure and networks, and in keeping our products reliable, efficient, and secure by applying and promoting established DevOps principles.

Key Responsibilities:

  • Collaborate with development teams to equip them with the tools and methodologies for monitoring software performance in production, establishing and tracking reliability metrics (SLIs, SLOs), and managing error budgets.
  • Architect, build, and sustain resilient infrastructure utilizing GCP and cloud-native technologies like GKE, Cloud SQL, and BigQuery.
  • Create and oversee CI/CD pipelines for streamlined deployment and release processes using various technologies (GitLab, GitHub, Helm, Terraform, etc.).
  • Lead incident management initiatives and perform post-incident analyses to mitigate future disruptions.
  • Guide junior SREs and developers, sharing best practices in cloud architecture, data management, and software development.
  • Conduct performance benchmarks and implement improvements to enhance system reliability and throughput.
  • Work with cross-functional teams to identify, design, and execute internal process enhancements efficiently.
  • Design and construct robust, scalable, and highly available systems.
  • Develop platform solutions and apply software engineering principles to bolster software reliability and expedite delivery.
  • Manage infrastructure changes through Infrastructure as Code (IaC).
  • Participate in the on-call rotation.
  • Stay updated with industry trends and emerging technologies, advocating for the adoption of innovations that enhance product quality and team efficiency.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • 7-9+ years of experience in site reliability engineering, systems administration, or software engineering.
  • Strong proficiency in container orchestration platforms, particularly Kubernetes.
  • Solid understanding of relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis).
  • In-depth knowledge of network protocols and IP networking, along with experience in network troubleshooting.
  • Proficiency in programming languages such as Java, Python, or Go.
  • Proven experience managing large-scale infrastructure in cloud environments like Google Cloud, AWS, or Azure.
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack).
  • Strong grasp of security best practices.
  • Exceptional problem-solving abilities and the capacity to work under pressure to resolve complex issues.
  • Excellent communication skills for effective collaboration with cross-functional teams.
  • Strong leadership capabilities, with the ability to guide projects and influence engineering decisions across the organization.

We recognize that individuals are more than just their resumes. Even if you feel uncertain about your fit for this role, we encourage you to apply.

Benefits:

Experience the Lightspeed culture...

  • Flexible working environment;
  • Genuine career advancement opportunities in a rapidly growing company;
  • Work within a team that is large enough for growth yet small enough to make a significant impact.

... and enjoy a comprehensive benefits package designed to keep you happy, healthy, and fulfilled:

  • Lightspeed share scheme;
  • Lightspeed RSU program;
  • Unlimited paid time off;
  • Flexible working policy;
  • Health insurance;
  • Health and wellness benefits;
  • Paid parental leave;
  • LinkedIn Learning access;
  • Volunteer day.



