Site Reliability Engineer

4 weeks ago


Toronto, Ontario, Canada Bold Commerce Full time

Who is Bold Commerce?

Bold Commerce powers personalized checkout experiences for leading omnichannel retailers and direct-to-consumer brands.

As a leader in the composable commerce space, Bold boosts profitability by enabling personalized, customer-specific checkout flows designed to increase the Checkout Power Trio of conversion, average order value (AOV), and lifetime value (LTV), not just conversion. Built on a composable, headless architecture, Bold Checkout fits any commerce stack, making it easy to overcome platform limitations. Leading omnichannel retailers like Harry Rosen and Staples Canada trust Bold Checkout with their business.

Named one of Built In Austin's Best Places to Work, Canada's Top Employers for Young People, and Manitoba's Top Employers, we're a dynamic team that truly cares about building the future of ecommerce. We live by the BUILDERS Code, a shared set of practices, beliefs, and values that help shape this remote-first company.

Founded in 2012, with team members (Builders) located throughout Canada and the U.S., and backed by investors like OMERS Ventures, WhiteCap Venture Partners, and Round13 Capital, Bold is leading the way to a better, composable ecommerce future.

About the role

Bold is looking for a Site Reliability Engineer (SRE) to enhance the reliability, scalability, and performance of our software systems and infrastructure. You'll work closely with Engineering and IT Operations teams to design and maintain robust systems that meet our service-level objectives (SLOs) and drive value for our merchants.

What you'll do

  • Design and manage scalable, fault-tolerant infrastructure for SaaS services.
  • Develop and implement proactive monitoring, alerting, and incident response processes to address system issues.
  • Optimize system performance through capacity planning, load testing, and performance tuning.
  • Automate tasks and streamline deployments using configuration management and infrastructure-as-code practices.
  • Collaborate with development teams to ensure efficient software deployment and release management.
  • Conduct root cause analysis and post-incident reviews to drive continuous improvements.
  • Stay updated on best practices and emerging technologies in site reliability engineering.
  • Contribute to the architecture of monitoring and performance systems.
  • Train team members on tools and processes.
  • Balance feature development speed with adherence to SLOs.
  • Effectively manage project execution.

What we're looking for

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 5+ years of experience as an SRE or in a similar SaaS/cloud-based role.
  • Expertise in Linux/Unix systems administration, shell scripting, and proficiency in at least one programming language (e.g., Python, Go, Ruby).
  • Experience with automation and configuration management tools (e.g., Ansible, Chef, Puppet, Terraform) and cloud platforms (AWS, Azure, GCP).
  • Proficient in containerization technologies like Docker and Kubernetes, along with a solid understanding of networking concepts.
  • Skilled in using monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) and incident management systems.
  • Strong problem-solving abilities, with excellent prioritization and communication skills.
  • Proven ability to build trust and maintain strong relationships both internally and externally.
  • Flexibility to work variable hours, including occasional overnight maintenance and participation in an on-call rotation once a month.

remote work
