Senior Site Reliability Engineer

2 weeks ago

QC, Canada · Botpress, Inc. · Full time

Help bring AI agents to companies worldwide. Over the next decade, autonomous agents will redefine how we work. Botpress allows companies to build and deploy advanced AI agents that move beyond conversation into real business logic. Our product works today and at scale, across industries, regions, and limitless use cases. As the 3rd fastest-growing B2B AI start-up worldwide, we’re at the forefront of the AI revolution, providing the most widely used platform for sophisticated AI agents.

The work ahead is ambitious. The opportunity is rare. We take a deliberate approach to growth: product-led, capital-efficient, and highly focused. If you want to build foundational technology for one of the most meaningful platform shifts in software, we’re looking for top talent to join us.

Key Highlights:
Over 1 million AI agents and chatbots deployed
700,000+ platform users
Trusted by 35% of Fortune 500 companies
7 years of expertise in AI solutions

About the Role
We’re hiring a Site Reliability Engineer to help ensure the stability, scalability, and security of our platform. You’ll be part of the product team, owning the systems that keep our services resilient and performant under real-world loads.

This is a hands-on engineering role focused on infrastructure reliability and operational excellence. You’ll architect and maintain the cloud systems (e.g., AWS) that power Botpress, with a strong focus on observability, uptime, and automation. You’ll collaborate closely with engineers to refine how we ship, monitor, and operate software, always with an eye toward reducing risk and improving speed. Part of this role includes expanding the platform to serve users in new regions.

Responsibilities
Architect and maintain scalable infrastructure
Design and optimize CI/CD pipelines to ensure smooth delivery of changes
Improve observability through advanced monitoring, logging, and alerting
Own incident response and support the engineering team in diagnosing and resolving issues
Build systems that increase platform reliability, resiliency, and uptime
Enforce security best practices across environments and workflows
Manage infrastructure as code using tools like Terraform or Pulumi
Document operational procedures, disaster recovery plans, and system runbooks

Requirements
3+ years working with TypeScript (Pulumi, React for Backstage, CLI tools)
5+ years in SRE, DevOps, or infrastructure engineering roles
Deep experience with AWS cloud infrastructure and services (ECS, S3, Lambda, RDS)
Comfortable with Linux systems, containerization, and orchestration (e.g., Docker, Kubernetes)
Proficient in CI/CD tools, infrastructure as code, and automation scripting
Familiar with incident management and site reliability principles
Experience with observability stacks such as Datadog, Grafana, and Prometheus
Strong communicator and collaborator across technical teams
Calm and systematic under pressure when production issues arise
Bonus: previous experience in a fast-paced startup or SaaS environment

About Botpress
Botpress recently raised a $25 million Series B round. As a fast-growing start-up, we run a lean, innovative ship that leans on AI for maximum business impact. At Botpress, everyone is an owner, bringing their unique perspective and talents. Our teams are talented and passionate: we intentionally hire people who are eager and hungry to learn and grow throughout their careers. You’ll be on a team that's not just adapting to the AI revolution, but leading it. Joining our team means changing the future of enterprise AI and building technology that will define the next era of business automation.

Benefits
Work at one of Canada’s fastest-growing AI start-ups
Work with a talented and passionate team
4 weeks of vacation
Paid sick and parental leave
Comprehensive health, dental, vision, travel, and life insurance
Funding for education and skills improvement
Fully stocked fridge and cupboard (we take snacks seriously)
Your own desk, with no hot-desk sign-up system
A vibrant office community, including weekly socials

Senior Site Reliability Engineer

3 weeks ago

Canada · Thinkific · Full time

Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific.

Senior Site Reliability Engineer

5 days ago

Canada · Akamai Technologies · Full time

Join Akamai Technologies as we build a reliable, secure, and scalable Internet. We are looking for a Senior Site Reliability Engineer to help us solve complex performance and reliability challenges.
Job Description
Are you passionate about cutting‑edge technology and ready to tackle some of the Internet’s most difficult...

Senior Site Reliability Engineer

3 weeks ago

Canada · DuckDuckGo · Full time

Who We Are
Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...

Senior Site Reliability Engineer

3 weeks ago

Canada · TextNow · Full time

Base pay range: CA$113,400.00/yr to CA$162,000.00/yr. This range is provided by TextNow; your actual pay will be based on your skills and experience, so talk with your recruiter to learn more.
We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects, and that's because we're made up of...

Senior Site Reliability Engineer

5 days ago

Canada · TekRek · Full time

Base pay range: CA$90.00/hr to CA$120.00/hr. This range is provided by TekRek; your actual pay will be based on your skills and experience, so talk with your recruiter to learn more.
Senior Site Reliability Engineer – Distributed Systems, Kubernetes, AWS/GCP
The Company
TekRek has partnered with a fast‑scaling AI infrastructure company building one of the...

Senior Site Reliability Engineer

3 weeks ago

BC, Canada · Orion Innovation · Full time

Overview
Senior Site Reliability Engineer (SRE) with Kubernetes and Rancher. Full-time role focused on building and maintaining highly resilient, secure systems, including in air-gapped environments.
Responsibilities
System Architecture & Management: Design, architect, and maintain highly reliable, multi-tenant systems using Kubernetes and related tools...

Senior Site Reliability Engineer

2 weeks ago

Canada · Wealthsimple · Full time

Your career is an investment that grows over time! Wealthsimple is on a mission to help everyone achieve financial freedom by reimagining what it means to manage your money. Using smart technology, we take financial...

Senior Site Reliability Engineer

5 days ago

Canada · D-Wave · Full time

D‑Wave (NYSE: QBTS) is a leader in the development and delivery of quantum computing systems, software, and services. We are the world’s first commercial supplier of quantum computers, and the only company building both annealing and gate‑model quantum computers. Our mission is...

Site Reliability Engineer

3 weeks ago

Canada · Bitcomplete · Full time

Join us as a Senior Site Reliability Engineer to help us run an industry-scale GPU cluster via Kubernetes. Together with senior members of our team, you will combine your strong understanding of system scaling and security practices with your cloud-native expertise to stand up and maintain Kubernetes clusters from scratch. Your role will also be pivotal in...

Site Reliability Engineer

2 weeks ago

Canada · SPECTRAFORCE · Full time

Job Title: DevOps/Site Reliability Engineer
Duration: 12+ months
Core hours: somewhat flexible, but you must be able to attend meetings and collaborate with team members between 8 am and 3 pm Pacific. Team members are located in the Pacific, Mountain, Central, and Eastern time zones.
Top 3 items to see on resumes: 5+ years of experience in DevOps, Site...

Senior Site Reliability Engineer