Senior Site Reliability Engineer

1 week ago


Montreal, Quebec, Canada | Botpress Technologies Inc. | Full time

Help bring AI agents to companies worldwide.

Over the next decade, autonomous agents will redefine how we work.

Botpress allows companies to build and deploy advanced AI agents that move beyond conversation into real business logic.

Our product works today and at scale, across industries, regions, and limitless use cases.

As the 3rd fastest-growing B2B AI start-up worldwide, we're at the forefront of the AI revolution, providing the most widely used platform for sophisticated AI agents.

The work ahead is ambitious. The opportunity is rare. We take a deliberate approach to growth: product-led, capital-efficient, and highly focused.

If you want to build foundational technology for one of the most meaningful platform shifts in software, we're looking for top talent to join us.

Key Highlights:

  • Over 1 million AI agents and chatbots deployed
  • 700,000+ platform users
  • Trusted by 35% of Fortune 500 companies
  • 7 years of expertise in AI solutions
About the Role

We're hiring a Senior Site Reliability Engineer to help ensure the stability, scalability, and security of our platform. You'll be part of the product team, owning the systems that keep our services resilient and performant under real-world loads.

This is a hands-on engineering role focused on infrastructure reliability and operational excellence. You'll architect and maintain the cloud systems (e.g. AWS) that power Botpress, with a strong focus on observability, uptime, and automation.

You'll collaborate closely with engineers to refine how we ship, monitor, and operate software, always with an eye toward reducing risk and improving speed. Part of this role involves making the platform available to users in additional regions.

Responsibilities
  • Architect and maintain scalable infrastructure
  • Design and optimize CI/CD pipelines to ensure smooth delivery of changes
  • Improve observability through advanced monitoring, logging, and alerting
  • Own incident response and support the engineering team in diagnosing and resolving issues
  • Build systems that increase platform reliability, resiliency, and uptime
  • Enforce security best practices across environments and workflows
  • Manage infrastructure as code using tools like Terraform or Pulumi
  • Document operational procedures, disaster recovery plans, and system runbooks
Requirements
  • 3+ years working with TypeScript (Pulumi, React for Backstage, CLI tools)
  • 5+ years in SRE, DevOps, or infrastructure engineering roles
  • Deep experience with AWS cloud infrastructure and services (ECS, S3, Lambda, RDS)
  • Comfortable with Linux systems, containerization, and orchestration (e.g. Docker, Kubernetes)
  • Proficient in CI/CD tools, infrastructure-as-code, and automation scripting
  • Familiar with incident management and site reliability principles
  • Experience with observability stacks like Datadog, Grafana, Prometheus, etc.
  • Strong communicator and collaborator across technical teams
  • Calm and systematic under pressure when production issues arise
  • Bonus: Previous experience in a fast-paced startup or SaaS environment
About Botpress

Botpress recently raised a $25 million Series B round. As a fast-growing start-up, we run a lean, innovative ship that relies on AI for maximum business impact. At Botpress, everyone is an owner, bringing their unique perspective and talents.

Our teams are talented and passionate. We intentionally hire people who are eager to learn and grow throughout their careers.

You'll be on a team that's not just adapting to the AI revolution, but leading it. Joining our team means changing the future of enterprise AI and building technology that will define the next era of business automation.

Benefits
  • Work at one of Canada's fastest-growing AI start-ups
  • Work with a talented and passionate team
  • 4 weeks of vacation
  • Paid sick and parental leave
  • Comprehensive health, dental, vision, travel, and life insurance
  • Funding for education and skills improvement
  • Fully stocked fridge and cupboard – we take snacks seriously
  • Your own desk – no 'hot-desk'-style sign-up systems
  • A vibrant office community, including weekly socials

