Site Reliability Engineer

5 days ago


Toronto, Ontario, Canada Kablamo Full time $90,000 - $120,000 per year

Reports to: Technical Support Manager

Location: Toronto (Hybrid)

Role Type: Full time

Level: Intermediate/Mid

Introduction
Kablamo is a fast-growing cloud digital product development company. Founded in 2017 in Australia, the business has grown quickly over the last several years, including the expansion of the team to Canada in 2021. We are proud to have assembled a wonderful team and a list of customers that includes some of the best-known enterprises and government organisations in Australia and Canada. We're looking to further accelerate our growth in both markets, and we're seeking a Mid-Level Site Reliability Engineer to bolster Kablamo Canada's support team and deliver an exceptional customer experience in the ongoing support of customer applications.

Over the last three years, we have laid a strong foundation for Kablamo Canada, including local customers and a Canada-based delivery and leadership team of around 30 people. We aspire to become a market leader in designing and building bespoke software in Canada, and to be a destination of choice for the world's smartest people and the world's best customers.

Kablamo is an Advanced Tier AWS Consulting partner, and we've been recognised as a global leader in designing and building cloud-based data and AI/ML solutions. We've also been recognised with multiple industry and design awards for our work in developing one of the world's leading wildfire intelligence platforms, Firestory. We aim to continue our track record of success by expanding our customer base, local team, brand, social impact and community involvement in Canada over the coming years.

The Role
Kablamo's Product Care service provides end-to-end support for the bespoke software that our company designs and builds for our customers.

Our Product Care teams provide "Level 3" application support as an ongoing managed service, delivering expertise across all layers of the applications. Key roles within Kablamo's Product Care team include Technical Account Managers, Site Reliability Engineers, Technical Support Managers, and Full Stack Developers, who support our customers in a professional and responsive way. The Product Care team addresses inquiries, resolves issues, gathers feedback, and drives product enhancement.

As we expand the capability across our Product Care offering, we are looking for a Mid-Level Site Reliability Engineer (SRE) to join our team and contribute to delivering reliable, scalable, and efficient systems. You will help improve the reliability and performance of our platforms by contributing to the automation of operational tasks such as monitoring, alerting, and incident response.

This role works closely with our development and support teams to support infrastructure automation and performance tuning. You will also participate in incident response processes and help identify opportunities for improving system resilience and reducing manual operational tasks. You'll gain exposure to large-scale cloud infrastructure, modern DevOps practices, and the tools that power high-availability systems, while developing your skills and learning from experienced engineers.

Key Responsibilities

  • Operate as a core member of our support team by providing technical support to customers through ticketing and communication channels, ensuring all SLA and SLO objectives are met.
  • Collaborate with the Technical Account Manager to identify recurring issues and reduce alarm fatigue through root cause analysis and long-term solutions.
  • Contribute to the design, implementation, and maintenance of our AWS infrastructure.
  • Support the operation of underlying infrastructure and ensure systems and tools are functioning as expected by identifying and addressing potential risks.
  • Assist in maintaining system reliability by investigating and resolving performance, stability, and scalability issues in line with SLA and SLO requirements.
  • Support efforts to anticipate production issues by identifying risks and contributing to mitigation and contingency planning.
  • Contribute to building and enhancing observability across systems, including metrics, logging, tracing, and alerting, to improve system transparency and incident response.
  • Contribute to the development or implementation of visual tools for monitoring system health and reliability reporting.
  • Use automation tools and write code to automate operational processes such as log analysis, testing and monitoring.
  • Work with the engineering and/or development team to identify and resolve recurring problems through automation and process improvement.
  • Act on system incidents as a key contact, collaborating in incident response and participating in post-incident reviews (PIRs).
  • Collaborate with developers to help ensure solutions meet non-functional requirements such as availability, performance, and maintainability.
  • Support production release activities and participate in internal readiness discussions, occasionally attending technical meetings with guidance from senior team members.
  • Work cross-functionally to help deliver solutions and contribute ideas for system improvements, under guidance where needed.
  • Identify and manage security risks within the scope of the role, ensure compliance with defined security controls, escalate emerging threats to the Risk Manager and Senior Leadership Team, and support the continuous improvement of security practices.

Required Skills And Experience

  • 3+ years of experience in an SRE, DevOps, or Cloud Engineering role
  • Experience working with AWS services, such as EC2, Lambda, CloudFormation/Terraform, VPC, IAM, or ECS
  • Familiarity with monitoring and observability tools such as CloudWatch, Datadog, Grafana, Prometheus, or similar
  • Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash, JavaScript, or Java)
  • A solid grasp of DevOps principles and experience with Infrastructure-as-Code tools (e.g., Terraform, CloudFormation)
  • Solid understanding of system architecture principles, with the ability to contribute to design discussions and reviews
  • Strong problem-solving skills with the ability to troubleshoot effectively and communicate clearly with engineering teams and/or customers
  • A proactive approach to identifying issues, performance bottlenecks, and opportunities for improvement
  • Experience working with incident management tools like PagerDuty, OpsGenie, or Jira Service Management
  • Ability to collaborate across engineering, support, or development teams to deliver solutions
  • Understanding of basic security best practices in cloud and infrastructure environments
  • Working knowledge of version control systems, especially Git (e.g., branching, merging, pull requests)
  • Experience with CI/CD pipelines, including deployment automation and testing integration
  • Familiarity with rollback strategies and safe deployment practices
  • Understanding of fundamental networking concepts and protocols (TCP/IP, DNS, HTTP, etc.)

Bonus Points For

  • Bachelor's degree in Computer Science, Engineering, or a related technical field — or equivalent practical experience
  • AWS Associate-level certification (or progress toward one)
  • Exposure to networking, security, and reliability fundamentals, with an interest in deepening this knowledge
  • Experience working in a consultancy or Professional Services environment

Future Roles / Career Progression

  • Senior SRE
  • Lead SRE

Hiring Process

  • 30-min intro chat with our TA team
  • 45-min Interview with Hiring Manager
  • 1-hr Technical Interview
  • 30-min Final Interview
  • References
  • Offer

Why Work at Kablamo?
Our Culture
At Kablamo, we strive to "Make with Heart and Mind". We're passionate and brave craftspeople, we love to redefine what's possible, and we seek growth and discovery together. We acknowledge that a diverse and inclusive workplace enables greater innovation and produces benefits including improved performance, improved employee happiness and wellbeing, and superior outcomes for our customers.

The Perks

  • Flexible work environment with an office located in Toronto
  • Career growth (we really do promote from within)
  • Individual training budget
  • Online rewards platform
  • Regular social events
  • Blogging rewards
  • Paid birthday leave
  • Anniversary bonus
  • Referral bonus
  • Parental leave top-up
  • Employee Assistance Program
  • Swag
  • Work abroad for up to 3 weeks per year (some restrictions apply)

Kablamo is a proud equal opportunity employer. We make our hiring decisions solely based on your skills and experience, as well as the perspectives and value you can bring to our team. Kablamo believes that diversity is vital to provide the best service to our clients and we are committed to fostering a varied and inclusive work environment. Every effort to accommodate candidates for accessibility will be made upon request. Information received related to accommodations will be addressed confidentially.

Kablamo would like to thank all candidates for their interest; however, only qualified applicants will be shortlisted.