Site Reliability Engineer

7 days ago

Toronto, Ontario, Canada Kablamo Full time

Reports to: Technical Support Manager

Location: Toronto (Hybrid)

Role Type: Full time

Level: Intermediate/Mid

Introduction
Kablamo is a fast-growing cloud digital product development company. Founded in 2017 in Australia, the business has grown quickly over the last several years, including the expansion of the team to Canada in 2021. We are proud to have assembled a wonderful team, and a list of customers that includes some of the best-known enterprises and government organisations in Australia and Canada. We're looking to further accelerate our growth in both markets, and we're seeking a Mid-Level Site Reliability Engineer to bolster Kablamo Canada's support team, delivering an exceptional customer experience for the ongoing support of customer applications.

Over the last three years, we have laid a strong foundation for Kablamo Canada, including local customers and a Canada-based delivery and leadership team of around 30 people. We aspire to become a market leader in designing and building bespoke software in Canada, and to be a destination of choice for the world's smartest people and the world's best customers.

Kablamo is an Advanced Tier AWS Consulting partner, and we've been recognised as a global leader in designing and building cloud-based data and AI/ML solutions. We've also been recognised with multiple industry and design awards for our work in developing one of the world's leading wildfire intelligence platforms, Firestory. We aim to continue our track record of success by expanding our customer base, local team, brand, social impact and community involvement in Canada over the coming years.

The Role
Kablamo's Product Care service provides end-to-end support for the bespoke software that our company designs and builds for our customers.

Our Product Care teams provide "Level 3" application support for customers: an ongoing managed service that delivers expertise across all layers of the applications. Key roles within Kablamo's Product Care team include Technical Account Managers, Site Reliability Engineers, Technical Support Managers, and Full Stack Developers, who provide support in a professional and responsive way for our customers. The Product Care team addresses inquiries, resolves issues, gathers feedback and drives product enhancement.

As we expand the capability across our Product Care offering, we are looking for a Mid-Level Site Reliability Engineer (SRE) to join our team and contribute to delivering reliable, scalable, and efficient systems. You will help improve the reliability and performance of our platforms by contributing to the automation of operational tasks such as monitoring, alerting, and incident response.

This role works closely with our development and support teams to support infrastructure automation and performance tuning. You will also participate in incident response processes and help identify opportunities for improving system resilience and reducing manual operational tasks. You'll gain exposure to large-scale cloud infrastructure, modern DevOps practices, and the tools that power high-availability systems, while developing your skills and learning from experienced engineers.

Key Responsibilities

Operate as a core member of our support team by providing technical support to customers through ticketing and communication channels, ensuring all SLA and SLO objectives are met.
Collaborate with the Technical Account Manager to identify recurring issues and reduce alarm fatigue through root cause analysis and long-term solutions.
Contribute to the design, implementation, and maintenance of our AWS infrastructure.
Support the operation of underlying infrastructure and ensure systems and tools are functioning as expected by identifying and addressing potential risks.
Assist in maintaining system reliability by investigating and resolving performance, stability, and scalability issues in line with SLA and SLO requirements.
Support efforts to anticipate production issues by identifying risks and contributing to mitigation and contingency planning.
Contribute to building and enhancing observability across systems, including metrics, logging, tracing, and alerting, to improve system transparency and incident response.
Contribute to the development or implementation of visual tools for monitoring system health and reliability reporting.
Use automation tools and write code to automate operational processes such as log analysis, testing and monitoring.
Work with the engineering and/or development team to identify and resolve recurring problems through automation and process improvement.
Act on system incidents as a key contact, collaborating in incident response and participating in post-incident reviews (PIRs).
Collaborate with developers to help ensure solutions meet non-functional requirements such as availability, performance, and maintainability.
Support production release activities and participate in internal readiness discussions, occasionally attending technical meetings with guidance from senior team members.
Work cross-functionally to help deliver solutions and contribute ideas for system improvements, under guidance where needed.
Identify and manage security risks within the scope of the role, ensure compliance with defined security controls, escalate emerging threats to the Risk Manager and Senior Leadership Team, and support the continuous improvement of security practices.

Required Skills And Experience

3+ years of experience in an SRE, DevOps, or Cloud Engineering role
Experience working with AWS services, such as EC2, Lambda, CloudFormation/Terraform, VPC, IAM, or ECS
Familiarity with monitoring and observability tools such as CloudWatch, Datadog, Grafana, Prometheus, or similar
Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash, JavaScript, or Java)
A solid grasp of DevOps principles and experience with Infrastructure-as-Code tools (e.g., Terraform, CloudFormation)
Solid understanding of system architecture principles, with the ability to contribute to design discussions and reviews
Strong problem-solving skills with the ability to troubleshoot effectively and communicate clearly with engineering teams and/or customers
A proactive approach to identifying issues, performance bottlenecks, and opportunities for improvement
Experience working with incident management tools like PagerDuty, OpsGenie, or Jira Service Management
Ability to collaborate across engineering, support, or development teams to deliver solutions
Understanding of basic security best practices in cloud and infrastructure environments
Working knowledge of version control systems, especially Git (e.g., branching, merging, pull requests)
Experience with CI/CD pipelines, including deployment automation and testing integration
Familiarity with rollback strategies and safe deployment practices
Understanding of fundamental networking concepts and protocols (TCP/IP, DNS, HTTP, etc.)

Bonus Points For

Bachelor's degree in Computer Science, Engineering, or a related technical field — or equivalent practical experience
AWS Associate-level certification (or progress toward one)
Exposure to networking, security, and reliability fundamentals, with an interest in deepening this knowledge
Experience working in a consultancy or Professional Services environment

Future Roles / Career Progression

Senior SRE
Lead SRE

Hiring Process

30-min intro chat with our TA team
45-min Interview with Hiring Manager
1-hr Technical Interview
30-min Final Interview
References
Offer

Why Work at Kablamo?
Our Culture
At Kablamo, we strive to "Make with Heart and Mind". We're passionate and brave craftspeople, we love to redefine what's possible, and we seek growth and discovery together. We recognise that a diverse and inclusive workplace enables greater innovation and produces benefits including improved performance, improved employee happiness and wellbeing, and superior outcomes for our customers.

The PERKS

Flexible work environment with an office located in Toronto
Career growth (we really do promote from within)
Individual training budget
Online rewards platform
Regular social events
Blogging rewards
Paid birthday leave
Anniversary bonus
Referral bonus
Parental Leave top up
Employee Assistance Program
Swag
Work abroad for up to 3 weeks per year (some restrictions apply)

Kablamo is a proud equal opportunity employer. We make our hiring decisions solely based on your skills and experience, as well as the perspectives and value you can bring to our team. Kablamo believes that diversity is vital to provide the best service to our clients and we are committed to fostering a varied and inclusive work environment. Every effort to accommodate candidates for accessibility will be made upon request. Information received related to accommodations will be addressed confidentially.

Kablamo would like to thank all candidates for their interest; however, only qualified applicants will be shortlisted.
