Site Reliability Engineer III

2 months ago


Toronto, Canada | Rakuten Kobo | Full time

The Role

At Rakuten Kobo, we develop software that covers a rich set of domains, including hardware devices, eCommerce, content rendering, and an expanding data ecosystem. Our SRE team provides the safety net that empowers our 50+ product developers to move fast. We are seeking an experienced Site Reliability Engineer III to help ensure the reliability and performance of our applications. 

 
Responsibilities: 

  • Primarily responsible for application observability and incident response.
  • Work with the Site Operations and Platform Engineering teams in automating our cloud development environment.
  • Automate operations functions, such as infrastructure provisioning, system configurations, auto-scaling, and code deployment.
  • Identify and implement best practices and standards for application observability and reliability.
  • Monitor, troubleshoot, and rectify application and infrastructure issues.
  • Collaborate with platform engineers to integrate developer productivity considerations into the overall system reliability and operability.
  • Help development teams to identify and address observability and resilience pain points in their systems.
  • Participate in on-call rotations, providing off-hours support as needed.

The Skillset:

  • Deep understanding of systems architecture and infrastructure, including networking, storage, databases, and cloud technologies.
  • Ability to diagnose complex technical issues, identify root causes, and propose effective solutions.
  • Competence in at least one object-oriented programming language (e.g. C#, Java).
  • Expertise in automation tools, configuration management, and scripting languages.
  • Experience in setting up effective monitoring tools, configuring thresholds, and defining actionable alerts.
  • Knowledge of document databases, relational databases, message queuing, and caching systems.
  • Proficiency in IaC, CaC, GitOps, Terraform, Ansible, Docker, Kubernetes, Load Balancing, IIS, Tomcat, and CDNs.
  • Ability to analyze system performance, identify bottlenecks, and optimize system capacity.
  • Conduct load testing and performance tuning to ensure optimal system performance.
  • Strong sense of ownership and high standards for quality.
  • Excellent communication skills and the ability to collaborate effectively.

The Perks:

  • Flexible hours and remote working environment
  • 4 extended summer long weekends
  • Full benefits starting from your first day
  • Paid volunteer days, unlimited sick days, and 3% RRSP matching
  • Monthly commuting allowance for hybrid employees
  • Flexible health spending account
  • Training budget + Udemy account
  • Free Kobo device + free weekly e-book or audiobook
  • Weekly Kobo Tech University sessions
  • Maternity/paternity leave top-up
  • 90-day Work from Anywhere program
  • Daily lunch credit when in-office and in-office snacks
  • Dog-friendly office
