Site Reliability Engineer III

3 days ago


Old Toronto, Canada Rakuten Kobo Full time

The Role

At Rakuten Kobo, we develop software that covers a rich set of domains, including hardware devices, eCommerce, content rendering, and an expanding data ecosystem. Our SRE team provides the safety net that empowers our 50+ product developers to move fast. We are seeking an experienced Site Reliability Engineer III to help ensure the reliability and performance of our applications.

Responsibilities:

  1. Primarily responsible for application observability and incident response.
  2. Work with the Site Operations and Platform Engineering teams to automate our cloud development environment.
  3. Automate operations functions, such as infrastructure provisioning, system configurations, auto-scaling, and code deployment.
  4. Identify and implement best practices and standards for application observability and reliability.
  5. Monitor, troubleshoot, and rectify application and infrastructure issues.
  6. Collaborate with platform engineers to build developer productivity considerations into overall system reliability and operability.
  7. Help development teams identify and address observability and resilience pain points in their systems.
  8. Participate in on-call rotations, providing off-hours support as needed.

The Skillset:

  • Deep understanding of systems architecture and infrastructure, including networking, storage, databases, and cloud technologies.
  • Ability to diagnose complex technical issues, identify root causes, and propose effective solutions.
  • Competence in at least one object-oriented programming language (e.g., C#, Java).
  • Expertise in automation tools, configuration management, and scripting languages.
  • Experience in setting up effective monitoring tools, configuring thresholds, and defining actionable alerts.
  • Knowledge of document databases, relational databases, message queuing, and caching systems.
  • Proficiency in infrastructure as code (IaC), configuration as code (CaC), GitOps, Terraform, Ansible, Docker, Kubernetes, load balancing, IIS, Tomcat, and CDNs.
  • Ability to analyze system performance, identify bottlenecks, and optimize system capacity, including conducting load testing and performance tuning.
  • Strong sense of ownership and high standards for quality. Excellent communication skills and the ability to collaborate effectively.

The Perks:

  • Flexible hours and remote working environment
  • 4 extended summer long weekends
  • Full benefits starting from your first day
  • Paid volunteer days, unlimited sick days, and 3% RRSP matching
  • Monthly commuting allowance for hybrid employees
  • Flexible health spending account
  • Training budget + Udemy account
  • Free Kobo device + free weekly e-book or audiobook
  • Weekly Kobo Tech University sessions
  • Maternity/paternity leave top-up
  • 90-day Work from Anywhere program
  • Daily lunch credit and snacks when in the office
  • Dog-friendly office