Digital Site Reliability Engineer

4 weeks ago


Old Toronto, Canada Okta, Inc. Full time
We free everyone to safely use any technology, anywhere, on any device or app. Our Workforce and Customer Identity Clouds enable secure yet flexible access, authentication, and automation that transforms how people move through the digital world, putting Identity at the heart of business security and growth.
What you’ll be doing
  1. Designing, building, and scaling Okta's production Kubernetes platform
  2. Being an evangelist for security best practices and leading initiatives/projects to strengthen our security posture for critical infrastructure
  3. Responding to production incidents and determining how we can prevent them in the future
  4. Triaging and troubleshooting complex production issues to ensure reliability and performance
  5. Continuously evolving our monitoring tools and platform
  6. Developing and maintaining technical documentation, runbooks, and procedures
  7. Supporting a 24x7 online environment as part of an on-call rotation
What you’ll bring to the role
  1. A willingness to go the extra mile: see a problem, fix the problem.
  2. A passion for encouraging the development of engineering peers and leading by example.
  3. A proven track record of successful SRE engagements and collaborating closely with engineering teams.
  4. Knowledge and experience with deploying microservices and utilizing CI/CD pipelines.
  5. A security mindset that prioritizes protecting assets from risks and vulnerabilities.
Required Skills:
  1. 6+ years of experience with AWS and Terraform
  2. 3+ years of experience provisioning and managing Kubernetes clusters, with a solid understanding of containers, Kubernetes infrastructure, and Helm charts.
  3. 3+ years of developer experience with Python or Golang
  4. Strong Linux understanding and experience
Preferred Skills:
  1. Experience with Istio service mesh and network policies
  2. Familiarity with Spinnaker
  3. Experience with monitoring and alerting in a Kubernetes ecosystem
  4. Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) certification


Below is the annual salary range for candidates located in Canada. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, RRSP with a match, healthcare spending, telemedicine, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies.



  • Old Toronto, Canada Mastech Inc. Full time

    Mastech Digital is an IT Staffing and Digital Transformation Services company. Mastech Digital provides digital and mainstream technology staff as well as Digital Transformation Services for all American Corporations. We are currently seeking a Site Reliability Engineer (GCP) for our client in the Consulting domain. We value our professionals, providing...


  • Old Toronto, Canada Tecsys Inc. Full time

    Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our...


  • Old Toronto, Canada Tecsys Full time

    Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our...


  • Old Toronto, Canada TD Full time

    Job Overview: We are seeking a highly skilled Site Reliability Engineering Lead to join our team at TD. As a key member of our technology group, you will be responsible for ensuring the stability, scalability, and reliability of our platforms. About the Role: The ideal candidate will have a minimum of 8 years of experience in site reliability engineering, with a...


  • Old Toronto, Canada Loblaw Digital Full time

    We're shaping the future of e-commerce at Loblaw Digital, a pioneering team that crafts exceptional online experiences. To achieve our goals, we seek talented and passionate individuals who want to collaborate and solve complex problems, making a lasting impact on Canadians. About the Role: This position offers an exciting opportunity for you to be part of our...


  • Old Toronto, Canada Street Context Full time

    Are you a Site Reliability Engineer who has a passion for building reliable, resilient and performant systems that scale? We are on a mission to build and strengthen our engineering teams to match the accelerating success of Street Context. We provide a premium Email, Analytics and Broker Relationship platform, purpose-built for capital markets and...


  • Old Toronto, Canada Soda Full time

    Job Description Job Title: Site Reliability Engineer Location: Poland - Fully Remote Salary: 324K PLN or 27.3K monthly Start: ASAP Stack: AWS, Docker, Kubernetes, Terraform, Jenkins, Ansible, Linux, JavaScript, and Lambda. Are you a seasoned DevOps/SRE professional passionate about building high-performance, scalable systems? I am working with a Media/IT...


  • Old Toronto, Canada Sentry Full time

    Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster, so we can get back to enjoying technology. With more than $217 million in funding and 100,000+ organizations that believe we’re on to something, we're building performance and error monitoring tools that help companies like Disney,...


  • Old Toronto, Canada Sentry Full time

    The Site Reliability Engineering team is responsible for the deployment, configuration, maintenance, and monitoring of Sentry's hosted platform. We do this by leveraging automation tools to automatically spin up and scale services to meet the traffic demands of 1,000,000+ developers. Sentry receives over a billion events a day and processes terabytes of...


  • Old Toronto, Canada Sentry Full time

    Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster, so we can get back to enjoying technology. With more than $217 million in funding and 100,000+ organizations that believe we’re on to something, we're building performance and error monitoring tools that help companies like Disney,...


  • Old Toronto, Canada Olx Full time

    Site Reliability Engineer. Remote Poland, Poland. OLX – Engineering / Full-time / Remote. At OLX, we work together to build a more sustainable world through trade. We make it safe, smart, and convenient to buy and sell cars, find housing, get jobs, buy and sell household goods, and more. Our colleagues around the world help to serve millions of people around...


  • Toronto, Ontario, Canada Royal Bank of Canada Full time

    Job Summary: We are seeking a highly motivated Technical Release Coordinator to join our Digital SRE Environment and Release team. This role offers the unique opportunity to work at the intersection of technology, reliability, and delivery, ensuring the smooth execution of technical projects that directly impact our digital infrastructure and release...


  • Old Toronto, Canada Thomson Reuters Full time

    (Canada) Site Reliability Engineer (Contract). Contract (9 months 4 days). Published 3 days ago. New Relic, Data Dog. Site Reliability Engineer in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will analyze chronic...

  • Data Engineer

    4 months ago


    Old Toronto, Canada Apply Digital Ltd. Full time

    Who we are: We’re a global digital transformation partner for change agents. What we do: We empower enterprises to shift to evolving business opportunities, gain powerful insights and deliver experiences that drive growth. Who we help: Our 600+ digital specialists have helped global companies like Kraft Heinz, Moderna, Tigo, Atlassian, The Very Group...


  • Old Toronto, Canada Tbwa ChiatDay Inc Full time

    Automate and Optimize Brick and Mortar Retail. Focal Systems is the industry leader in retail AI solutions, revolutionizing brick and mortar retail with deep learning computer vision. As a Silicon Valley-based startup, we have more than doubled in size every year since inception. Our Mission: We are looking for smart, creative, and passionate individuals who want...


  • Old Toronto, Canada Digital Associates Full time

    Role Summary: We are seeking an experienced Digital Solutions Strategist to lead our digital initiatives and oversee the development of cutting-edge software solutions. About the Role: This is a senior leadership position that requires a strategic leader with a passion for digital innovation and a proven track record in delivering transformative software...


  • Old Toronto, Canada Akamai Full time

    About the Role: Akamai is seeking a highly skilled Digital Reliability Expert to join our team. This role will involve designing, developing, and managing applications and infrastructure that support Akamai's Compute products and services. The successful candidate will collaborate with operations and development teams to create tooling and software that...


  • Old Toronto, Canada Tecsys Full time

    Tecsys is a fast-growing innovator offering supply chain solutions to industry-leading healthcare systems, hospitals, and pharmacy businesses to distributors, retailers, and 3PLs. As a Cloud Infrastructure Specialist, you will be responsible for ensuring the reliability and uptime of our platform and applications in a data-driven way to support internal and...


  • Old Toronto, Canada Ascend Fundraising Solutions Full time

    We are currently seeking a full-time Site Reliability Engineer to join our IT team. In this role, you will collaborate closely with the client services team to diagnose, troubleshoot, and resolve issues related to system reliability. RESPONSIBILITIES: Take ownership of customer-reported issues and see problems through to resolution. Develop preventive measures...


  • Old Toronto, Canada RBC Full time

    About the Role: We are seeking an experienced Senior Site Reliability Engineer to join our US Cash Management Technology team at RBC. As a key member of our team, you will be responsible for leading the development, implementation, and support of Site Reliability Engineering (SRE) solutions for applications supported by the Commercial, Core Banking, and...