Digital Site Reliability Engineer
4 weeks ago
- Designing, building, and scaling Okta's production Kubernetes platform
- Being an evangelist for security best practices and leading initiatives/projects to strengthen our security posture for critical infrastructure
- Responding to production incidents and determining how we can prevent them in the future
- Triaging and troubleshooting complex production issues to ensure reliability and performance
- Continuously evolving our monitoring tools and platform
- Developing and maintaining technical documentation, runbooks, and procedures
- Supporting a 24x7 online environment as part of an on-call rotation
- Always willing to go the extra mile: see a problem, fix the problem.
- Passionate about encouraging the development of engineering peers and leading by example.
- A proven track record of successful SRE engagements and of collaborating closely with engineering teams.
- Knowledge of and experience with deploying microservices and using CI/CD pipelines.
- A security mindset that prioritizes protecting assets from risks and vulnerabilities.
- 6+ years of experience with AWS and Terraform
- 3+ years of experience provisioning and managing Kubernetes clusters, with a solid understanding of containers, Kubernetes infrastructure, and Helm charts.
- 3+ years of development experience with Python or Golang
- Strong understanding of and experience with Linux
- Experience with Istio service mesh and network policies
- Familiarity with Spinnaker
- Experience with monitoring and alerting in a Kubernetes ecosystem
- Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) certification
Below is the annual salary range for candidates located in Canada. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental, and vision insurance, RRSP with a match, healthcare spending, telemedicine, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies.
-
Digital Site Reliability Engineer
3 months ago
Old Toronto, Canada | Mastech Inc. | Full time
Mastech Digital is an IT staffing and digital transformation services company. Mastech Digital provides digital and mainstream technology staff, as well as digital transformation services, for all American corporations. We are currently seeking a Site Reliability Engineer (GCP) for our client in the consulting domain. We value our professionals, providing...
-
AWS Site Reliability Engineer
4 weeks ago
Old Toronto, Canada | Tecsys Inc. | Full time
Having recognized the advantages of remote work, including gains in employee morale and productivity and the positive impact of reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation for this. Our digital-first work environment, together with our...
-
AWS Site Reliability Engineer
2 months ago
Old Toronto, Canada | Tecsys | Full time
Having recognized the advantages of remote work, including gains in employee morale and productivity and the positive impact of reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation for this. Our digital-first work environment, together with our...
-
Site Reliability Engineering Lead
4 weeks ago
Old Toronto, Canada | TD | Full time
Job Overview
We are seeking a highly skilled Site Reliability Engineering Lead to join our team at TD. As a key member of our technology group, you will be responsible for ensuring the stability, scalability, and reliability of our platforms.
About the Role
The ideal candidate will have a minimum of 8 years of experience in site reliability engineering, with a...
-
Digital Innovation Specialist
4 weeks ago
Old Toronto, Canada | Loblaw Digital | Full time
We're shaping the future of e-commerce at Loblaw Digital, a pioneering team that crafts exceptional online experiences. To achieve our goals, we seek talented and passionate individuals who want to collaborate, solve complex problems, and make a lasting impact on Canadians.
About the Role
This position offers an exciting opportunity for you to be part of our...
-
AWS Site Reliability Engineer
2 months ago
Old Toronto, Canada | Street Context | Full time
Are you a Site Reliability Engineer with a passion for building reliable, resilient, and performant systems that scale?
We are on a mission to build and strengthen our engineering teams to match the accelerating success of Street Context. We provide a premium Email, Analytics and Broker Relationship platform, purpose-built for capital markets and...
-
AWS Site Reliability Engineer
2 months ago
Old Toronto, Canada | Soda | Full time
Job Description
Job Title: Site Reliability Engineer
Location: Poland, fully remote
Salary: 324K PLN or 27.3K PLN monthly
Start: ASAP
Stack: AWS, Docker, Kubernetes, Terraform, Jenkins, Ansible, Linux, JavaScript, and Lambda
Are you a seasoned DevOps/SRE professional passionate about building high-performance, scalable systems? I am working with a Media/IT...
-
AWS Site Reliability Engineer
3 months ago
Old Toronto, Canada | Sentry | Full time
Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster, so we can get back to enjoying technology. With more than $217 million in funding and 100,000+ organizations that believe we’re on to something, we're building performance and error monitoring tools that help companies like Disney,...
-
AWS Site Reliability Engineer
3 months ago
Old Toronto, Canada | Sentry | Full time
The Site Reliability Engineering team is responsible for the deployment, configuration, maintenance, and monitoring of Sentry's hosted platform. We do this by leveraging automation tools to spin up and scale services automatically to meet the traffic demands of 1,000,000+ developers. Sentry receives over a billion events a day and processes terabytes of...
-
AWS Site Reliability Engineer
2 months ago
Old Toronto, Canada | Olx | Full time
Site Reliability Engineer
Remote Poland, Poland
OLX – Engineering / Full-time / Remote
At OLX, we work together to build a more sustainable world through trade. We make it safe, smart, and convenient to buy and sell cars, find housing, get jobs, buy and sell household goods, and more. Our colleagues around the world help to serve millions of people around...
-
Digital Reliability Engineer
4 weeks ago
Toronto, Ontario, Canada | Royal Bank of Canada | Full time
Job Summary
We are seeking a highly motivated Technical Release Coordinator to join our Digital SRE Environment and Release team. This role offers the unique opportunity to work at the intersection of technology, reliability, and delivery, ensuring the smooth execution of technical projects that directly impact our digital infrastructure and release...
-
Site Reliability Engineering (Linux or Windows)
3 months ago
Old Toronto, Canada | Thomson Reuters | Full time
(Canada) Site Reliability Engineer (Contract)
Contract (9 months 4 days)
Published 3 days ago
New Relic, Datadog
Site Reliability Engineer, Service Management Organization
Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure?
The Site Reliability Engineer will analyze chronic...
-
Data Engineer
4 months ago
Old Toronto, Canada | Apply Digital Ltd. | Full time
Who we are: We’re a global digital transformation partner for change agents.
What we do: We empower enterprises to shift to evolving business opportunities, gain powerful insights, and deliver experiences that drive growth.
Who we help: Our 600+ digital specialists have helped global companies like Kraft Heinz, Moderna, Tigo, Atlassian, The Very Group...
-
Site Reliability Engineering Manager
3 weeks ago
Old Toronto, Canada | Tbwa ChiatDay Inc | Full time
Automate and Optimize Brick and Mortar Retail
Focal Systems is the industry leader in retail AI solutions, revolutionizing brick-and-mortar retail with deep learning computer vision. As a Silicon Valley-based startup, we have more than doubled in size every year since inception.
Our Mission
We are looking for smart, creative, and passionate individuals who want...
-
Digital Solutions Strategist
3 weeks ago
Old Toronto, Canada | Digital Associates | Full time
Role Summary
We are seeking an experienced Digital Solutions Strategist to lead our digital initiatives and oversee the development of cutting-edge software solutions.
About the Role
This is a senior leadership position that requires a strategic leader with a passion for digital innovation and a proven track record of delivering transformative software...
-
Digital Reliability Expert
2 weeks ago
Old Toronto, Canada | Akamai | Full time
About the Role
Akamai is seeking a highly skilled Digital Reliability Expert to join our team. This role involves designing, developing, and managing applications and infrastructure that support Akamai's Compute products and services.
The successful candidate will collaborate with operations and development teams to create tooling and software that...
-
AWS Site Reliability Engineer
4 weeks ago
Old Toronto, Canada | Tecsys | Full time
Tecsys is a fast-growing innovator offering supply chain solutions to industry-leading healthcare systems, hospitals, and pharmacy businesses, as well as to distributors, retailers, and 3PLs. As a Cloud Infrastructure Specialist, you will be responsible for ensuring the reliability and uptime of our platform and applications in a data-driven way to support internal and...
-
Site Reliability Engineer - Automation
3 months ago
Old Toronto, Canada | Ascend Fundraising Solutions | Full time
We are currently seeking a full-time Site Reliability Engineer to join our IT team. In this role, you will collaborate closely with the client services team to diagnose, troubleshoot, and resolve issues related to system reliability.
RESPONSIBILITIES:
- Take ownership of customer-reported issues and see problems through to resolution.
- Develop preventive measures...
-
Senior Site Reliability Engineer
4 weeks ago
Old Toronto, Canada | RBC | Full time
About the Role
We are seeking an experienced Senior Site Reliability Engineer to join our US Cash Management Technology team at RBC. As a key member of our team, you will be responsible for leading the development, implementation, and support of Site Reliability Engineering (SRE) solutions for applications supported by the Commercial, Core Banking, and...