Site Reliability Engineer
1 month ago
Who We Are
At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.
The Role
Skytap, a product/SaaS-based company recently acquired by Kyndryl, is looking for a Site Reliability Engineer to support Skytap’s platform. The Networking team is responsible for the design, implementation, and operation of the software-defined networking (SDN) technology at the core of Skytap’s product.
Our SDN technology provides customers Layer 2 through Layer 7 networking features for traditional applications they’ve migrated from on-premises to Skytap Cloud. This includes features such as network isolation, MAC and IP address management and translation, policy-based routing, and hybrid-cloud connectivity. Customers can create identical clones of their virtual data centers without any L2 or L3 modifications and can connect them to one another and to on-prem resources. This Networking team makes this magic possible at scale in data centers across the world.
We hire talented engineers who all work together to foster a great engineering environment. In your role as a Site Reliability Engineer, you’ll use your skills to help instrument our systems so they can be easily built, observed, monitored, tested, and deployed at scale, and ensure Skytap’s services perform well for enterprise customers. One of your primary responsibilities will be to ensure the reliability, scalability, and security of our systems and services. You will work closely with development, operations, and security teams to design, implement, and maintain automated solutions that enhance the stability and security of our infrastructure. You will also participate in incident response activities, including incident triage, root cause analysis, and post-mortem reviews.
To be effective as a Site Reliability Engineer, you’ll need proficiency with general DevOps and automation practices and knowledge of Linux/Unix-based operating systems. Previous experience with networking technologies is a bonus but not required, and you will have opportunities for exposure and learning. You can expect to spend half of your time writing code, usually provisioning and monitoring automation improvements, bug fixes, and internal technical improvements.
Your Responsibilities:
Design and add new monitoring, logging, alerting, and metrics to systems
Eventually, contribute to the team's on-call rotation
General system operations work
Improve configuration management systems and automation
Improve processes and documentation for service administration
Write design documentation for major service improvements
Develop, maintain, improve, and automate the build and testing pipeline
Ensure the software release process is operating smoothly and effectively
Incorporate new software into package management repos when needed
Implement security controls and hardening measures to mitigate risks and enhance the security posture of our systems.
As your domain knowledge increases over time, you can take on additional responsibilities, such as assisting with field requests and customer-facing troubleshooting and diagnostics, including working with the Support team.
Your Future at Kyndryl
Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl you will have opportunities to work on projects and collaborate with colleagues from around the world. This role is dynamic and influential – offering a wide range of professional and personal growth opportunities that you won’t find anywhere else.
Who You Are
Your Skills & Expertise:
3 years of experience with infrastructure and configuration management tools like Ansible, Puppet, and Terraform.
Core networking knowledge and experience, including TCP/IP, DNS, DHCP, TLS, and network virtualization.
Understanding that your success is measured by our service’s reliability and performance.
Experience with time series databases and data visualization tools such as the TICK Stack (Telegraf, InfluxDB, Chronograf, and Kapacitor).
Experience with logging, search, and visualization tools such as the Elastic Stack (Elasticsearch, Logstash, Kibana) and Grafana
General networking protocol stack knowledge.
Experience with containers and container orchestration tools such as Docker and Kubernetes.
Solid understanding of Linux/Unix-based operating systems and experience in debugging system and networking issues.
Knowledge of Linux kernel internals and tunables.
Understanding of service level objectives and service level agreements.
Experience creating and scaling highly available distributed systems.
Intermediate programming experience with languages like Python and Bash and experience with source code control tools and platforms such as Git and GitHub.
Ability to dig into the details of projects or write scripts to uncover patterns from sources of data.
Ability to remain calm and effective in high-stress settings such as interpersonal conflicts, technical discussions, and production outages
Detail-oriented reader. You can read a spec and see the big picture as well as missing edge cases.
Other required skills include strong communication skills and a collaborative attitude and working style.
Bonus Skills:
Experience in high-performance networking architecture and operational troubleshooting of network issues.
Experience with network packet generating and analyzing tools such as Scapy, tcpdump, Wireshark/TShark, etc.
Experience with cloud platforms like Azure, AWS, etc.
Zabbix
MySQL
Preferred Skills and Experience
• BS degree in Computer Science, Engineering, or another highly technical or scientific discipline
• Expertise with Ansible, Terraform, and Python
• Experience with distributed technologies as well as dynamic resource management frameworks such as Kubernetes
• Expertise in leveraging open-source tooling such as Prometheus, Grafana, or Loki
Being You
Diversity is a whole lot more than what we look like or where we come from; it’s how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we’re not doing it single-handedly: Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you – and everyone next to you – the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That’s the Kyndryl Way.
What You Can Expect
With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter – wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you; we want you to succeed so that together, we will all succeed.
-
Site Reliability Engineer
7 months ago
Toronto, Canada CB Canada Full time
Site Reliability Engineer: On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer. Site Reliability Engineer – Job Description: Azure cloud, Jira and Confluence, CI/CD. Experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...
-
Site Reliability Engineering Lead
3 weeks ago
Old Toronto, Canada TD Full time
Job Overview: We are seeking a highly skilled Site Reliability Engineering Lead to join our team at TD. As a key member of our technology group, you will be responsible for ensuring the stability, scalability, and reliability of our platforms. About the Role: The ideal candidate will have a minimum of 8 years of experience in site reliability engineering, with a...
-
Site Reliability Engineer
4 weeks ago
Toronto, Canada PointsBet Canada Full time
SITE RELIABILITY ENGINEER. ABOUT THE ROLE: As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of our product. You will lead efforts in proactive monitoring, incident management, and automation, collaborating across teams to implement best practices in reliability engineering. Your expertise will drive resilient...
-
Site Reliability Engineering Leader
4 weeks ago
Toronto, Ontario, Canada Royal Bank of Canada Full time
Royal Bank of Canada is seeking a highly skilled Site Reliability Engineering (SRE) leader to join our team in Toronto, Canada. As an SRE leader, you will be responsible for leading the development and implementation of SRE solutions that improve the reliability and performance of our applications. The ideal candidate will have 5+ years of experience as a...
-
Senior Site Reliability Engineer
4 months ago
Toronto, Canada Northbridge Financial Corporation Full time
What is it like to be a Senior Site Reliability Engineer at Northbridge Financial? The Senior Site Reliability Engineer oversees the creation and implementation of Service Level Objectives (SLOs). The Senior SRE handles service reliability solutions and processes of increasing complexity, and is responsible for mentoring and leading less experienced...
-
Site Reliability Engineer
3 months ago
Toronto, Canada SGS Full time
Job Description: The Site Reliability Engineer will play a critical part in ensuring the reliability, supportability, scalability, and performance of our .NET stack applications built with MVC, Angular, and Web API. Partner with developers and product operations teams to understand application requirements and translate them into operational practices....
-
AWS Site Reliability Engineer
2 months ago
Old Toronto, Canada Street Context Full time
Are you a Site Reliability Engineer that has a passion for building reliable, resilient and performant systems that scale? We are on a mission to build and strengthen our engineering teams to match the accelerating success of Street Context. We provide a premium Email, Analytics and Broker Relationship platform, purpose-built for capital markets and...
-
AWS Site Reliability Engineer
2 months ago
Old Toronto, Canada Soda Full time
Job Description: Job Title: Site Reliability Engineer. Location: Poland - Fully Remote. Salary: 324K PLN or 27.3K monthly. Start: ASAP. Stack: AWS, Docker, Kubernetes, Terraform, Jenkins, Ansible, Linux, JavaScript, and Lambda. Are you a seasoned DevOps/SRE professional passionate about building high-performance, scalable systems? I am working with a Media/IT...
-
Site Reliability Engineer
4 weeks ago
Toronto, ON, Canada PointsBet Canada Full time
SITE RELIABILITY ENGINEER: As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of our product. You will lead efforts in proactive monitoring, incident management, and automation, collaborating across teams to implement best practices in reliability engineering. PointsBet is a sports & casino betting operator...
-
AWS Site Reliability Engineer
3 months ago
Old Toronto, Canada Sentry Full time
The Site Reliability Engineering team is responsible for the deployment, configuration, maintenance, and monitoring of Sentry's hosted platform. We do this by leveraging automation tools to automatically spin up and scale services to meet the traffic demands of 1,000,000+ developers. Sentry receives over a billion events a day and processes terabytes of...
-
Site Reliability Engineer
4 weeks ago
Toronto, Canada Teranet Inc. Full time
Site Reliability Engineer. Who We Are: Teranet is Canada’s leader in the delivery and transformation of statutory registry services with extensive expertise in land and commercial registries. We also market insightful property and data solutions, as well as practice management automation to thousands of customers in the real estate, financial services,...
-
AWS Site Reliability Engineer
2 months ago
Old Toronto, Canada Olx Full time
Site Reliability Engineer. Remote Poland, Poland. OLX – Engineering / Full-time / Remote. At OLX, we work together to build a more sustainable world through trade. We make it safe, smart, and convenient to buy and sell cars, find housing, get jobs, buy and sell household goods, and more. Our colleagues around the world help to serve millions of people around...
-
Site Reliability Engineering Linux or Windows
3 months ago
Old Toronto, Canada Thomson Reuters Full time
(Canada) Site Reliability Engineer (Contract). Contract (9 months 4 days). Published 3 days ago. New Relic, Data Dog. Site Reliability Engineer - in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will analyze chronic...
-
Digital Site Reliability Engineer
3 months ago
Old Toronto, Canada Mastech Inc. Full time
Mastech Digital is an IT Staffing and Digital Transformation Services company. Mastech Digital provides digital and mainstream technology staff as well as Digital Transformation Services for all American Corporations. We are currently seeking a Site Reliability Engineer (GCP) for our client in the Consulting domain. We value our professionals, providing...
-
Senior Site Reliability Engineer
6 months ago
Greater Toronto Area, Canada GlossGenius Full time
About GlossGenius: GlossGenius is building an ecosystem enabling entrepreneurs to succeed. We empower small business owners to focus on being creators, not admins, by offering a range of business management tools including booking and scheduling, marketing, analytics, payment processing and much more. Over 75,000 small business owners have chosen to...