Site Reliability Engineer

3 weeks ago


Toronto, Canada Kyndryl Full time

Position: Site Reliability Engineer
Client: Financial Services - Capital Markets Technology
Duration: 12-month contract with potential extensions
Location: Toronto, Canada (2 to 3 days onsite per week)
Language: English
Hours: 37.5 hours/week

Our client is looking for a Site Reliability Engineer (SRE) to enhance the reliability, performance, and efficiency of mission-critical batch workloads across Capital Markets Technology. The SRE will serve as a technical lead focused on automation, application development, systems performance engineering, and observability using Dynatrace. This position is pivotal in driving operational excellence and maturing reliability practices across the organization.

Qualifications
  • Expert-level Python skills, including performance tuning, concurrency (async/multiprocessing), testing, and packaging.
  • Strong Linux systems engineering expertise (kernel tuning, networking, process management, filesystem optimization).
  • Proven experience optimizing batch workloads for performance, reliability, and cost efficiency.
  • Deep knowledge of Dynatrace for observability (dashboards, KPIs, tagging, alerts, anomaly detection).
  • Hands-on experience with Apache Airflow (DAG design, scheduler tuning, SLA management).
  • Strong understanding of distributed systems concepts: retries, idempotency, backpressure, and data integrity.
  • Experience with CI/CD pipelines (GitHub Actions, Azure DevOps, Jenkins) and Infrastructure as Code (Terraform, Ansible).
  • Familiarity with containers and orchestration tools (Docker, Kubernetes).
  • Excellent incident management, troubleshooting, and communication skills.

Responsibilities
  • Reliability & Performance: Engineer resilient, performant batch processing pipelines, reducing runtime and minimizing failures.
  • Observability: Implement and maintain Dynatrace dashboards, alerts, and runbooks to ensure deep visibility into system health.
  • Systems Engineering: Configure and tune Linux and Windows environments for optimal reliability and speed.
  • Automation & Orchestration: Design and refine Airflow DAGs, automate deployments with CI/CD pipelines, and reduce operational toil through code.
  • Incident Management: Lead incident response, conduct root-cause analysis, and implement improvements based on post-mortems and SLOs.
  • Security & Compliance: Ensure all reliability and automation processes adhere to security best practices and regulatory compliance standards.

Please note that this is a contract position with one of our clients, not a full-time employment role with Kyndryl Canada.

Seniority level: Mid-Senior level
Employment type: Contract
Job function: Information Technology
Industries: IT Services and IT Consulting



  • Location(s): Canada : Ontario : Toronto Scotiabank Global Site Full time

    Requisition ID: 244026. Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...


  • Location(s): Canada : Ontario : Toronto Scotiabank Global Site Full time

    Requisition ID: 244027. Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...


  • Toronto, Ontario, Canada Procom Full time

    Site Reliability Engineer (SRE) / Ingénieur Fiabilité des Sites. On behalf of our banking client, Procom is seeking a Site Reliability Engineer (SRE) for a 12-month contract role. This is a hybrid role, with 3 days a week onsite at our client's Montréal, Quebec office. Site Reliability Engineer - Job Description: The Site Reliability Engineer is...


  • Toronto, Canada Denvr Full time

    Site Reliability Engineer - Platform Infrastructure Team (100% Remote - Canada) Denvr is a vertically integrated AI Platform Services company headquartered in Calgary, Canada. We provide foundational compute infrastructure and services to support the broader AI ecosystem and its end users. The platform includes cloud‑native solutions for training,...


  • Toronto, Canada Tecsys Inc. Full time

    Having recognized the advantages of remote work, including its effects on employee morale and productivity and the benefits of reduced commuting for employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end....


  • Toronto, Canada Moneris Full time

    Your Moneris Career - The Opportunity: We are looking for a Site Reliability Engineer (SRE) to join our dynamic team. As an SRE, you will help ensure the reliability, performance, and scalability of our systems. You will work with development and operations teams to build and maintain robust infrastructure, automate processes, and improve overall system...