Lead Site Reliability Engineer

2 weeks ago


Toronto, Ontario, Canada AceStack Full time

Job Title: Lead Site Reliability Engineer – Banking Domain (Wealth Management Preferred)

Location: Downtown Toronto, ON (Onsite – 5 Days/Week)

Duration: Contract

Experience: 14+ Years

About the Role:

We are looking for a highly skilled Site Reliability Engineering (SRE) Lead with a strong background in the Banking domain, ideally within Wealth Management. The ideal candidate will lead the SRE function to ensure the reliability, scalability, and performance of mission-critical financial applications. The role combines hands-on technical work with leadership responsibilities to drive service excellence and operational efficiency.

Key Responsibilities:

· Lead and mentor a team of SREs responsible for production stability, reliability, and availability of banking and wealth management systems.

· Design and implement monitoring, alerting, and incident response strategies to proactively manage system health.

· Collaborate with development and infrastructure teams to drive DevOps and automation initiatives, ensuring smooth CI/CD pipelines.

· Define and implement SLIs, SLOs, and SLAs to measure and improve service performance.

· Drive incident management, root cause analysis (RCA), and problem resolution to minimize downtime and business impact.

· Lead capacity planning, performance tuning, and disaster recovery strategies.

· Drive observability and resilience engineering best practices across all platforms.

· Work closely with stakeholders in banking and wealth management domains to align reliability goals with business needs.

· Establish governance processes and ensure compliance with financial regulatory and security standards.

· Develop dashboards and reporting metrics to provide visibility into system performance and reliability.

· Champion a culture of continuous improvement, automation, and a reliability-first mindset.

Required Skills & Experience:

· 10+ years of total IT experience, with at least 4 years in Site Reliability Engineering or Production Operations leadership roles.

· Strong domain experience in Banking, with exposure to Wealth Management systems (highly desirable).

· Expertise in Linux/Unix administration, networking, and cloud infrastructure (AWS, Azure, or GCP).

· Strong scripting and automation experience (Python, Shell, or similar).

· Proficiency in monitoring and observability tools such as Prometheus, Grafana, Splunk, ELK, AppDynamics, or Dynatrace.

· Experience with CI/CD pipelines, Git, Jenkins, Ansible, Terraform, or equivalent tools.

· In-depth understanding of incident, problem, and change management based on ITIL principles.

· Proven track record in managing production systems supporting large-scale, high-availability financial applications.

· Excellent communication, stakeholder management, and team leadership skills.


