Lead Site Reliability Engineer

3 days ago

Toronto, Ontario, Canada FactSet Full time

FactSet creates flexible, open data and software solutions for over 200,000 investment professionals worldwide, providing instant access to financial data and analytics that investors use to make crucial decisions.

At FactSet, our values are the foundation of everything we do. They express how we act and operate, serve as a compass in our decision-making, and play a big role in how we treat each other, our clients, and our communities. We believe that the best ideas can come from anyone, anywhere, at any time, and that curiosity is the key to anticipating our clients' needs and exceeding their expectations.

About Irwin
Irwin, a FactSet Company, is a leading provider of capital markets-focused financial technology with a mission to seamlessly connect the world's capital seekers and allocators to make them more productive, innovative, and successful. Our flagship product, Irwin, is a software platform used by investor relations and investment banking professionals all over the world. In October 2024, Irwin joined FactSet, adding its investor relations solution to FactSet's existing offering.

Role Overview
We are seeking a seasoned Senior Site Reliability Engineer with deep expertise in AWS to own, architect, and continuously evolve Irwin's core infrastructure. You will plan, build, and optimize the systems that support our web applications and internal tools, ensuring scalability, reliability, observability, and security. Your technical judgment, roadmap planning skills, and hands-on expertise will enable our engineering teams to ship features with velocity and confidence.

Key Responsibilities

Strategic Roadmapping:

Design and execute long-term strategies for scalable, secure infrastructure to host the Irwin web application and associated tooling on AWS/EKS with PostgreSQL.

AWS Infrastructure:

Architect and manage highly available cloud environments on EKS/Kubernetes using best practices for cost, performance, and security.

Database Operations:

Oversee, tune, and ensure the high availability of large-scale PostgreSQL databases; optimize for performance, backup, disaster recovery, and observability. Bonus points for experience using Snowflake or other OLAP systems.

Infrastructure as Code:

Lead the adoption and maintenance of Terraform workflows to manage infrastructure; ensure reproducibility, modularity, and CI/CD integration.

Continuous Integration & GitOps:

Build, maintain, and scale CI/CD pipelines using GitOps principles to automate deployments, reduce risk, and speed up delivery cycles.

Kubernetes:

Design, deploy, and manage production-grade Kubernetes clusters; automate scaling and implement robust security practices.

Monitoring & Incident Response:

Implement monitoring, logging, and alerting solutions; establish best practices for incident detection and resolution.

Security & Compliance:

Apply industry best practices for infrastructure and data security; ensure governance and compliance with relevant standards (e.g., SOC 2, GDPR).

Collaboration:

Mentor SRE peers and engineering teams on DevOps/SRE methodologies; document, communicate, and evangelize infrastructure best practices.

Required Skills And Experience
Minimum Requirements:

10+ years as a Site Reliability Engineer, DevOps, or similar role in cloud-native environments (AWS focus).

Critical Skills

Deep technical proficiency with AWS services (EC2, EKS, S3, RDS, IAM, etc.).
Expert-level experience managing, tuning, and scaling PostgreSQL databases.
Advanced skill in Terraform (modular design, environment promotion, CI/CD integration).
Proficient in building and operating CI/CD systems (GitLab CI, GitHub Actions, or equivalent).
Hands-on experience with GitOps workflows (Argo CD, Flux, etc.).
Strong knowledge of Kubernetes (deployment, scaling, networking, security).
Experience with monitoring and logging stacks (Datadog, Prometheus, Grafana, ELK, etc.).
Track record in designing, communicating, and executing complex infrastructure roadmaps.
Experience mentoring and enabling engineering teams.
Strong written and verbal communication skills.

Preferred/Desired Qualifications

Professional certifications (AWS Solutions Architect, Kubernetes, Terraform).
Experience in fintech, SaaS, or high-compliance industries.
Exposure to data privacy regulations and secure software development practices.

Education

Bachelor's degree in computer science or a similar field.

Why Irwin?

Influence the technology roadmap at a pivotal growth stage.
Build infrastructure for mission-critical applications.
Work with passionate, high-performing teams.
Competitive compensation, benefits, and equity options.

Here Is What To Expect

First interview with me (hiring manager) to assess experience, basic technical skills and cultural fit - 1 hour
Second interview with two team members for a deeper technical assessment - 1 hour
Optional third technical interview also with the same two team members if they are not able to cover everything in the first hour
Final 30-minute interview with the hiring manager to answer any remaining questions before we proceed to an offer.

When You Apply
Please attach your resume and a cover letter describing your approach to architecting scalable infrastructure and your experience with AWS, PostgreSQL, Terraform, GitOps, CI/CD, and Kubernetes.

Irwin is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all.

Company Overview
FactSet (NYSE:FDS | NASDAQ:FDS) helps the financial community to see more, think bigger, and work better. Our digital platform and enterprise solutions deliver financial data, analytics, and open technology to more than 8,200 global clients, including over 200,000 individual users. Clients across the buy-side and sell-side, as well as wealth managers, private equity firms, and corporations, achieve more every day with our comprehensive and connected content, flexible next-generation workflow solutions, and client-centric specialized support. As a member of the S&P 500, we are committed to sustainable growth and have been recognized among the Best Places to Work in 2023 by Glassdoor as a Glassdoor Employees' Choice Award winner. Learn more at  and follow us on X and LinkedIn.

At FactSet, we celebrate difference of thought, experience, and perspective. Qualified applicants will be considered for employment without regard to characteristics protected by law.
