AI SRE Engineer
4 weeks ago
Inclusion without Exception: Tata Consultancy Services (TCS) is an equal opportunity employer and embraces diversity in race, nationality, ethnicity, gender, age, physical ability, neurodiversity, and sexual orientation, to create a workforce that reflects the societies we operate in. Our continued commitment to Culture and Diversity is reflected in our people stories across our workforce and implemented through equitable workplace policies and processes.

About TCS: TCS is an IT services, consulting, and business solutions organization that has been partnering with many of the world’s largest businesses in their transformation journeys for over 55 years. Its consulting-led, cognitive-powered portfolio of business, technology, and engineering services and solutions is delivered through its unique Location Independent Agile™ delivery model, recognized as a benchmark of excellence in software development. A part of the Tata group, India's largest multinational business group, TCS operates in 55 countries and employs over 607,000 highly skilled individuals, including more than 10,000 in Canada. The company generated consolidated revenues of US $30 billion in the fiscal year ended March 31, 2025, and is listed on the BSE and the NSE in India. TCS' proactive stance on climate change and award-winning work with communities across the world have earned it a place in leading sustainability indices such as the MSCI Global Sustainability Index and the FTSE4Good Emerging Index.

Technical Skills:
- Production experience in SRE / infrastructure / ops for large-scale systems
- Strong programming/scripting skills (Python, Go, Java, or equivalent)
- Deep experience with containerization (Docker) and orchestration (Kubernetes, etc.)
- Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
- Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
- Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
- Networking & systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
- Solid experience in capacity planning, performance tuning, scaling, and incident response
- Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements
- Experience in regulated environments (financial services, compliance, audit, security) is a strong plus
- Excellent communication, documentation, and cross-team collaboration skills
- Proven track record of reducing operational toil via automation

Skills and Responsibilities:
- Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)
- Design and build automation for core platform capabilities, reducing manual toil
- Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.
- Establish, monitor, and enforce SLOs/SLIs/SLAs, error budgets, alerting, and dashboards
- Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation
- Perform capacity planning, scaling strategies, workload scheduling, and resource forecasting
- Optimize cost vs. performance tradeoffs in large-scale compute environments
- Harden systems for security, compliance, auditability, and data governance
- Collaborate across teams (cloud engineers, data engineers, infrastructure, security) to ensure safe deployment, rollout, rollback, and integration of new systems
- Define disaster recovery (DR) strategies, backup/restore practices, and fault tolerance mechanisms
- Maintain runbooks, operational playbooks, documentation, and training materials
- Participate in on-call rotations and respond to production incidents 24/7 as needed
- Continuously evaluate and integrate new tools, frameworks, or technologies to enhance platform reliability

Tata Consultancy Services Canada Inc. is committed to meeting the accessibility needs of all individuals in accordance with the Accessibility for Ontarians with Disabilities Act (AODA) and the Ontario Human Rights Code (OHRC). Should you require accommodation during the recruitment and selection process, please inform Human Resources.

Thank you for your interest in TCS. Candidates who meet the qualifications for this position will be contacted within a 2-week period. We invite you to continue to apply for other opportunities that match your profile.
-
AI SRE Engineer
4 weeks ago
Montréal, Canada Tata Consultancy Services Full time
Inclusion without Exception: Tata Consultancy Services (TCS) is an equal opportunity employer, and embraces diversity in race, nationality, ethnicity, gender, age, physical ability, neurodiversity, and sexual orientation, to create a workforce that reflects the societies we operate in. Our continued commitment to Culture and Diversity is reflected in our...
-
AI SRE Engineer
4 weeks ago
Quebec (QC), Canada Tata Consultancy Services Full time
Inclusion without Exception: Tata Consultancy Services (TCS) is an equal opportunity employer, and embraces diversity in race, nationality, ethnicity, gender, age, physical ability, neurodiversity, and sexual orientation, to create a workforce that reflects the societies we operate in. Our continued commitment to Culture and Diversity is reflected in our...
-
DevOps Specialist
4 days ago
Montréal, Canada Medfar Full time
Company Description - MEDFAR Clinical Solutions was founded in 2010 by two aeronautical engineers who realized that the healthcare system was not exploiting the full potential of technology. Supported by a large community of medical experts and focused on clinical success and patient safety, MEDFAR was the first company to certify a cloud-based Electronic...
-
MONTREAL [Hybrid] - Senior DevOps SRE
1 week ago
Montréal, QC, Canada QUANTEAM (Groupe RAINBOW PARTNERS) Full time
About the Company: As the founding entity of RAINBOW PARTNERS, Quanteam is a consulting firm specializing in Banking, Finance, and Financial Services. Guided by our core values of closeness, teamwork, diversity, and excellence, our team of 1,000 expert consultants, representing 35 different nationalities, collaborates across 10 international offices: Paris,...
-
DevOps SRE Specialist
2 days ago
Montréal, QC, Canada C.G.I. Full time
**DevOps SRE Specialist (Intermediate)**
**About the Role**:
**Key Responsibilities**:
- **Platform Engineering and Support**:
  - Manage and maintain our CI/CD infrastructure, including Jenkins, Git, Nexus, SonarQube, and other relevant tools.
  - Implement and automate routine tasks to increase efficiency and reduce manual effort.
  - Proactively monitor...
-
AI Engineer
4 weeks ago
Montréal, QC, Canada BoxOne Ventures Full time
Job Description: BoxOne Ventures is redefining venture investing for the era of AI and autonomous agents. We’re seeking an AI Engineer to work closely with our AI Engineering Lead to design, build, and improve our next-generation technology platform, covering everything from autonomous deal scouting and AI-driven diligence to research augmentation and...
-
SRE Specialist
2 weeks ago
Montréal, Canada GHGSat Full time
GHGSat offers greenhouse gas detection, measurement, and monitoring services to industrial and government customers around the world. The company uses its own satellites and aircraft sensors, combined with third-party data, to help industrial emitters better understand, control, and reduce their emissions. GHGSat’s capability is unique: the company...
-
SRE Lead
2 weeks ago
Montréal, Canada Intelerad Full time
Company Description: Improving healthcare through innovative technology is at the core of Intelerad’s work. Our scalable medical imaging platform connects clinicians to a powerful imaging ecosystem that is fast, smart, and tapped into the data they need, no matter their location. We’re focused on delivering a best-in-class medical image management...