Hiring AI Ops engineer in Montreal, Quebec
2 weeks ago
Location: Montreal, Quebec
Duration: 06 Months
Experience Required: 8 years of experience as a Site Reliability Engineer or in a similar role, with hands-on experience in supporting IaaS platforms with networking and system engineering knowledge.

Job Description:
We are seeking a dedicated professional to join our team. The successful candidate will be responsible for operating, monitoring, and maintaining the infrastructure supporting GenAI applications. The role requires a proactive approach to design and build automation for core platform capabilities, reducing manual toil and enhancing platform reliability.

Required Skills & Qualifications:
Applicants must be able to work directly for Artech on W2
Production experience in SRE / Infrastructure / ops for large-scale systems
Strong programming/scripting skills (Python, Go, Java, or equivalent)
Deep experience with containerization (Docker), orchestration (Kubernetes, etc.)
Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
Networking & systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
Solid experience in capacity planning, performance tuning, scaling, and incident response
Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements

Preferred Skills & Qualifications:
Familiarity with GPU / AI compute clusters, high-performance data storage, and distributed architectures
Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
Experience in regulated environments (financial services, compliance, audit, security) is a strong plus
Excellent communication, documentation, and cross-team collaboration skills
Proven track record of reducing operational toil via automation

Day-to-Day Responsibilities:
Operate, monitor, and maintain the infrastructure supporting GenAI applications
Design and build automation for core platform capabilities
Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.
Establish, monitor, and enforce SLOs/SLIs/SLAs, error budgets, alerting, and dashboards
Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation
Perform capacity planning, scaling strategies, workload scheduling, and resource forecasting
Optimize cost vs. performance tradeoffs in large-scale compute environments
Harden systems for security, compliance, auditability, and data governance
Collaborate across teams to ensure safe deployment, rollout, rollback, and integration of new systems
Define disaster recovery (DR) strategies, backup/restore practices, fault tolerance mechanisms
Maintain runbooks, operational playbooks, documentation, and training materials
Participate in on-call rotations and respond to production incidents 24/7 as needed
Continuously evaluate and integrate new tools, frameworks, or technologies to enhance platform reliability

Company Benefits & Culture:
Inclusive and diverse work environment
Opportunities for professional growth and development
Supportive team culture focused on collaboration and innovation

For immediate consideration please click APPLY to begin the screening process with Alex.
-
Montreal, Canada Artech LLC Full time
Job Title: Senior Azure AI Infrastructure Engineer
Location: Montreal, Quebec
Duration: 06 Months
Exp Required: 8 years
Job Description:
Required Skills & Qualifications:
Proficient with CDKTF and Terraform for Azure infrastructure
Hands-on experience with Azure OpenAI, Cognitive Services, and Machine Learning
Experience with Azure Data Services, including...
-
Account Executive, Enterprise Montreal
4 weeks ago
Montreal, Canada Mistral AI Full time
Account Executive, Enterprise Montreal
Mistral AI – Montreal, Quebec, Canada
About Mistral: At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life, democratizing AI through high-performance, optimized, open-source and...
-
Account Executive, Enterprise Montreal
3 weeks ago
Montreal, Canada Mistral AI Full time
Account Executive, Enterprise Montreal
Mistral AI – Montreal, Quebec, Canada
About Mistral: At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life, democratizing AI through high-performance, optimized, open-source and...
-
Urgent Hiring for Developer in Montreal, Quebec.
2 weeks ago
Montreal, Canada Artech LLC Full time
Location: Montreal, Quebec
Duration: 06 Months
Job Description: We are seeking a professional responsible for designing and delivering scalable, secure, and efficient technology solutions across complex enterprise environments. The role involves working closely with business leaders and technical teams to ensure solutions align with strategic objectives,...
-
Senior Software Engineer, AI Model serving
2 weeks ago
Montreal, Canada Clutch Canada Full time
Senior Software Engineer, AI Model serving - Montreal, Canada
Role at Speechify...
-
AI/ML Infrastructure Engineer
7 days ago
Montreal, Canada BULL-IT SOLUTIONS LTD Full time
Delivery Head | Canada Recruitment | Talent Acquisition
Location: Montreal, Quebec, Canada
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Skills Required:
Production experience in SRE / Infrastructure / ops for large-scale systems
Strong programming/scripting skills (Python, Go, Java, or equivalent)
Deep...
-
AI/ML Infrastructure Engineer
5 days ago
Montreal, Canada BULL-IT SOLUTIONS LTD Full time
Delivery Head | Canada Recruitment | Talent Acquisition
Location: Montreal, Quebec, Canada
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Skills Required:
Production experience in SRE / Infrastructure / ops for large-scale systems
Strong programming/scripting skills (Python, Go, Java, or equivalent)
Deep...
-
Montreal, Canada Artech LLC Full time
Job Title: System Performance Analyst
Location: Montreal, Quebec
Duration: 6 Months
Description: We are seeking a skilled professional to join our team, focusing on managing and enhancing our IT infrastructure and systems. This role involves working on automation, system performance, and stakeholder management to ensure seamless operations and client...