Principal AI Cloud Engineer
4 weeks ago
Application Deadline: 12/30/2025
Address: 100 King Street West
Job Family Group: Data Analytics and Reporting

The Team
We accelerate BMO’s AI journey by building enterprise-grade, cloud-native AI solutions. Our team combines engineering excellence with cutting-edge AI to deliver scalable, secure, and responsible solutions that power business innovation across the bank. We enable and accelerate our partners on their AI journeys across the enterprise, helping teams across BMO unlock value at scale. We support one another in times of need and take pride in our work. We are engineers, AI practitioners, platform builders, thought leaders, multipliers, and coders. Above all, we are a global team of diverse individuals who enjoy working together to create smart, secure, and scalable solutions that make an impact across the enterprise. Our ambition is bold: deploy our capital and resources to their highest and most profitable use through a digital-first operating model, powered by data and AI-driven decisions.

The Impact
As a Principal AI Cloud Engineer, you are a hands-on technical developer who designs, builds, and scales cloud-native AI solutions and products. You help set engineering standards, establish patterns, mentor senior engineers, and partner with multiple teams to deliver resilient, governed, and cost-efficient AI at enterprise scale. You’ll help shape and evolve our AI cloud strategy, from model serving and LLMOps to security, observability, and compliance, so teams across the bank can innovate safely and rapidly.

You will advance BMO’s Digital First strategy by:
  Defining reference and production-grade solutions for AI/GenAI on cloud (AWS preferred; multi-cloud aware).
  Building reusable, secure, and observable components (APIs, SDKs, microservices, pipelines).
  Operationalizing LLMs and RAG with strong controls and Responsible AI guardrails.
  Driving platform roadmaps that enable faster delivery, lower risk, and measurable business outcomes.

What’s In It for You
  Influence the technical direction of enterprise AI and the platform primitives others build on.
  Ship high-impact systems used across many business lines and products.
  Work across the full stack: cloud infra, data/feature pipelines, model serving, LLMOps, and DevSecOps.
  Partner with a leadership team invested in your growth and thought leadership.

Responsibilities

Infrastructure & Platform Builder
Design, build, and operate cloud-native AI infrastructure for ML/GenAI workloads:
  Compute: GPU/CPU clusters, autoscaling, spot instance strategies
  Networking: AWS VPC, PrivateLink, peering, multi-region HA/DR
  Storage & Databases: high-performance data lakes (e.g., S3-based data lake), relational DBs, vector DBs (FAISS, Milvus, Pinecone, pgvector)
  Security: IAM, Secrets Manager / KMS-backed secrets management and encryption, policy-as-code
Implement observability and reliability for AI infra:
  Metrics (latency, throughput, GPU utilization, cost)
  Logging/tracing (OpenTelemetry), SLOs/SLIs for infra services
Build CI/CD and GitOps pipelines for infrastructure-as-code (Terraform/CloudFormation) and AI platform components
Drive FinOps for AI infra: GPU rightsizing, caching, inference optimization, cost governance

Application & Service Enablement
Enable frontend and backend services for AI platforms:
  Secure APIs, microservices, and event-driven architectures
  Integration with custom model runtimes (TensorRT-LLM, vLLM, Triton/KServe)
Provide infrastructure support for RAG systems: embeddings, chunking, retrieval pipelines
Ensure scalable serving infrastructure for LLMs and ML models with caching and token optimization

Strategy & Architecture
Define and evolve AI infrastructure reference architecture for cloud (AWS preferred):
  Container orchestration (Kubernetes/EKS), service mesh, ingress
  Serverless/event-driven patterns for AI pipelines
  Multi-region, HA/DR, compliance-ready designs
Establish standards and best practices for containerization, IaC, and secure networking for AI systems

Security, Risk & Governance
Implement defense-in-depth for AI infra:
  IAM least privilege, private networking, KMS/Secrets Manager, SBOM, image signing
Ensure compliance and Responsible AI controls at infra level:
  Data residency, encryption, lineage, audit readiness

Delivery & Operations
Lead infrastructure discovery and solution design with stakeholders
Operate platforms with SRE principles: error budgets, incident response, chaos testing
Mentor engineers; create reusable IaC modules, templates, and golden paths

Must-Have Qualifications
  Bachelor’s/Master’s/PhD in CS, Engineering, or related field
  7+ years building large-scale distributed cloud infrastructure
  5+ years hands-on with AWS (preferred); Azure/GCP nice to have
  Proven experience with AI/ML infra: GPU clusters, Kubernetes, CI/CD, observability
  Strong in IaC (Terraform/CloudFormation), Kubernetes, networking, security
  Expertise in cloud-native patterns: containers, service mesh, serverless
  Familiarity with MLOps/LLMOps infra: model serving, feature stores, vector DBs
  Programming in Python (infra automation) and one of Go/TypeScript for tooling
  Understanding of frontend/backend integration for AI services

Nice-to-Have
  GPU optimization (CUDA/NCCL, TensorRT-LLM)
  Observability tools (Prometheus, Grafana, OpenTelemetry)
  Event streaming (Kafka/Kinesis), real-time systems
  Experience with AI platform products (Amazon SageMaker), MLflow, KServe, Hugging Face

Tech Stack
  Cloud & Infra: AWS (EKS, Lambda, Kinesis, Secrets Manager/KMS), Terraform/CloudFormation, GitHub Actions/AWS CodePipeline
  AI Infra: Kubernetes, KServe/Triton, vLLM, TensorRT-LLM, Ray, Spark
  Ops: Prometheus, Grafana, OpenTelemetry, ArgoCD, OPA
  Data: Feature stores (Feast), vector DBs (FAISS, Milvus, Pinecone), relational DBs
  App Layer: APIs, microservices, frontend/backend integration for AI systems

Success Metrics
  Reliability & Performance: SLOs met for infra services, GPU utilization optimized
  Security & Compliance: Zero critical findings, auditable infra
  Cost Efficiency: Reduced GPU/infra spend via FinOps strategies
  Developer Velocity: Faster provisioning and deployment of AI infra
  Technical Leadership: Influence on infra standards, mentorship, reusable patterns

Salary: $103,200.00 - $192,000.00
Pay Type: Salaried

The above represents BMO Financial Group’s pay range and type.

Salaries will vary based on factors such as location, skills, experience, education, and qualifications for the role, and may include a commission structure. Salaries for part-time roles will be pro-rated based on the number of hours regularly worked. For commission roles, the salary listed above represents BMO Financial Group’s target for the first year in the position.

BMO’s total compensation package will vary based on the pay type of the position and may include performance-based incentives, discretionary bonuses, and other perks and rewards. BMO also offers health insurance, tuition reimbursement, accident and life insurance, and retirement savings plans. To learn more about our benefits, visit: https://jobs.bmo.com/ca/fr/R%C3%A9mun%C3%A9ration-globale

About Us
At BMO we are driven by a shared Purpose: Boldly Grow the Good in business and life. It calls on us to create lasting, positive change for our customers, our communities, and our people. By working together, innovating, and pushing boundaries, we transform lives and businesses, and power economic growth around the world.

As a member of the BMO team you are valued, respected, and heard, and you have more ways to grow and make an impact. We strive to help you make an impact from day one, for yourself and our customers. We’ll support you with the tools and resources you need to reach new milestones, as you help our customers reach theirs. Through in-depth training and coaching, manager support, and network-building opportunities, we’ll help you gain valuable experience and broaden your skill set.

To find out more, visit us at https://jobs.bmo.com/ca/fr.

BMO is committed to an inclusive, equitable, and accessible workplace. By learning from each other’s differences, we gain strength through our people and our perspectives. Accommodations are available on request for candidates taking part in all aspects of the selection process. To request accommodation, please contact your recruiter.

Note to Recruiters: BMO does not accept unsolicited resumes from any source other than directly from a candidate. Any unsolicited resumes sent to BMO, directly or indirectly, will be considered BMO property. BMO will not pay a fee for any placement resulting from the receipt of an unsolicited resume. A recruiting agency must first have a valid, written, and fully executed service agreement before submitting resumes.
-
Principal AI Cloud Engineer
5 days ago
Toronto, Ontario, Canada BMO Full time
The Team
We accelerate BMO's AI journey by building enterprise-grade, cloud-native AI solutions. Our team combines engineering excellence with cutting-edge AI to deliver scalable, secure, and responsible solutions that power business innovation across the bank. We enable and accelerate our partners on their AI journeys across the enterprise, helping teams...
-
Principal, Cloud Engineering
20 hours ago
Toronto, Canada Scotiabank Full time
Requisition ID: 230350
Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. As a Principal, Cloud Engineering – Cloud Operations...
-
Toronto, Canada Latinx in AI (LXAI) Full time
A leading AI firm in Toronto seeks a Senior/Principal Machine Learning Engineer to design and build core ML systems for AI agents. This position requires extensive experience in machine learning, deep learning techniques, and cloud platforms. You'll collaborate with cross-functional teams to ensure scalability and impact. The role offers a dynamic work...
-
Senior AI Systems Engineer
6 days ago
Toronto, Canada Latinx in AI (LXAI) Full time
A forward-thinking AI organization in Toronto seeks a Senior/Principal Machine Learning Engineer who will design and implement core machine learning systems for AI agents. This role involves collaboration with cross-functional teams to integrate AI solutions deeply into the organization. Ideal candidates will have extensive experience in machine learning...
-
Senior/Principal Machine Learning Engineer
6 days ago
Toronto, Canada Latinx in AI (LXAI) Full time
About The Team
Agent Factory is where Workday’s next chapter gets built. We’re forming small, senior, cross-functional AI teams that bring together product leaders, machine learning engineers, and full-stack builders...
-
Toronto, Canada Genesys Cloud Services, Inc. Full time
A leading cloud services provider in Toronto is seeking a Principal AI Product Manager to innovate in AI-driven contact center solutions. You will collaborate with cross-functional teams to develop intelligent platforms that enhance supervisor effectiveness and customer experiences. Ideal candidates will have over 8 years of experience in product management...