Engineering Lead Analyst, Innovation Labs

11 hours ago


Mississauga, Canada PowerToFly Full time

Overview

As an Infra & DevOps Engineer, you will join a dynamic team in the Citi Innovation Labs under the CTO organization. You will operate within NAM hours, complementing our existing team primarily based in Israel (EMEA hours). Your expertise will be vital in strengthening our infrastructure and DevOps practices, directly contributing to faster and more reliable software delivery. This role is deeply hands-on, focusing on implementing, maintaining, and optimizing critical systems that foster innovation and support our scalable, resilient, and secure infrastructure. You will be an active team player, bringing specialized technical skills to address operational challenges, implement advanced solutions, and collaborate closely to achieve our collective goals, especially within high-performance and GenAI environments.

Key Responsibilities

Core System Implementation: Implement and maintain essential infrastructure components, including specific configurations for on-prem GPU clusters (V100/A100/H100/H200 MIG) that underpin GenAI and high-performance workloads, ensuring operational stability.
CI/CD Operations & Improvement: Contribute to the efficient operation and continuous improvement of our CI/CD pipelines and automation frameworks. Leverage and contribute to our GitHub repositories to streamline development and deployment processes.
System Reliability & Performance: Monitor, troubleshoot, and optimize system reliability and performance across various environments. Work with the team to identify and resolve critical issues promptly, ensuring a high level of operational availability and client satisfaction.
Automation Development: Develop and implement automation scripts and tools to enhance operational efficiency, reduce manual effort, and improve the consistency of our infrastructure and deployment processes.
Emerging Technology Support: Provide hands-on support for the deployment and ongoing operation of emerging technologies relevant to GenAI, such as NIM images, MLflow 3.x, Coder, and LLMOps infrastructure. Actively contribute to the setup and maintenance of experimentation platforms like GCP Sandbox.
Operational Best Practices: Adhere to and actively contribute to established operational best practices, documentation, and runbooks to ensure consistency and maintainability of our systems.
Team Collaboration: Work seamlessly within the team, participating in discussions, sharing insights, and collaborating with colleagues and development partners to achieve shared objectives.

Skills & Experience Required

6+ years of overall work experience, including 5+ years of dedicated, hands-on technical experience in Infrastructure, Site Reliability Engineering (SRE), or DevOps roles, with a proven ability to contribute significantly to complex operational environments.
Proven practical experience working with and optimizing GPU infrastructure for GenAI and high-performance computing is an advantage.
Strong practical knowledge of cloud environments, containerization technologies (Docker, Kubernetes, OpenShift), and operational aspects of serverless computing.
Proficiency in scripting languages (e.g., Python, Bash) for system automation, configuration, and diagnostics.
Demonstrated experience in implementing and operating CI/CD pipelines, infrastructure-as-code principles, and automation solutions, with solid experience using GitHub.
Understanding of and ability to apply enterprise security best practices, compliance standards, and data privacy considerations in daily operations.
Solid problem-solving skills with an ability to diagnose and resolve technical issues effectively in production environments.
Strong communication and interpersonal skills, fostering effective teamwork and collaboration within a diverse, global team.
Bachelor’s degree in computer science, engineering, or a related technical field, or equivalent practical experience.

Tech Stack Expertise

Cloud Platforms: AWS, GCP (Operational experience).
GPU Infrastructure: NVIDIA V100/A100/H100/H200 clusters, MIG (Practical operational experience).
Scripting & Automation: Python, Bash.
CI/CD Orchestration: Tekton, Harness, CI/CD for GenAI workloads.
Version Control & Collaboration: Git, GitHub Enterprise, Jira, Confluence.
Database Technologies: MongoDB/MaaS, PostgreSQL, and Redis (Operational knowledge).
Operating Systems: Linux, Wintel (System administration experience).
Containerization & Orchestration: Docker, Kubernetes, OpenShift (Hands-on operational experience).
Networking: Load Balancers, DNS.
Monitoring & Observability: ELK Stack, Prometheus, Grafana, ITRS (Practical operational experience).
Infrastructure as Code: Terraform, Ansible (or similar) (Practical application).
Developer Productivity Tools: GitHub Copilot, StackOverflow for Teams, Devin, Delphine.
Service Mesh: Practical operational experience.

Education

Bachelor’s degree/University degree or equivalent experience. Master’s degree preferred.

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi. Please refer to Citi’s Accessibility resources and policies for further details.

This job opening is for an existing job vacancy.



  • Mississauga, Canada COMPASS GROUP CANADA Full time

    We are CDAI, the data and artificial intelligence engine of Compass Group North America. We design and deliver custom, in‑house solutions tailored to the unique complexities of food service and hospitality. Our work is grounded in strong data foundations, layered with AI to enhance forecasting, streamline operations, and enable better, faster...

  • QA Analyst

    2 weeks ago


    Mississauga, Canada Orion Innovation Full time

    Orion Innovation is a premier, award-winning, global business and technology services firm. Orion delivers game-changing business transformation and product development rooted in digital strategy, experience design, and engineering, with a unique combination of agility, scale, and maturity. We work with a wide range of clients across many industries...

  • Laboratory Analyst 1

    2 weeks ago


    Mississauga, Canada Bureau Veritas Full time

    Do you believe in the power of teamwork and sharing ideas? Do you take pride in delivering exceptional quality and service with everything you do? Do you seek out ideas for improving the status quo? If you want to make a difference and love being surrounded by the best and the brightest, Bureau Veritas Laboratories might be the place for you! Imagine being...

  • Test Lab Technologist

    4 weeks ago


    Mississauga, Canada Johnson Electric Group Full time

    Test Lab Technologist. Locations: Canada, Mississauga. Time type: Full time. Posted on: Posted Yesterday. Job requisition ID: R00027780. Join Our Team as a Test Lab Technologist at Johnson Electric! Location: Mississauga, Ontario, Canada (onsite). Your Mission, Should You Choose to Accept It: As our next Test...


  • Mississauga, Canada Johnson Electric Full time

    Join Our Team as a Test Lab Technologist at Johnson Electric! Location: Mississauga, Ontario, Canada (onsite). Your Mission, Should You Choose to Accept It: As our next Test Lab Technologist, you’ll be at the forefront of supporting our development engineering department by conducting testing, inspection, and prototype build activities. Your expertise in lab...

  • Lab Technician

    1 week ago


    Mississauga, Canada TempAid Full time

    TempAid is a premier global innovator in temperature-controlled packaging solutions, specializing in cold chain products designed to preserve the integrity of temperature sensitive items. We proudly operate manufacturing facilities in China, Vietnam, the United States, and Mississauga, Ontario, Canada. TempAid is also distinguished as the only ISTA Standard...


  • Mississauga, Canada Bureau Veritas Full time

    Do you believe in the power of teamwork and sharing ideas? Do you take pride in delivering exceptional quality and service with everything you do? Do you seek out ideas for improving the status quo? If you want to make a difference and love being surrounded by the best and the brightest, Bureau Veritas Laboratories might be the place for you! Imagine being...


  • Mississauga, Canada EnVision Consultants Ltd. Full time

    The position comes with the opportunity to work very closely with highly experienced professionals who enjoy mentoring and are looking to assist in your career growth. As a Lab Services Technician you will: provide technical services in a lab and office environment; provide technical assistance in materials related projects and tasks, including...


  • Mississauga, Canada Citi Full time

    As an Infra & DevOps Engineer, you will join a dynamic team in the Citi Innovation Labs under the CTO organization. You will operate within NAM hours, complementing our existing team primarily based in Israel (EMEA hours). Your expertise will be vital in strengthening our infrastructure and DevOps practices, directly contributing to faster and more reliable...

  • Scrum Master

    2 weeks ago


    Mississauga, Canada Orion Innovation Full time

    Orion Innovation is a premier, award-winning, global business and technology services firm. Orion delivers game-changing business transformation and product development rooted in digital strategy, experience design, and engineering, with a unique combination of agility, scale, and maturity. We work with a wide range of clients across many industries...