Senior ML Performance Engineer

2 weeks ago

Toronto, Ontario, Canada | Lemurian Labs | Full-time

About Us

At Lemurian Labs, we're on a mission to bring the power of AI to everyone—without leaving a massive environmental footprint. We care deeply about the impact AI has on our society and planet, and we're building a solid foundation for its future, ensuring AI grows sustainably and responsibly. Innovation should help the world, not harm it.

We are building a high-performance, portable compiler that lets developers "build once, deploy anywhere." Yes, anywhere. That means seamless cross-platform compatibility: you can train your models in the cloud, deploy them at the edge, or run them anywhere in between, all while optimizing for resource efficiency and scalability.

If the idea of sustainably scaling AI motivates you and you're excited about making AI development both powerful and accessible, then we'd love to have you. Join us at Lemurian Labs, where you can have fun building the future—without leaving a mess behind.

The Role

We're looking for a Senior ML Performance Engineer to architect and lead our Performance Testing Platform from the ground up. You'll be the technical authority on how we measure, validate, and optimize the performance of large language models (Llama 3.2 70B, DeepSeek, and others) before and after compiler optimization on modern GPU architectures.

This is a high-impact role where you'll directly influence our product quality and our customers' success. You'll work at the intersection of ML systems, GPU architecture, and performance engineering—building the infrastructure that proves our compiler delivers real value.

What You'll Do

Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters
Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy
Establish baseline performance for unoptimized models (Llama 3.2 70B, DeepSeek, etc.) and validate post-optimization improvements
Develop automated testing pipelines for continuous performance validation across compiler releases and model updates
Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations
Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins
Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow
Document best practices for performance testing and optimization of ML workloads on GPU hardware

What You'll Bring

5+ years of experience in performance engineering, benchmarking, or systems engineering roles
Deep understanding of ML inference workloads, particularly transformer-based models and LLMs
Hands-on experience with GPU programming and optimization (CUDA, ROCm, or similar)
Strong programming skills in Python and C/C++
Proven track record of building performance testing infrastructure or benchmarking platforms from scratch
Experience with ML frameworks (PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM, etc.)
Proficiency with profiling and debugging tools for GPU workloads
Strong analytical skills with the ability to design experiments, analyze results, and communicate findings clearly
Experience with CI/CD systems and test automation frameworks

Nice to Have

Experience with AMD GPUs (MI200/MI300 series) and the ROCm ecosystem
Knowledge of compiler optimization techniques and their impact on performance
Experience with distributed inference and multi-GPU workloads
Familiarity with ML model quantization, pruning, and other optimization techniques
Background in high-performance computing or systems-level optimization
Experience with containerization, orchestration, and infrastructure-as-code tooling (Docker, Kubernetes, Terraform)
Contributions to open-source ML or systems projects

Personal Attributes

Obsessive about details — you notice the 2% regression that others miss
Self-driven — you take ownership and don't wait for permission to solve problems
Collaborative mindset — you work well across teams and help others succeed
Passionate about sustainability — you care about making AI more efficient and environmentally responsible
Clear communicator — you can explain complex technical concepts to both engineers and stakeholders

Salary depends on experience and geographical location.

This salary range may be inclusive of several career levels and will be narrowed during the interview process based on a number of factors, such as the candidate's experience, knowledge, skills, and abilities, as well as internal equity among our team.

Additional benefits for this role may include equity; company bonus opportunities; medical, dental, and vision benefits; a retirement savings plan; and supplemental wellness benefits.

Lemurian Labs ensures equal employment opportunity without discrimination or harassment based on race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity or expression, age, disability, national origin, marital or domestic/civil partnership status, genetic information, citizenship status, veteran status, or any other characteristic protected by law.

EOE

