CUDA Kernel Optimizer

1 week ago

Toronto, Ontario, Canada Mercor Full time $32,000 - $64,000 per year

1) Role Overview

Mercor is engaging advanced CUDA experts who specialize in GPU kernel optimization, performance profiling, and numerical efficiency. These professionals have a deep mental model of how modern GPU architectures execute deep learning workloads, and they are comfortable translating algorithmic concepts into finely tuned kernels that maximize throughput while maintaining correctness and reproducibility.

2) Key Responsibilities

Develop, tune, and benchmark CUDA kernels for tensor and operator workloads.
Optimize for occupancy, memory coalescing, instruction-level parallelism, and warp scheduling.
Profile and diagnose performance bottlenecks using Nsight Systems, Nsight Compute, and comparable tools.
Report performance metrics, analyze speedups, and propose architectural improvements.
Collaborate asynchronously with PyTorch Operator Specialists to integrate kernels into production frameworks.
Produce well-documented, reproducible benchmarks and performance write-ups.

3) Ideal Qualifications

Deep expertise in CUDA programming, GPU architecture, and memory optimization.
Proven ability to achieve quantifiable performance improvements across hardware generations.
Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability considerations.
Familiarity with frameworks like PyTorch, TensorFlow, or Triton (not required but beneficial).
Strong communication skills and independent problem-solving ability.
Demonstrated open-source, research, or performance benchmarking contributions.

4) More About the Opportunity

Ideal for independent contractors who thrive in performance-critical, systems-level work.
Engagements focus on measurable, high-impact kernel optimizations and scalability studies.
Work is fully remote and asynchronous; deliverables are outcome-driven.
Access to shared benchmarking infrastructure and reproducibility tooling via Mercor support resources.

5) Compensation & Contract Terms

Typical range: $120–$250/hour, depending on scope, specialization, and results achieved. Payment is based on accepted task output rather than a flat hourly rate.
Structured as a contract-based engagement, not an employment relationship.
Compensation tied to measurable deliverables or agreed milestones.
Confidentiality, IP, and NDA terms are defined per engagement.

6) Application Process

Submit a brief overview of prior CUDA optimization experience, profiling results, or performance reports.
Include links to relevant GitHub repos, papers, or benchmarks if available.
Indicate your hourly rate, time availability, and preferred engagement length.
Selected experts may complete a small, paid pilot kernel optimization project.

7) About Mercor

Mercor connects domain experts with top AI research and technology organizations through project-based contracts.
Contractors operate independently, with full flexibility over methods, timelines, and tools.
Our mission is to help top engineers and researchers access frontier technical work without rigid employment structures.
