Member of Technical Staff, Model Efficiency

4 days ago


Toronto, Canada Cohere Full time

**Who are we?**
Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for helping to increase the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

**Why this role?**

Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment. The model efficiency team is responsible for increasing the inference efficiency of our foundation models by improving model architecture and optimizing ML frameworks.

As an engineer on this team, you’ll work on improving key model-serving metrics, including latency and throughput, by profiling the system, identifying bottlenecks, and solving problems with innovative solutions.

_Please Note: We have offices in Toronto, San Francisco, New York and London. We embrace a remote-friendly environment, and as part of this approach, we strategically distribute teams based on interests, expertise, and time zones to promote collaboration and flexibility. You'll find the Model Efficiency team concentrated in the EST and PST time zones._

**You may be a good fit for the Model Efficiency team if you have**:

- Significant experience in developing high-performance machine learning algorithms or machine learning infrastructure
- Hands-on experience with large language models
- A bias for action and results
- An appetite to solve challenging machine learning research problems

**It is a big plus if you also have considerable experience with one of these areas**:

- Model compression techniques: quantization, pruning, sparsity, low-rank compression, knowledge distillation, etc.
- GPU/Accelerator programming or high-performance computing
- LLM inference performance modeling
- Machine learning framework internals

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply! If you consider yourself a thoughtful worker, a lifelong learner, and a kind and playful team member, Cohere is the place for you.

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants of all kinds and are committed to providing an equal opportunity process. Cohere provides accessibility accommodations during the recruitment process. Should you require any accommodation, please let us know and we will work with you to meet your needs.

**Our Perks**:
- An open and inclusive culture and work environment
- Work closely with a team on the cutting edge of AI research
- Weekly lunch stipend, in-office lunches & snacks
- Full health and dental benefits, including a separate budget to take care of your mental health
- 100% Parental Leave top-up for 6 months for employees based in Canada, the US, and the UK
- Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
- Remote-flexible, with offices in Toronto, New York, San Francisco, and London, plus a co-working stipend
- ✈️ 6 weeks of vacation

_Note: This post is co-authored by both Cohere humans and Cohere technology._


