Senior Research Scientist, Model Evaluation

3 weeks ago


Toronto, Canada Cohere Full time

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises building AI systems that power content generation, semantic search, RAG, and agents. We believe our work is instrumental to the widespread adoption of AI and that each person on the team contributes to increasing the capabilities of our models and the value they bring to customers.

Why this role?

Evaluation is critical to making progress in scaling intelligence. As models become superhuman in many real-world use cases, we continue to develop new evaluation techniques that accurately reflect current capabilities and set the agenda for future progress. In this role, you will create next‑generation evaluation methods and infrastructure to measure LLM progress.

Responsibilities

  • Create ambitious new evaluation benchmarks that push the limits of what our models can accomplish.
  • Work cross‑functionally with teams to translate model feedback into trustworthy, repeatable evaluations.
  • Conduct research to advance the state of the art in LLM evaluation methods, including training LLM judges, refining LLM‑based data synthesis pipelines, and improving evaluation efficiency.
  • Build scalable and reusable tools for digging into model performance.

Qualifications

  • You rapidly build prototypes that demonstrate LLM boundaries and develop resources to measure those capabilities.
  • You have spent significant time reviewing complex data and LLM outputs to ensure high data quality.
  • You are obsessive about rigorously measuring AI capabilities and ensuring measurements align with desired outcomes.
  • You have strong software engineering skills.

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply.

Inclusive Hiring

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Perks

  • Open and inclusive culture and work environment.
  • Work closely with a team on the cutting edge of AI research.
  • Weekly lunch stipend, in‑office lunches & snacks.
  • Full health and dental benefits, including a separate budget for mental health.
  • 100% parental leave top‑up for up to 6 months.
  • Personal enrichment benefits towards arts, culture, fitness, well‑being, quality time, and workspace improvement.
  • Remote‑flexible offices in Toronto, New York, San Francisco, London, and Paris, plus a co‑working stipend.
  • 6 weeks of vacation (30 working days).

Seniority Level: Mid‑Senior level
Employment Type: Full‑time
Job Function: Other
Industries: Software Development



  • Toronto, Canada Cohere Full time

    Overview Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. Cohere is a team of...


  • Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Cohere Full time

    Senior Research Scientist, Model Evaluation Who are we? Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises building AI systems that power content generation, semantic search, RAG, and agents. We believe our work is instrumental to the widespread adoption of AI and that each...


  • Toronto, Ontario, Canada Cohere Full time $120,000 - $180,000 per year

    Who are we? Our mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over what we...


  • Toronto, Ontario, Canada Cohere Full time

    Who are we? Our mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over what we...


  • Toronto, Ontario, Canada Cohere Full time $120,000 - $180,000 per year

    Who are we? Our mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over what we...


  • Toronto, Canada The Rundown AI, Inc. Full time

    Overview Who are we? Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over...

