Toronto’s Vector Institute has released its first “state of evaluation” report comparing 11 leading AI models across 16 benchmarks. The non-profit assessed models from OpenAI, Meta, Cohere, Alibaba, and others on tasks involving math, coding, reasoning, and domain-specific knowledge in areas such as finance and history.
The goal is to provide an unbiased, comprehensive framework to help businesses and policymakers navigate competing claims from AI developers. “We wanted to provide a very objective and more comprehensive evaluation,” said Deval Pandya, Vector’s vice-president of AI engineering. The study could guide AI adoption decisions and inform regulatory understanding.
Want to know more? Read the full story at The Logic.