News
AI models are numerous and hard to navigate, and the benchmarks used to measure their performance can be just as confusing.
“We’re still in the early days of understanding what a good model is. It’s quite possible that when we made it ready for scale, it did lose some quality in certain areas. But it’s not ...
Research scientists on the livestream said an internal evaluation indicated it made major mistakes about 34% less often than the o1-preview model. The model ...
The evaluation and ensemble analysis of Earth system models are crucial for model improvement and a prerequisite for reliable climate projections of the 21st century to be used as guidelines for ...
They evaluate models on four key benchmarks from the EleutherAI Language Model Evaluation Harness, a unified framework for testing generative language models on a large number of different evaluation tasks ...
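As a rough illustration of how such benchmark runs are typically driven, the sketch below uses the EleutherAI harness's `lm_eval` Python API. The checkpoint name, task list, and batch size are placeholders rather than the benchmarks referenced above, and the call signature shown matches harness v0.4.x and may differ in other releases.

```python
# Minimal sketch of scoring a model with the EleutherAI
# lm-evaluation-harness (pip install lm-eval). Model and task
# names below are illustrative placeholders, not the four
# benchmarks mentioned in the snippet above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face model backend
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder checkpoint
    tasks=["hellaswag", "arc_easy"],                 # placeholder task names
    batch_size=8,
)

# Per-task metrics (e.g. accuracy) are reported under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```

The same run can be launched from the command line with the `lm_eval` entry point, which is how the harness is most often used in practice.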
alongside external testers listed on OpenAI’s website as Model Evaluation and Threat Research (METR) and Apollo Research, both of which build evaluations for AI systems. Moreover, the company is ...