Benchmark Tests in Math

A benchmarking controversy exposes industry-wide problems when it turns out OpenAI helped design the test that its vaunted o3 ...

When AI passes this test, look out | Commentary

Humanity’s Last Exam is the brainchild of Dan Hendrycks, a well-known AI safety researcher and director of the Center for AI ...

'Humanity's Last Exam' benchmark is stumping top AI models - can you do any better?

A new academic benchmark aims to 'test the limits of AI knowledge at the frontiers of human expertise.' So far, these LLMs ...

US test scores remain below pre-COVID, performance gap widens

US student test scores in reading and math remain below pre-pandemic levels as a worrying gap continues to widen between high ...

Computing | 19d

Leading AI models accused of cheating benchmark tests

Analysts have evidence suggesting that several state-of-the-art AI models can reproduce test ... benchmarks such as MMLU (Massive Multitask Language Understanding) and GSM8K (Grade School Math ...

Opinion | 10d

Opinion: When AI passes this test, look out

If you’re looking for a new reason to be nervous about artificial intelligence, try this: Some of the smartest humans in the ...
