Benchmark Tests in Math

18h on MSN

AI program plays the long game to solve decades-old math problems

A game of chess requires its players to think several moves ahead, a skill that computer programs have mastered over the ...

The Vicksburg Post, 7d

Benchmark testing results released for area schools

A total of nine Vicksburg Warren School District (VWSD) schools presented results from the second round of benchmark testing ...

6d on MSN

These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

Researchers used questions from the NPR Sunday Puzzle challenge to build a benchmark to test AI 'reasoning' models.

Houston Landing on MSN, 22d

Learning Curve: See how Houston students grew in math, reading on STAAR by district

Most Houston-area students hit the state’s threshold for demonstrating growth on the standardized tests, but performance varied by district.

Computing, 1mo

Leading AI models accused of cheating benchmark tests

Analysts have evidence suggesting that several state-of-the-art AI models can reproduce test ... benchmarks such as MMLU (Massive Multitask Language Understanding) and GSM8K (Grade School Math ...

TechCrunch, 24d

AI benchmarking organization criticized for waiting to disclose funding from OpenAI

An organization developing math benchmarks for AI didn’t disclose ... had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI ...

ZDNet, 16d

'Humanity's Last Exam' benchmark is stumping top AI models - can you do any better?

The test consists of 3,000 text and multi-modal ... "When I released the MATH benchmark -- a challenging competition mathematics dataset -- in 2021, the best model scored less than 10%; few ...
