Benchmark Tests in Math

Benchmark testing results released for area schools

A total of nine Vicksburg Warren School District (VWSD) schools presented results from the second round of benchmark testing ...

Decrypt 14d

Did OpenAI Cheat on Its Big Math Test?

A benchmarking controversy exposes industry-wide problems when it turns out OpenAI helped design the test that its vaunted o3 ...

2d on MSN

These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

Researchers used questions from the NPR Sunday Puzzle challenge to build a benchmark to test AI 'reasoning' models.

16d

When A.I. Passes This Test, Look Out

The creators of a new test called “Humanity’s Last Exam” argue we may soon lose the ability to create tests hard enough for A ...

Hosted on MSN 1mon

6 observations from a devastating international math test

U.S. 4th-grade students on the TIMSS, 1995–2023, by student percentiles. Another way of understanding the shrinking middle is to see how few American children met basic math benchmarks. The test ...

Houston Landing on MSN 18d

Learning Curve: See how Houston students grew in math, reading on STAAR by district

Most Houston-area students hit the state’s threshold for demonstrating growth on the standardized tests, but performance varied by district.

13d

'Humanity's Last Exam' benchmark is stumping top AI models - can you do any better?

A new academic benchmark aims to 'test the limits of AI knowledge at the frontiers of human expertise.' So far, these LLMs ...

Computing 26d

Leading AI models accused of cheating benchmark tests

Analysts have evidence suggesting that several state-of-the-art AI models can reproduce test ... benchmarks such as MMLU (Massive Multitask Language Understanding) and GSM8K (Grade School Math ...

Santa Monica Daily Press 5d

Math assessment, grading discrepancy piques school board interest

Santa Monica-Malibu Unified School District officials expressed concern last week over high numbers of failing math grades, with more than 800 secondary students receiving D or F marks in the fall ...

TechCrunch 20d

AI benchmarking organization criticized for waiting to disclose funding from OpenAI

An organization developing math benchmarks for AI didn’t disclose ... had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI ...
