Humanity's Last Exam
Humanity's Last Exam is a benchmark of 2,500 extremely difficult, expert-level questions spanning subjects such as advanced math, physics, and biology. It was created to test AI capabilities after older benchmarks became too easy.