Reading Group - Agents' Last Exam | SF Tech Events - GarysGuide

Reading Group - Agents' Last Exam

With Yiyou Sun (Postdoctoral Researcher, UC Berkeley), David (Xinyang) Han (PhD, UC Berkeley).

	Jun 29 (Mon) @ 04:00 PM FREE
	Venue, 101 Second St, SF

Join the Snorkel AI Reading Group, a recurring forum to explore the latest frontier developments in AI while building meaningful connections within the community.

In this afternoon session, Yiyou Sun (Postdoctoral Researcher) & Xinyang Han (PhD), both of UC Berkeley, will cover their recent paper: Agents' Last Exam.

Agenda:

4 pm - doors open
4:30 pm - talk begins

Boba tea & other refreshments will be provided!

Among other things, you'll learn:

ALE is a benchmark designed to evaluate AI agents on long-horizon, economically valuable, real-world tasks with verifiable outcomes. It was developed in collaboration with 250+ industry experts & covers 1,000+ tasks across 55 subfields in 13 industry clusters.

Widely used benchmarks do not measure sustained performance on real, economically valuable workflows, creating a systematic gap between benchmark success & meaningful deployment across professional domains.

ALE grounds task coverage in O*NET / SOC 2018, the U.S. federal occupational taxonomy, ensuring systematic, reproducible coverage of non-physical job categories at scale.

The hardest task tier remains far from saturated: across mainstream harness & backbone configurations, the average full pass rate is just 2.6%, underscoring how much headroom remains.

ALE's task pool grows continuously as new workflows & industries are onboarded, enabling longitudinal tracking of agent capabilities rather than one-time snapshot comparisons.

ALE is intended not merely as another leaderboard, but as an instrument for closing the gap between benchmark performance & GDP-relevant economic impact.

Agents' Last Exam is a collaboration among UC Berkeley's RDI (Center for Responsible Decentralized Intelligence), Snorkel AI, & 250+ experts across academia & industry.