Comparing Rubric & Preference Evaluation For Quality Assessment | SF Tech Events - GarysGuide

With Russell Yang (AI Engineering Fellow, Stanford Law School).

	Jun 17 (Wed) @ 03:00 PM FREE
	Venue, 101 Second St, SF

Join the Snorkel AI Reading Group, a recurring forum to explore the latest frontier developments in AI while building meaningful connections within the community.

In this afternoon session, Russell Yang, an AI Engineering Fellow at Stanford Law School, will cover his recent paper: JudgmentBench: Comparing Rubric & Preference Evaluation for Quality Assessment.

Agenda:

3pm - doors open
3:30pm - talk begins

Boba tea & other refreshments will be provided!

Among other things, you'll learn:

What JudgmentBench is: 30 real-world legal tasks paired with 1,539 rubric scores & 1,530 pairwise preference judgments, all collected from practicing attorneys (including at major U.S. law firms).

Why it's the first public dataset in a high-expertise domain where both supervision signals are elicited from the same experts on the same items.

Why benchmark designers rarely justify their choice between rubric scoring & comparative judgment, even though both dominate current benchmarking.

How comparative judgments recover the intended quality ordering far better than rubrics: a mean Spearman correlation of 0.908 vs. 0.150, while requiring less than half the annotation time.

Why that pattern holds for both human annotators & LLM autograders.

How the paired dataset opens a broader research agenda on how expert judgment should be elicited, aggregated, & used as supervision in domains without verifiable ground truth.
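
For readers unfamiliar with the metric quoted above, here is a minimal sketch of how a Spearman correlation can measure how well a supervision signal recovers an intended quality ordering. Everything in it is an illustrative assumption: the five responses, the rubric scores, the pairwise preferences, and the win-count aggregation are made up for this example and are not JudgmentBench data or necessarily the paper's method.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def win_counts(n_items, preferences):
    """Aggregate (winner, loser) pairs into a per-item win count."""
    wins = [0] * n_items
    for winner, _loser in preferences:
        wins[winner] += 1
    return wins


# Hypothetical toy data: five responses whose intended quality rises with index.
intended_quality = [1, 2, 3, 4, 5]

# Made-up rubric scores: coarse and heavily tied.
rubric_scores = [3, 3, 4, 3, 4]

# Made-up pairwise judgments: the better response wins every pair,
# except one annotator "mistake" on the pair (0, 1).
preferences = []
for a, b in combinations(range(5), 2):
    if (a, b) == (0, 1):
        preferences.append((a, b))  # lower-quality item 0 wins this pair
    else:
        preferences.append((b, a))  # higher-quality item b wins

wins = win_counts(5, preferences)
rho_rubric = spearman(intended_quality, rubric_scores)
rho_preference = spearman(intended_quality, wins)

print(f"rubric vs intended:     {rho_rubric:.3f}")
print(f"preference vs intended: {rho_preference:.3f}")
```

A value near 1 means the signal orders the responses almost exactly as intended, while a value near 0 means the ordering is close to unrelated. In this toy setup the tied rubric scores lose ordering information that the pairwise wins retain, which is only meant to illustrate the kind of gap the talk describes.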

JudgmentBench is a collaboration between Stanford, Harvey AI, & Snorkel AI.