Research Seminars @ Illinois


Tailored for undergraduate researchers, this calendar is a curated list of research seminars at the University of Illinois. Explore the diverse world of research and expand your knowledge through engaging sessions designed to inspire and enlighten.

To have your events added or removed from this calendar, please contact OUR at ugresearch@illinois.edu

On the Brittleness of Evaluation in NLP

Event Type: Lecture
Sponsor: Tal August
Location: Siebel Center 2405
Date: Sep 10, 2025, 3:00 pm - 4:00 pm
Speaker: Gabriel Stanovsky
Contact: Tal August
E-Mail: taugust@illinois.edu
Originating Calendar: Siebel School Speakers Calendar

Abstract: 

Large language models are commonly evaluated against several popular benchmarks, including HELM, MMLU, and BIG-bench, all of which rely on a single prompt template per task. I will begin by presenting our recent large-scale statistical analysis of more than 250M samples, showing that minimal prompt paraphrases lead to drastic changes in both the absolute performance and the relative ranking of different LLMs. These results call into question many of the recent empirical observations about the strengths and weaknesses of LLMs. I will then discuss desiderata for more meaningful evaluation in NLP, leading to our formulation of diverse metrics tailored for different use cases, and conclude with a proposal for a probabilistic benchmarking approach for modern LLMs.

 
