NCSA Calendar

Natural Language Processing Seminar Series: Dr. Abhishek Umrawal, "Trustworthy AI: Causal Control and Intent-Hiding Games in LLMs."

Event Type
Seminar/Symposium
Sponsor
AICE Center
Location
2405 Siebel Center
Date
Nov 19, 2025, 1:00 - 2:00 pm
Speaker
Dr. Abhishek K. Umrawal
Contact
Allison Mette
E-Mail
agk@illinois.edu
Originating Calendar
Siebel School Speakers Calendar

Abstract: Large language models (LLMs) have achieved remarkable fluency and versatility, yet they remain fundamentally opaque and vulnerable—posing challenges for both responsible control and safe deployment. This talk presents two complementary approaches to advancing trustworthy AI: one focused on interpretable control, and the other on adversarial robustness. We first introduce JAM (Just A Move), a novel framework for controllable text generation that leverages causal interventions in the latent space of LLMs. By uncovering and manipulating the causal structure underlying generation, JAM enables interpretable and efficient control over model outputs. Empirical evaluations across alignment benchmarks—including HHH criteria, toxicity reduction, and GPT-4 alignment—demonstrate that JAM improves controllability by up to 22% while maintaining computational efficiency. We next examine the vulnerabilities of LLMs through intent-hiding adversarial prompting, a scalable attack strategy that composes benign skills to conceal malicious intent. Using a game-theoretic framework, we analyze the dynamics between attackers and defense systems, revealing structural advantages for adversaries. We further propose targeted defenses and validate their effectiveness across real-world models and malicious behaviors. Together, these contributions highlight the dual imperative of building LLMs that are both controllable by design and resilient to adversarial misuse, offering a roadmap toward more trustworthy and secure AI systems.

Bio: Dr. Abhishek K. Umrawal is a Teaching Assistant Professor in the Department of Electrical and Computer Engineering at Illinois.
