Abstract: Large language models (LLMs) have achieved remarkable fluency and versatility, yet they remain fundamentally opaque and vulnerable, posing challenges for both responsible control and safe deployment. This talk presents two complementary approaches to advancing trustworthy AI: one focused on interpretable control and the other on adversarial robustness. We first introduce JAM (Just A Move), a novel framework for controllable text generation that leverages causal interventions in the latent space of LLMs. By uncovering and manipulating the causal structure underlying generation, JAM enables interpretable and efficient control over model outputs. Empirical evaluations across alignment benchmarks, including the helpfulness, honesty, and harmlessness (HHH) criteria, toxicity reduction, and GPT-4 alignment, demonstrate that JAM improves controllability by up to 22% while maintaining computational efficiency. We next examine the vulnerabilities of LLMs through intent-hiding adversarial prompting, a scalable attack strategy that composes benign skills to conceal malicious intent. Using a game-theoretic framework, we analyze the dynamics between attackers and defense systems, revealing structural advantages for adversaries. We further propose targeted defenses and validate their effectiveness across real-world models and malicious behaviors. Together, these contributions highlight the dual imperative of building LLMs that are both controllable by design and resilient to adversarial misuse, offering a roadmap toward more trustworthy and secure AI systems.
Bio: Dr. Abhishek K. Umrawal is a Teaching Assistant Professor in the Department of Electrical and Computer Engineering at the University of Illinois Urbana-Champaign.