Grainger College of Engineering, All Events

PILOT Seminar: Yinfang Chen, "Agentic AI for Cloud Reliability"

Feb 13, 2026, 1:30 - 3:00 pm
4124 Siebel Center for Computer Science
Sponsor: Siebel School of Computing and Data Science
Originating Calendar: Siebel School PILOT Seminars


Abstract:
Cloud systems have become critical infrastructure, yet failures are the norm in the cloud. Outages and incidents occur every day, and downtime for large-scale systems can cost hundreds of thousands of dollars per hour. Despite massive investments, cloud reliability today still depends heavily on human engineers. This raises a fundamental research challenge: how can we embed intelligence into systems so that they can autonomously detect, diagnose, and recover from failures safely and efficiently?

In this talk, I will present my research on designing cloud systems that integrate AI across the entire incident lifecycle. I will first introduce my work on AI-augmented root cause analysis, in which large language models reason over heterogeneous telemetry to localize failures. I will then turn to failure mitigation, where I design reliable, safety-aware agentic systems that execute recovery actions automatically. I will conclude by outlining my future research agenda on improving the reliability, efficiency, and security of AI and systems.

Bio:
