CS Compiler Seminar: Hao Ren, "ApproxMLIR: An Accuracy-Aware Compiler for Compound ML Systems."

- Sponsor: Compilers, Architecture, and Parallel Computing Research Area
- Speaker: Hao Ren
- Contact: Allison Mette, agk@illinois.edu
- Originating Calendar: Siebel School Speakers Calendar
Conference: MLSYS 2026
Author(s): Hao Ren, Yi Mu, Sasa Misailovic
Abstract: Many compound AI systems are inherently “approximate” because the ML components (e.g., a large language model) are probabilistic models and the non-ML components (e.g., retrieval-augmented generation) are heuristic. Such systems benefit from trading off result quality for improved performance. While extensive work exists on approximating ML and non-ML components individually, the wide deployment of LLMs in compound systems presents significant opportunities for end-to-end, accuracy-aware compilation. However, tailoring approximations across these different components is challenging to implement. This difficulty comes from their reliance on different software stacks for compilation and execution, as well as deployment on different hardware.
To address these issues, we present ApproxMLIR, a reusable accuracy-aware compilation toolchain. ApproxMLIR introduces the approx MLIR dialect, which serves as a unified and centralized interface for defining approximations, and approx-opt, a reusable MLIR-based optimizer that applies approximate transformations to ML and non-ML components. We evaluated ApproxMLIR on three compound AI systems, which combine LLMs with information retrieval tasks and tool calling. The evaluation shows that ApproxMLIR can effectively represent many common approximation choices, discover profitable points in the accuracy-performance tradeoff space, and consistently achieve higher speedups than static approximation strategies.