School of Information Sciences Undergrad MASTER CALENDAR

The iSchool offers a number of events related to career and professional development, technology and information talks, research seminars, field trips, alumni panels, socials, and more. We also promote relevant opportunities on and around campus. 

We encourage students to also visit additional calendars and websites: Handshake Events, Research Park, NCSA, Technology Entrepreneur Center (TEC), The Career Center, Office of Undergraduate Research, Leadership Center, Siebel Center for Design, Office of Technology Management, Center for Innovation in Teaching & Learning, Applied Technologies for Learning in the Arts & Sciences, National & International Scholars Programs, and Student Wellness.

iSchool Calendars: Study Abroad Hours, iSchool Events, Non-iSchool Events, BSIS ICT Sessions, Express Advising

Mar 24, 2026   3:00 - 5:00 pm  
University of Illinois Urbana‑Champaign, ECE Building (room TBA)
Sponsor
NCSA - Center for Artificial Intelligence Innovation
Speaker
Priyam Mazumdar
Contact
Shannon Bradley
E-Mail
sbrad77@illinois.edu
Phone
217-778-0887
Originating Calendar
Center for Artificial Intelligence Innovation

The Center for Artificial Intelligence Innovation is hosting a new hands‑on training series this Spring, “GPU Programming with Triton: From NumPy to Flash Attention.” This multi‑week workshop introduces participants to Triton, an open‑source language and compiler that makes it possible to write custom GPU kernels with a Pythonic feel. Triton bridges the gap between familiar NumPy‑style operations and the high‑performance kernels used in modern deep learning systems, including techniques like Flash Attention.

Across the series, attendees will learn how Triton enables fine‑grained control over GPU execution while remaining far more approachable than CUDA. By the end, participants will understand how to move from simple array operations to optimized kernels that power state‑of‑the‑art transformer models.

Don’t worry about prerequisites: each kernel we write in Triton is developed in two steps. First we write the pseudocode in NumPy, and then we translate it into a Triton kernel (a minimal sketch of this style appears below). The only prerequisite for anyone interested is NumPy!
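To make that concrete, here is a minimal sketch of the two-step style using vector addition, the warm-up problem from session 2. It is modeled on the public Triton tutorials rather than the workshop's actual materials, and the block size of 1024 is an illustrative choice:

```python
import torch
import triton
import triton.language as tl

# Step 1: NumPy-style "pseudocode" -- states what we want to compute.
def add_numpy(x, y):
    return x + y  # elementwise sum

# Step 2: the same computation as a Triton kernel. Each program instance
# (one per tl.program_id) handles one BLOCK_SIZE-wide chunk of the vectors.
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which chunk am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add_triton(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                    # one program per block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```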

The seminar will be taught by Priyam Mazumdar, a PhD student in Electrical and Computer Engineering and a researcher at the National Center for Supercomputing Applications (NCSA) at the University of Illinois.

Schedule & Format

Tuesdays, 3:00–5:00 PM, beginning February 17, 2026, and running through April 7, 2026
Location: University of Illinois Urbana‑Champaign, ECE Building (room TBA)
Format: Hybrid — in‑person and via Zoom (link forthcoming)

Lesson Outline for 8 Sessions

  1. GPU Programming Fundamentals
    • An overview of GPU vs. CPU architectures, why Triton exists, and how it compares to CUDA, NumPy, and Numba.
  2. Writing our First Kernel: Vector Sum
    • Introduction to the CUDA execution model, including grid and block structure, pointer-based memory access, and a simple vector summation kernel.
    • Transitioning from elementwise kernels to blockwise computation, with a focus on performance implications and scheduling costs.
  3. Matrix Multiplication
    • Matrix multiplication as the core operation underlying most deep learning workloads, implemented step by step in Triton (see the blocked matmul sketch after this outline).
  4. Matrix Multiplication with Cache Optimizations
    • Techniques for improving performance on large matrix multiplications, including cache-aware strategies that approach cuBLAS-level efficiency.
  5. Fused Softmax
    • Understanding GPU memory overhead and how fusing several operations into a single kernel can significantly improve performance (see the fused softmax sketch after this outline).
  6. Fused Online Softmax
    • Iterative softmax computation techniques commonly used for large-scale data processing (a NumPy sketch of the online recurrence follows this outline).
  7. Flash Attention (Part 1)
    • A deep dive into FlashAttention, which underpins most modern LLMs. We will reinterpret the attention mechanism through the lens of online softmax and complete a full implementation while covering all critical details (a NumPy sketch of the streaming idea follows this outline).
  8. Flash Attention (Part 2)
    • The end goal is to match the performance of PyTorch’s scaled dot-product attention (SDPA) with our own custom kernel!
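For readers who want a preview of where sessions 3 and 4 are headed, below is a minimal blocked matrix-multiplication kernel in the style of the public Triton tutorials. The block sizes and the host-side launcher are illustrative assumptions, not the course's code:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak, stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    # Each program computes one BLOCK_M x BLOCK_N tile of C = A @ B.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    rm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    rn = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    rk = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + rm[:, None] * stride_am + rk[None, :] * stride_ak
    b_ptrs = b_ptr + rk[:, None] * stride_bk + rn[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        # Load one K-slice of A and B, masking the ragged edges.
        a = tl.load(a_ptrs, mask=(rm[:, None] < M) & (rk[None, :] + k < K), other=0.0)
        b = tl.load(b_ptrs, mask=(rk[:, None] + k < K) & (rn[None, :] < N), other=0.0)
        acc += tl.dot(a, b)             # tensor-core matmul on the tile
        a_ptrs += BLOCK_K * stride_ak   # slide the K window
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + rm[:, None] * stride_cm + rn[None, :] * stride_cn
    tl.store(c_ptrs, acc, mask=(rm[:, None] < M) & (rn[None, :] < N))

def matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    M, K = a.shape
    _, N = b.shape
    c = torch.empty((M, N), device=a.device, dtype=torch.float32)
    grid = (triton.cdiv(M, 64), triton.cdiv(N, 64))
    matmul_kernel[grid](a, b, c, M, N, K,
                        a.stride(0), a.stride(1), b.stride(0), b.stride(1),
                        c.stride(0), c.stride(1),
                        BLOCK_M=64, BLOCK_N=64, BLOCK_K=32)
    return c
```

Session 4's cache-aware strategies (for example, reordering program IDs so that tiles reusing the same rows of A run close together in time) build on exactly this structure.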
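Session 5's theme, kernel fusion, can be previewed with a row-wise softmax that performs the max, exponentiation, sum, and division in one kernel, so each row makes a single round trip to GPU memory instead of four. A hypothetical sketch modeled on the public Triton tutorial, assuming each row fits in a single block:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, row_stride, n_cols, BLOCK_SIZE: tl.constexpr):
    # One program per row; the whole row is loaded once, reduced in
    # registers, and written once -- no intermediate tensors in HBM.
    row = tl.program_id(0)
    offs = tl.arange(0, BLOCK_SIZE)
    mask = offs < n_cols
    x = tl.load(in_ptr + row * row_stride + offs, mask=mask, other=-float('inf'))
    x = x - tl.max(x, axis=0)          # subtract row max for stability
    num = tl.exp(x)
    den = tl.sum(num, axis=0)
    tl.store(out_ptr + row * row_stride + offs, num / den, mask=mask)

def softmax(x: torch.Tensor) -> torch.Tensor:
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)   # whole row in one block
    softmax_kernel[(n_rows,)](out, x, x.stride(0), n_cols, BLOCK_SIZE=BLOCK_SIZE)
    return out
```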
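The online softmax of session 6 removes the need to see a whole row before normalizing: a running maximum and running denominator are updated element by element, with the old sum rescaled whenever a new maximum appears. A NumPy sketch of the recurrence (our illustration, not the course's code):

```python
import numpy as np

def online_softmax(x):
    m = -np.inf          # running maximum seen so far
    s = 0.0              # running sum of exp(x_i - m)
    for xi in x:
        m_new = max(m, xi)
        s = s * np.exp(m - m_new) + np.exp(xi - m_new)  # rescale old sum
        m = m_new
    return np.exp(x - m) / s

# Single-pass result matches the standard two-pass softmax.
x = np.random.randn(1000)
ref = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
assert np.allclose(online_softmax(x), ref)
```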
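Sessions 7 and 8 combine the blocking of session 3 with the online softmax of session 6. As a preview, here is a NumPy sketch of the streaming computation that FlashAttention performs over blocks of keys and values; the real Triton kernel adds tiling over queries and careful use of on-chip memory:

```python
import numpy as np

def attention_streaming(Q, K, V, block=64):
    # Process K/V in blocks, keeping per-query running max m, running
    # softmax denominator l, and an unnormalized output accumulator acc.
    n, d = Q.shape
    m = np.full(n, -np.inf)
    l = np.zeros(n)
    acc = np.zeros((n, d))
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)             # scores for this block
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)             # rescale factor for old stats
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        acc = acc * alpha[:, None] + P @ Vb
        m = m_new
    return acc / l[:, None]                   # softmax(QK^T/sqrt(d)) @ V
```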