HAL Training: Distributed Data Parallel Model Training in PyTorch

Event Type
Conference/Workshop
Sponsor
Center for Artificial Intelligence Innovation
Date
Feb 15, 2023   3:00 - 5:00 pm  
Speaker
Shirui Luo, Research Scientist - NCSA
Originating Calendar
Center for Artificial Intelligence Innovation

The Center for Artificial Intelligence Innovation at NCSA is organizing online training sessions throughout the Spring 2023 semester to help users get started with deep learning projects on HAL. These sessions are designed for novice users to learn about the system and start building deep neural network models. To sign up, request a HAL account before the training session and mention "spring training" when describing how the system will be used in your project. Trainings take place every Wednesday during the spring semester from 3-5 pm via Zoom.

Training Link: https://go.ncsa.illinois.edu/CAIIHALTraining

February 15, 2023: Distributed Data Parallel Model Training in PyTorch - Shirui Luo

Training Overview: 
This tutorial walks through distributed data parallel training in PyTorch using DistributedDataParallel (DDP). We will start with a simple non-distributed training job and end by deploying a training job across several GPUs in a single HAL node. Along the way, you will learn how DDP accelerates model training. You will also learn how to monitor GPU status so you can profile your code and make full use of the available GPU computing power.
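
For readers new to the idea, the core of data parallel training can be sketched in plain Python before touching PyTorch. The example below is only a conceptual illustration and is not taken from the session materials: it uses no PyTorch API, and every name in it (`shard`, `local_gradient`, `all_reduce_mean`, `train`) is hypothetical. Each simulated worker receives a disjoint slice of the data, computes a gradient on its slice, and the gradients are averaged so that all workers apply the same update, which is roughly what DDP does across GPUs.

```python
# Conceptual sketch of data parallel training in plain Python (no PyTorch).
# All function names are hypothetical and exist only for illustration.

def shard(indices, rank, world_size):
    """Give each worker every world_size-th sample, starting at its rank."""
    return indices[rank::world_size]

def local_gradient(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def all_reduce_mean(values):
    """Stand-in for the gradient averaging that happens across workers."""
    return sum(values) / len(values)

def train(xs, ys, world_size=4, lr=0.01, steps=200):
    w = 0.0  # every worker starts from the same weight
    indices = list(range(len(xs)))
    for _ in range(steps):
        grads = []
        # In a real job these iterations run in parallel, one process per GPU.
        for rank in range(world_size):
            idx = shard(indices, rank, world_size)
            grads.append(local_gradient(w, [xs[i] for i in idx], [ys[i] for i in idx]))
        # Every worker applies the same averaged gradient, so weights stay in sync.
        w -= lr * all_reduce_mean(grads)
    return w

xs = [float(i) for i in range(1, 9)]  # 8 samples, split evenly over 4 workers
ys = [3.0 * x for x in xs]            # data generated with a true weight of 3
w = train(xs, ys)
print(round(w, 3))
```

Because the shards are equal in size, the averaged gradient matches the gradient computed on the full dataset, so the simulated workers reach the same weight a single worker would. In an actual PyTorch job, the sharding and averaging are handled by the library rather than written by hand.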

Sessions will be recorded and made available on the CAII website after the training.

