Distributed Deep Learning System

Cross-listed course, 16:332:579:08, 14:332:446:06, 2024

This is a cross-listed course with a focus on system perspectives in distributed deep learning. The goal of this course is to develop a comprehensive and deep understanding of the internals of deep learning systems, in order to inspire and foster students' future research directions. The course covers a wide range of topics, including neural network architectures, optimization methods, parallel training paradigms, high-performance computing architecture, and communication algorithms. It conveys the principles of distributed/parallel system design alongside state-of-the-art progress in deep learning.

This course is project-based. Students will work in teams and propose their own project ideas. Students will get the chance to run on hundreds of A100 GPUs on the NERSC Perlmutter and NCSA Delta supercomputers.

Example projects from Fall 2023 include:

  • Enhancing BLIP-2: Implementing Multi-GPU Training
  • Efficient Low-Rank Training of Transformers
  • Training an End-to-End Autonomous Driving Model
  • LLM Collaborations in Scriptwriting
  • Large Model for Image Segmentation

Prerequisites:

  • Prior Python and C programming experience is required
  • Basic knowledge of linear algebra (at the level of 01:640:250 - Introductory Linear Algebra)
  • Basic knowledge of calculus (at the level of 01:640:251 - Multivariable Calculus)
  • Basic knowledge of parallel/distributed programming is recommended
  • Basic knowledge of high-performance computing architecture is recommended
  • Prior CUDA knowledge is recommended

Topics:

  • Deep Learning Overview
  • Distributed Training Paradigms
  • Large-scale Optimization – Algorithms
  • Large-scale Optimization – Systems
  • Performance Profiling and Modeling
  • High-performance Computing Architecture
  • Sparsification in Deep Learning
  • I/O in Deep Learning