High Performance Distributed Deep Learning: A Beginner’s Guide

Date and Time: 
Thursday 2018 Apr 5th
CG Auditorium
DK Panda, Ammar Awan and Hari Subramoni

The current wave of advances in Deep Learning (DL) has led to many exciting challenges and opportunities for Computer Science and Artificial Intelligence researchers alike. Modern DL frameworks like Caffe/Caffe2, TensorFlow, CNTK, Torch, and several others have emerged that offer ease of use and flexibility to describe, train, and deploy various types of Deep Neural Networks (DNN) including deep convolutional nets. In this tutorial, we will provide an overview of interesting trends in DL and how cutting-edge hardware architectures are playing a key role in moving the field forward. We will also present an overview of DL frameworks from an architectural as well as a performance standpoint. Most DL frameworks have utilized a single GPU to accelerate the performance of DNN training and inference. However, approaches to parallelizing the process of training are also being actively explored. The DL community has moved along MPI based parallel/distributed training as well. Thus, we will highlight new challenges for MPI runtimes to efficiently support DNN training. We highlight how we have designed efficient communication primitives in MVAPICH2 to support scalable DNN training. Finally, we will discuss how co-design of the OSU-Caffe framework and MVAPICH2 runtime enables scale-out of DNN training to 160 GPUs.

See http://web.cse.ohio-state.edu/~panda.2/ppopp18_dl_tut.html for more details

Speaker Description: 

Ammar Ahmad Awan received his B.S. and M.S.degrees in Computer Science and Engineering from National University of Science and Technology (NUST), Pakistan and Kyung Hee University (KHU), South Korea, respectively. Currently, Ammar is working towards his Ph.D. degree in Computer Science and Engineering at The Ohio State University. His current research focus lies at the intersection of High Performance Computing (HPC) libraries and Deep Learning (DL) frameworks. He previously worked on a Java-based Message Passing Interface (MPI) and nested parallelism with OpenMP and MPI for scientific applications. He has published 14 papers in conferences and journals related to these research areas. He actively contributes to various projects like MVAPICH2-GDR (High Performance MPI for GPU clusters, OMB (OSU Micro Benchmarks), and HiDL (High Performance Deep Learning). He is the lead author of the OSU-Caffe framework (part of HiDL project) that allows efficient distributed training of Deep Neural Networks.

Dhabaleswar K. (DK) Panda is Professor and University Distinguished Scholar of Computer Science and Engineering at The Ohio State University. He leads the Network-Based Computing Research Group.

PDF icon DeepLearning_seaconf18.pdf5.92 MB

Event Category: