Is Model Compression Always Harmful to the Performance of Neural Networks?
Date & time: Saturday, January 22nd, 2022, 4pm-5pm PST
Venue: Zoom (Eventbrite registration required)
Eventbrite link: tinyurl.com/cie-sf-0122
Speaker: Qing Jin
The great success of deep learning has been accompanied by exploding model sizes and computational costs; GPT-3, for example, has 175 billion parameters. This makes model compression and acceleration critical for deploying deep neural networks on resource-constrained and latency-sensitive platforms, especially those with limited energy and stringent memory budgets, such as DSP or IoT devices. In this talk, we will cover modern techniques for compressing deep neural networks, including model quantization for image classification and network pruning for generative models. We will first review conventional methods of neural network quantization and their limitations, followed by more recent progress, including advanced methods for adaptive bit-widths and mixed-precision quantization. After that, we will discuss network pruning for generative models, which are typically much larger and more difficult to compress than classification models. Finally, we will introduce knowledge distillation, a powerful method for boosting the performance of compact models. We hope this talk will give the audience a basic understanding of compression techniques for deep models.
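To give a flavor of the quantization methods the talk surveys, here is a minimal sketch of uniform affine quantization of a weight tensor. This is an illustrative example only, not code from the speaker's work; the function names and the 8-bit setting are assumptions for the sketch.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Map a float tensor onto num_bits-wide unsigned integers (illustrative sketch)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    # One scale and zero-point for the whole tensor (per-tensor quantization).
    scale = (w_max - w_min) / (qmax - qmin) if w_max > w_min else 1.0
    zero_point = int(round(qmin - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximate float tensor from the quantized values."""
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(4, 4).astype(np.float32)
q, scale, zero_point = quantize_uniform(w, num_bits=8)
w_hat = dequantize(q, scale, zero_point)
# Round-trip error stays within one quantization step of size `scale`.
assert np.max(np.abs(w - w_hat)) <= scale + 1e-6
```

Conventional post-training quantization is essentially this round-trip applied per layer; the adaptive bit-width and mixed-precision methods mentioned in the abstract refine it by choosing `num_bits` (and the scale granularity) per layer or per channel rather than globally.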
Qing Jin is currently a PhD student at Northeastern University working on deep learning acceleration and software-hardware co-design for innovative computing systems. He obtained his Bachelor's and Master's degrees from Nankai University, both in Microelectronics, and a Master's degree in Computer Engineering from Texas A&M University. Early in his career he worked on RF integrated circuit design as well as low-power analog circuit design, and in recent years he has focused on computer vision and deep learning algorithms. He has been a research intern at several companies, including ByteDance, SenseBrain, Kwai, and Snap, working on model compression, neural network quantization, and neural architecture search (NAS) for deep neural networks. His work has been published in top-tier conferences and journals spanning computer vision, machine learning, and solid-state circuits, including CVPR, ICML, NeurIPS, AAAI, BMVC, CICC, ISLPED, and JSSC. He is a co-recipient of a Best Student Paper Award Finalist at the IEEE Custom Integrated Circuits Conference (CICC). He is interested in both practical design and theoretical analysis related to deep learning.