
Institute of Information Science, Academia Sinica


Seminar


[IIS & CITI] Visionary Technology Seminar Series (1) Term Revealing: A New Quantization Approach for Deep Learning

  • Lecturer: Prof. H. T. Kung (Computer Science and Electrical Engineering, Harvard University, USA)
    Host: K. M. Chung, D. N. Yang, Li Su
  • Time: 2020-09-01 (Tue.) 10:00 ~ 11:30
  • Location: Auditorium 106 at the IIS new Building (on-site); the virtual meeting link is available in the abstract
Abstract

Virtual Meeting

ID: 170 978 2671

Password:  GPfq8RBSk23

*This series of talks is mainly open to the staff of IIS & CITI at Academia Sinica.
**IIS reserves the right to determine each attendee's eligibility for quality-control purposes (applies to both on-site and virtual attendance).

--------------------------------------------------------------------------------------------------------------------

Quantization is a widely used technique for efficient deep learning computation. However, aggressive post-training quantization is often not possible. For ImageNet, for example, quantizing to fewer than 8 bits substantially degrades classification accuracy. A problem with conventional quantization is that it indiscriminately truncates the lower-order bits of numbers, even though these bits are often precisely the ones that differentiate features.
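
As a rough illustration of the failure mode described above, the following sketch (not from the talk; the per-tensor scaling scheme and bit widths are assumptions) applies symmetric uniform post-training quantization to a weight tensor. Rounding every value to a fixed grid simply discards its low-order bits, and the reconstruction error grows quickly below 8 bits.

    import numpy as np

    def uniform_quantize(x, num_bits=8):
        """Symmetric uniform quantization: round every value to a fixed grid."""
        scale = np.max(np.abs(x)) / (2 ** (num_bits - 1) - 1)  # assumed per-tensor scale
        return np.round(x / scale) * scale                     # low-order bits are lost here

    w = np.random.randn(4096).astype(np.float32)
    for bits in (8, 6, 4):
        err = np.mean((w - uniform_quantize(w, bits)) ** 2)
        print(bits, "bits -> mean squared error:", err)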

We propose a new quantization method, "term revealing," to alleviate this problem. The technique substantially reduces the bits required for filter weights and activation data by dynamically adapting bit usage so that only the bits most critical to classification accuracy are kept. Term revealing can be applied to models trained with or without quantization-aware training.
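
The following is a hypothetical sketch of the underlying idea, assuming values are already integers: write each value as a sum of power-of-two terms and keep only its few largest terms. (The actual method described in the SC20 paper budgets terms across groups of values; the function and name below are illustrative only.)

    def keep_top_terms(value, budget=2):
        """Keep the `budget` most significant power-of-two terms of a non-negative integer."""
        exponents = [i for i in range(value.bit_length()) if (value >> i) & 1]
        kept = sorted(exponents, reverse=True)[:budget]   # most significant terms only
        return sum(1 << i for i in kept)

    print(keep_top_terms(55, budget=2))  # 55 = 2^5+2^4+2^2+2^1+2^0 -> keeps 2^5+2^4 = 48
    print(keep_top_terms(55, budget=3))  # -> keeps 2^5+2^4+2^2 = 52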

To enhance efficiency, we further propose "HESE encoding" (Hybrid Encoding for Signed Expressions) for signed-digit representations involving both positive and negative terms, in contrast to standard unsigned binary representations using only positive terms. For example, we represent 55 as 2^6 - 2^3 - 2^0 rather than 2^5 + 2^4 + 2^2 + 2^1 + 2^0. We describe an efficient one-pass scheme for forming minimum-length signed-digit representations.
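
HESE itself is the speaker's hybrid scheme, but the standard one-pass non-adjacent-form (NAF) conversion below reproduces the example above and yields a minimum-weight signed-digit representation; it is included only as an illustration of the kind of encoding involved.

    def signed_digit_terms(n):
        """One-pass non-adjacent-form conversion of a positive integer.

        Returns (exponent, sign) pairs of a minimum-weight signed-digit representation."""
        terms, exp = [], 0
        while n != 0:
            if n & 1:                  # odd: emit a nonzero digit
                digit = 2 - (n & 3)    # +1 if n % 4 == 1, -1 if n % 4 == 3
                terms.append((exp, digit))
                n -= digit
            n >>= 1
            exp += 1
        return terms

    print(signed_digit_terms(55))  # [(0, -1), (3, -1), (6, 1)], i.e. 2^6 - 2^3 - 2^0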

This talk focuses on post-training term revealing, which requires no model re-training. We evaluate the approach on an MLP for MNIST, CNNs for ImageNet, and an LSTM for Wikitext-2, and show significant reductions in inference computation (between 3x and 10x) compared to conventional quantization at the same model accuracy. (Our recent work on term-quantization-aware training shows similar gains for YOLOv5.) A description of term revealing, together with source code, is available from a forthcoming SC20 paper written jointly with Brad McDanel and Sai Zhang of Harvard.


BIO

H. T. Kung is the William H. Gates Professor of Computer Science and Electrical Engineering at Harvard University. He has pursued a variety of research interests in his career, ranging from computer science theory, parallel computing, VLSI design, database algorithms, computer systems, wireless communications, and networking to machine learning. His academic honors include membership in the National Academy of Engineering (USA), a Guggenheim Fellowship, and the ACM SIGOPS 2015 Hall of Fame Award (with John Robinson). He currently serves as the volunteer President of Taiwan AI Academy, a non-profit organization with multiple campuses in Taiwan that has nurtured thousands of AI talents for industry.