Brochure 2020

Figure 2: AI compiler framework.

Hence, an optimizing compiler framework is needed for deep learning, i.e., an AI compiler. The function of an AI compiler is to translate deep learning architectures and optimize performance according to the capabilities of the hardware architecture. That means developing a single optimizing compiler framework that can compile for a variety of hardware architectures. Performance portability and extensibility are also key issues in the design of such an AI compiler.
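As an illustration of what such a framework does, the toy sketch below lowers one framework-level graph differently per hardware target by fusing adjacent operators, a common AI-compiler optimization. The op names, target list, and fusion tables are invented for this example and are not taken from any real compiler.

```python
# Toy sketch of a multi-target AI-compiler pass: one framework-level graph
# is optimized differently depending on what each backend can fuse into a
# single kernel. All names here are illustrative.

# A model is a list of (op, attrs) pairs in execution order.
graph = [("conv2d", {}), ("bias_add", {}), ("relu", {})]

# Hypothetical target descriptions: which adjacent op pairs each backend
# supports fusing into one kernel.
TARGETS = {
    "cpu": {"fusible": {("conv2d", "bias_add")}},
    "gpu": {"fusible": {("conv2d", "bias_add"), ("bias_add", "relu")}},
}

def fuse(graph, target):
    """Greedily fuse adjacent ops that the given target supports fusing."""
    fusible = TARGETS[target]["fusible"]
    out = []
    for op, attrs in graph:
        # The tail op of the previously emitted (possibly fused) kernel.
        if out and (out[-1][0].split("+")[-1], op) in fusible:
            prev_op, prev_attrs = out.pop()
            out.append((prev_op + "+" + op, {**prev_attrs, **attrs}))
        else:
            out.append((op, attrs))
    return out
```

With this single pass, the same graph compiles to two kernels on the hypothetical CPU target but to one fully fused kernel on the GPU target, which is the sense in which one framework serves a variety of hardware architectures.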
III. Deep Learning with Non-volatile Memories
Deep learning training usually consumes considerable DRAM space, but DRAM suffers from high leakage power and high unit cost. Due to these issues with DRAM, we propose to enable training of neural networks (NNs) on non-volatile-memory (NVM)-based computer systems. As NVMs exhibit low leakage power and high density, we propose to adopt NVM as the main memory, replacing DRAM in NN training, which requires huge amounts of main memory. However, NVMs have asymmetric read/write features and limited write endurance. To overcome those limitations for NN training, we are analyzing NN data flow and data content during NN training. By utilizing the derived characteristics of data flow and data content, we propose to apply approximate computing technologies, including approximate memory and storage techniques, to improve NN training performance without losing training accuracy, by reducing the cost of each write and trading away some limited precision of the written values. The results of this research could effectively resolve major problems with NN training without having to change existing computer architectures. Moreover, our research will create a whole new paradigm for computer system design to support deep learning.

Figure 3: Approximate computing for deep learning training with non-volatile memories.

Figure 4: One-memory architecture.
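One illustrative way to "reduce the cost of each write and trade some precision", assuming weights are stored as IEEE 754 32-bit floats: zero the low-order mantissa bits of each value before writing it, so fewer bits change per update. This is a minimal sketch of the general idea, not the authors' actual technique; `keep_bits` is a hypothetical tuning knob.

```python
import struct

def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Zero the low (23 - keep_bits) mantissa bits of a float32 value.

    Fewer changed bits per update means cheaper, less wearing NVM writes,
    at the cost of a bounded relative precision loss (about 2**-keep_bits).
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    drop = 23 - keep_bits
    # Clear the low-order mantissa bits; sign and exponent are untouched.
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y
```

For example, with `keep_bits=8` the value 0.1 is stored as roughly 0.09985, an error far below typical gradient noise in NN training, while up to 15 of the 23 mantissa bits never toggle in NVM.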

IV. One-Memory Computer Systems

Given that data-intensive applications are increasingly being run on computer systems, it has become critically important to improve I/O efficiency. Because existing storage devices offer low access performance, enlarging DRAM capacity to support more data-intensive applications introduces greater energy consumption and higher hardware costs. Therefore, we propose to adopt non-volatile memory (NVM) as both main memory and storage. Replacing DRAM with NVM will reduce the energy consumption of main memory, and by replacing disk drives we can enhance I/O performance, thanks to NVM's superior byte-addressability, non-volatility, capacity for scalability, and high access performance. To achieve this goal, we are investigating the designs of existing memory management, storage, and file systems in different operating systems, e.g., Linux. We anticipate designing a new in-memory file system that resides in NVM-based storage, the capacity of which can be adapted to the data it holds and the access behavior of which is optimized for NVM. Meanwhile, the page cache system for storage devices can be altered to: (1) minimize data movement between memory and storage; and (2) maximize NVM lifetime by exploiting the fact that memory and storage reside in the same NVM device and share the same address space.
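A minimal sketch of one standard wear-reduction idea consistent with goals (1) and (2) above, a data-comparison write: because memory and storage share one NVM address space, the old contents can be read back cheaply and only the bytes that actually differ need to be rewritten. The function and its interface are hypothetical, not part of any described system.

```python
# Data-comparison write: before writing a buffer back, compare it with the
# current NVM contents and touch only the bytes that differ. In a one-memory
# system the "read old contents" step is an ordinary load, not disk I/O.

def diff_write(nvm: bytearray, addr: int, new: bytes) -> int:
    """Write `new` at `addr`, skipping unchanged bytes.

    Returns the number of bytes actually written, i.e., the write traffic
    (and wear) incurred on the NVM device.
    """
    written = 0
    for i, b in enumerate(new):
        if nvm[addr + i] != b:
            nvm[addr + i] = b
            written += 1
    return written
```

When most of a page is unchanged between write-backs, as is common for page-cache flushes, the write traffic and cell wear drop proportionally.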