
An Energy-Efficient CMOS Stochastic Computing Neuron for Deep Learning Accelerators

Author(s)
Injune Yeo
Type
Thesis
Degree
Doctor
Department
Graduate School of Electrical Engineering and Computer Science
Advisor
Lee, Byung-geun
Abstract
With the arrival of the ‘Big Data’ era, leveraging deep neural networks
(NNs) and machine learning (ML) for data analytics has received considerable attention,
and demand has grown for energy-efficient computation platforms capable of handling
such volumes of data. In conventional CPUs and GPUs, where the processing unit is
physically separated from the memory device, memory access is the most expensive
operation in terms of both power and speed. This problem becomes even more severe in
ML/NN training, where read/write memory requests occur frequently. To address this
issue, much research has been conducted on neuro-inspired architectures such as
neuromorphic computing and in-memory processing, in which the processing unit is
integrated into the memory or the memory itself is exploited to perform computation.
This approach is expected to improve energy efficiency and speed by minimizing data movement.
Recently, several CNN/DNN accelerators, including TrueNorth, Eyeriss, and CxQuad,
have been developed by exploiting nanoscale complementary metal-oxide-semiconductor
(CMOS) technology. However, the CMOS-based approach remains insufficient for
large-scale NNs because of limited on-chip memory capacity and leakage current. In
contrast, approaches that use emerging non-volatile memory devices, such as phase-change
RAM, resistive RAM (ReRAM), conductive-bridging RAM, and spin-transfer torque magnetic
RAM, offer distinct benefits over the CMOS-based one: these technologies are
manufacturable, dense, low-power, low-cost, and, most importantly, scalable.
For simplicity, we collectively refer to these multi-level resistive devices arranged
in a crossbar array structure as a resistive processing unit (RPU). Its lattice-like
configuration, which resembles the organization of NNs, enables massively parallel
vector-matrix multiplication (VMM) and accumulation with O(1) complexity in an analog
fashion. Since VMM is a core computation across ML/NN and many signal processing
applications, an RPU-based computing platform contributes significantly to boosting
energy efficiency. Moreover, even though analog computation has its own inherent
concerns regarding PVT variation, limited dynamic range, and noise, RPU-based computing
platforms have demonstrated accurate performance in approximate computing at sub-8-bit
precision.
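To make the analog VMM operation concrete, the relation below states the standard
crossbar physics (Ohm's and Kirchhoff's laws), written in illustrative notation rather
than the thesis's own symbols: the current collected on each column is the
conductance-weighted sum of the applied row voltages, so a full vector-matrix product
is obtained in a single parallel read.

    I_j = \sum_{i=1}^{N} G_{ij} V_i, \qquad j = 1, \dots, M
    \quad\Longrightarrow\quad \mathbf{I} = G^{\mathsf{T}} \mathbf{V}

Because every column integrates its currents simultaneously, the latency does not grow
with the matrix dimensions, which is the O(1) complexity noted above.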
In an effort to keep the computational cost low, new mixed-signal techniques for
RPU-based NNs are proposed. First, a circuit technique and a training algorithm that
minimize the effect of stuck-at faults within an RPU crossbar array are presented. To
improve network performance in the presence of stuck-at faults, the conventional
trans-impedance amplifier used for summing the currents flowing through the memristors
is modified to ensure that the amplifier output stays within the appropriate operating
range.
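As a general illustration of fault-aware training (a minimal, hypothetical sketch; it is
not the specific algorithm developed in this thesis), the Python fragment below pins the
weights that map to stuck-at devices and lets the remaining weights absorb the error
during gradient descent:

    import numpy as np

    def fault_aware_sgd_step(W, grad, stuck_mask, stuck_values, lr=0.01):
        """One SGD step that respects stuck-at devices.
        W, grad      : weight matrix and its gradient (same shape)
        stuck_mask   : boolean array, True where a device is stuck
        stuck_values : weight values imposed by the stuck devices
        """
        W = W - lr * grad                          # ordinary gradient update
        W[stuck_mask] = stuck_values[stuck_mask]   # re-pin the stuck devices
        return W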
Second, a power- and area-efficient CMOS stochastic neuron for RPU-based neural
networks is presented. The stochastic neuron performs quantization and the activation
function simultaneously using a single dynamic comparator, allowing the power-hungry
analog-to-digital and digital-to-analog converters to be removed at the cost of
increased computation time. A network learning method utilizing a noisy sigmoid
function is also presented to minimize the computation time with little accuracy
degradation.
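The following behavioral model (a software sketch assuming a Gaussian comparator
reference; it is not the transistor-level circuit) illustrates how a single dynamic
comparator can act as both quantizer and activation: each cycle the analog column
output is compared against a random reference, and averaging the resulting bitstream
over several cycles yields a smooth, sigmoid-like response. A noisy sigmoid used only
during training mimics this behavior.

    import numpy as np

    rng = np.random.default_rng(0)

    def stochastic_neuron(x, n_cycles=32, noise_std=1.0):
        # One comparator decision per cycle against a random reference;
        # the mean of the resulting bitstream is the quantized activation.
        ref = rng.normal(0.0, noise_std, size=(n_cycles,) + np.shape(x))
        bits = (x > ref).astype(float)
        return bits.mean(axis=0)

    def noisy_sigmoid(x, noise_std=0.05):
        # Training-time surrogate (illustrative assumption): a sigmoid with
        # additive noise approximating the neuron's cycle-to-cycle variability.
        return (1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))
                + rng.normal(0.0, noise_std, size=np.shape(x)))

Using more comparator cycles reduces the quantization noise at the cost of longer
computation time, which is the trade-off noted above.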
Third, a hardware- and power-efficient RRAM-based neural network capable of online
learning is presented. The network is modularized for scalability and consists of 11
modules, each comprising two 25×25 RRAM crossbar arrays, four analog multiplexers, and
one stochastic neuron chip. The impact of RRAM device initialization on network
performance is analyzed, and a fast yet effective initialization method is also
presented.
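As a rough illustration of the modular organization (the 25×25 tile size comes from the
abstract, but the mapping itself is an assumption, not the thesis's scheme), a layer's
weight matrix can be partitioned into fixed-size blocks, each of which fits one RRAM
crossbar array of a module:

    import numpy as np

    def tile_weights(W, tile=25):
        # Partition a weight matrix into tile x tile blocks, one per crossbar.
        rows, cols = W.shape
        tiles = {}
        for r in range(0, rows, tile):
            for c in range(0, cols, tile):
                tiles[(r // tile, c // tile)] = W[r:r + tile, c:c + tile]
        return tiles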
Fourth, a CMOS-based resistive computing element (RCE) that can be integrated into a
crossbar array is presented. The RCE resolves the hardware constraints of emerging
memristive RPUs, such as the conductance dynamic range, I-V nonlinearity, and on/off
ratio, without increasing hardware complexity compared with other CMOS implementations.
URI
https://scholar.gist.ac.kr/handle/local/32990
Fulltext
http://gist.dcollection.net/common/orgView/200000908884
Access & License
  • Access type: Open
File List
  • No related files available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.