Lecture | Topic | Reading | Spatial Assignment
---|---|---|---
1 | Introduction, the role of hardware accelerators in the post-Dennard and post-Moore era | Is Dark Silicon Useful?; Hennessy & Patterson, Chapters 7.1-7.2 |
2 | Classical ML algorithms: regression, SVMs (what is the building block?) | TABLA |
3 | Linear algebra fundamentals and accelerating linear algebra BLAS operations; 20th-century techniques: systolic arrays and MIMDs, CGRAs (see the blocked-GEMM sketch after the table) | Why Systolic Architectures?; Anatomy of High-Performance GEMM | Linear Algebra Accelerators
4 | Evaluating performance, energy efficiency, parallelism, locality, memory hierarchy, Roofline model (see the Roofline bound after the table) | Dark Memory |
5 | Real-world architectures, putting it into practice. Accelerating GEMM: custom, GPU, and TPU1 architectures and their GEMM performance | Google TPU; Codesign Tradeoffs; NVIDIA Tesla V100 |
6 | Neural networks: MLP and CNN inference | Vivienne Sze's IEEE Proceedings survey; Brooks's book (selected chapters) | CNN Inference Accelerators
7 | Accelerating inference for CNNs: blocking and parallelism in practice; DianNao, Eyeriss, TPU1 | Systematic Approach to Blocking; Eyeriss; Google TPU (see lecture 5) |
8 | Modeling neural networks with Spatial; analyzing performance and energy with Spatial | Spatial; one related work |
9 | Training: SGD, backpropagation, statistical efficiency, batch size | NIPS workshop last year; Graphcore | Training Accelerators
10 | Resilience of DNNs: sparsity and low-precision networks | Some theory paper; EIE; Flexpoint of Nervana; Boris Ginsburg: paper, presentation; LSTM Block Compression by Baidu? |
11 | Low-precision training (see the quantization sketch after the table) | HALP; ternary or binary networks; see Boris Ginsburg's work (lecture 10) |
12 | Training in distributed and parallel systems: Hogwild!, asynchrony, and hardware efficiency | Deep Gradient Compression; Hogwild!; Large Scale Distributed Deep Networks; obstinate cache? |
13 | FPGAs and CGRAs: Catapult, Brainwave, Plasticine | Catapult; Brainwave; Plasticine |
14 | ML benchmarks: DAWNBench, MLPerf | DAWNBench; some other benchmark paper |
15 | Project presentations | |
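
For lectures 3 and 7, a minimal NumPy sketch of the blocking idea: tile GEMM so that each block of the operands can stay resident in fast local memory (cache or scratchpad), which is the locality argument behind systolic-array and TPU-style designs. The function name `blocked_gemm` and the tile size are illustrative choices, not taken from the readings.

```python
import numpy as np

def blocked_gemm(A, B, tile=64):
    """Tiled matrix multiply C = A @ B.

    Iterates over (tile x tile) blocks so each block of A, B, and C
    fits in fast local memory; the inner block product is the unit of
    work a PE array would execute. Tile size 64 is arbitrary here.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # NumPy slicing clips at array edges, so ragged
                # borders need no special cases.
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C

# Quick check against NumPy's reference GEMM.
A = np.random.rand(256, 192)
B = np.random.rand(192, 128)
assert np.allclose(blocked_gemm(A, B), A @ B)
```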
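For lecture 4, a compact statement of the Roofline bound; the notation below is the standard one, not taken from the Dark Memory reading:

```latex
% Attainable throughput of a kernel with arithmetic intensity I
% (FLOPs per byte of memory traffic), on a machine with peak compute
% rate P_peak (FLOP/s) and peak memory bandwidth B_peak (bytes/s):
P_{\text{attainable}}(I) = \min\bigl(P_{\text{peak}},\; I \cdot B_{\text{peak}}\bigr)
```

With hypothetical numbers: a 10 TFLOP/s machine with 500 GB/s of bandwidth has its ridge point at I = 20 FLOPs/byte, so a kernel with I = 4 is memory-bound at about 2 TFLOP/s no matter how many ALUs are added.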
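For lectures 10 and 11, a sketch of symmetric int8 quantization, the simplest instance of the low-precision idea; the scheme and function names are illustrative assumptions, not the methods of EIE, Flexpoint, or HALP.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization: float tensor -> int8 codes + scale.

    Stores each value in 8 bits plus one shared float scale, trading
    precision for lower memory traffic and cheaper arithmetic.
    """
    max_abs = float(np.abs(x).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```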