ECE610: Special Topics in CE (Advanced Embedded Systems Design)

Spring 2015

1 Course Information

This is a seminar course. There is no textbook; instead, there is a reading list for selected topics. The instructor will introduce each topic, and the class will then discuss the papers in more depth. Students are expected to read the papers before class, submit a short review (see below), and present the main content of the papers assigned to them. At the end of the course, students are expected to submit proposals in systems research, which take the place of a final exam.

Instructor
Jongeun Lee
Office hours
Mon & Wed 17:15–18:15, or by appointment (please email me).
Meeting times & place
Mon & Wed 16:00–17:15 @ EB2 #411
Grading
Paper review + Class participation 30%, Paper presentation 30%, Proposal 30%, Attendance 10%.

2 Course Description

The main theme of this course is low-power or energy-efficient embedded systems design. Power has been a first-class design constraint for more than a decade regardless of the scale of a system, but its importance has been recognized longer and more fully in embedded systems design. To overcome the power wall, researchers are exploring unconventional solutions such as operating transistors at extremely low speeds, computations that tolerate erroneous operations in exchange for energy efficiency, and more dataflow-oriented computing. In this course, we will study state-of-the-art techniques in circuit-level design (e.g., near-threshold voltage computing), approximate and stochastic computing (which may be considered logic-level techniques), and architecture- and compiler-level techniques such as using highly parallel off-the-shelf or custom processors (e.g., reconfigurable computing).

As such, this course may be relevant to both CE (or CSE) and EE track students, and requires some background in computer architecture, compilers, and (digital) VLSI design. A large part of this course will be dedicated to stochastic computing and reconfigurable computing.
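For readers new to stochastic computing, the following is a minimal sketch of its basic idea, and is not taken from the course materials. It assumes the common unipolar encoding, in which a value in [0, 1] is represented by the fraction of 1s in a random bitstream, so that a bitwise AND of two independent streams approximates their product. The function names (`encode`, `decode`, `sc_multiply`), the stream length, and the seed are illustrative choices only.

```python
import random


def encode(p, length, rng):
    """Encode a value p in [0, 1] as a unipolar stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(bits):
    """Estimate the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]


# Illustrative values; a fixed seed keeps the run repeatable.
rng = random.Random(0)
a_bits = encode(0.5, 4096, rng)
b_bits = encode(0.8, 4096, rng)
product = decode(sc_multiply(a_bits, b_bits))
print(product)  # an estimate near 0.5 * 0.8 = 0.4, not an exact result
```

The result is only an estimate: longer streams reduce the error at the cost of more cycles, which is one form of the accuracy versus energy trade-off mentioned above.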

This course will not focus on conventional parallel computing such as OpenMP, MPI, CUDA, etc., which are covered in other graduate/undergraduate-level courses.

3 What to Include in a Review

  1. Main Problem and Contributions of the Paper
  2. Critical Questions
    • Include at least one critical (= nontrivial) question.
    • Quality prevails over quantity here.
    • This is the most important part for grading (surprise me!).

4 Topics and Readings (tentative)

4.1 Low-Power & Near-Threshold Computing

  • Hadi Esmaeilzadeh et al., "Power Challenges May End the Multicore Era," Communications of the ACM, Vol. 56, No. 2, pp. 93–102, February, 2013.
  • R. Dreslinski et al., "Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits," Proceedings of the IEEE, pp. 253–266, 2010.
  • David Fick et al., "Centip3De: A Cluster-Based NTC Architecture With 64 ARM Cortex-M3 Cores in 3D Stacked 130 nm CMOS," IEEE Journal of Solid-State Circuits, Vol. 48, No. 1, January 2013.
  • Y. Turakhia et al., "HaDeS: Architectural Synthesis for Heterogeneous Dark Silicon Chip Multi-processors," DAC, 2013.

4.2 Stochastic Computing

  • B. Brown et al., "Stochastic Neural Computation I: Computational Elements," IEEE Transactions on Computers, 2001.
  • W. Qian et al., "The Synthesis of Robust Polynomial Arithmetic with Stochastic Logic," DAC, 2008.
  • A. Alaghi et al., "A Spectral Transform Approach to Stochastic Circuits," ICCD, 2012.
  • Z. Zhao et al., "A General Design of Stochastic Circuit and Its Synthesis," DATE, 2015.

4.3 Stochastic Computing Applications & Approximate Computing

  • B. Brown et al., "Stochastic Neural Computation II: Soft Competitive Learning," IEEE Transactions on Computers, 2001.
  • A. Alaghi et al., "Stochastic Circuits for Real-Time Image-Processing," DAC, 2013.
    • A. Alaghi et al., "Exploiting Correlation in Stochastic Circuit Design," ICCD, 2013.
  • V. Chippa et al., "StoRM: A Stochastic Recognition and Mining Processor," ISLPED, 2014.
  • V. Chippa et al., "Analysis and Characterization of Inherent Application Resilience for Approximate Computing," DAC, 2013.
  • Tianshi Chen et al., "BenchNN: On the Broad Potential Application Scope of Hardware Neural Network Accelerators," International Symposium on Workload Characterization (IISWC), 2012.
  • Bilel Belhadj et al., "Continuous Real-World Inputs Can Open Up Alternative Accelerator Designs," ISCA, 2013.
  • Yunji Chen et al., "DaDianNao: A Machine-Learning Supercomputer," MICRO, 2014.
  • A. Lingamneni et al., "Improving Energy Gains of Inexact DSP Hardware Through Reciprocative Error Compensation," DAC, 2013.
  • Q. Zhang et al., "ApproxIt: An Approximate Computing Framework for Iterative Methods," DAC, 2014.
  • H. Zhang et al., "Low Power GPGPU Computation with Imprecise Hardware," DAC, 2014.
  • M. Schaffner et al., "An Approximate Computing Technique for Reducing the Complexity of a Direct-Solver for Sparse Linear Systems in Real-Time Video Processing," DAC, 2014.

4.4 Reconfigurable Computing

  • K. Compton and S. Hauck, "Reconfigurable Computing: A Survey of Systems and Software," ACM Computing Surveys, 2002. (survey with a focus on FPGA, a little outdated)
  • W. Najjar et al., "FPGA Code Accelerators - The Compiler Perspective," DAC, 2013.
  • J. Cong et al., "Accelerator-Rich Architectures: Opportunities and Progresses," DAC, 2014.
  • C. Zhang et al., "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks," FPGA, 2015. (focus on architecture)
  • S. Dai et al., "Flushing-Enabled Loop Pipelining for High-Level Synthesis," DAC, 2014. (focus on HLS)
  • F. Liu et al., "CGPA: Coarse-Grained Pipelined Accelerators," DAC, 2014.
  • A. Putnam et al., "A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services," ISCA, 2014.
  • K. Shi et al., "Datapath Synthesis for Overclocking: Online Arithmetic for Latency-Accuracy Trade-offs," DAC, 2014. (*)
  • B. Bougard et al., "A Coarse-Grained Array Accelerator for Software Defined Radio Baseband Processing," IEEE Micro magazine, 2008. (CGRA architecture)
  • J. Yoon et al., "SPKM: A Novel Graph Drawing based Algorithm for Application Mapping," ASP-DAC, 2008. (spatial mapping for CGRA)
  • H. Park et al., "Edge-centric Modulo Scheduling for Coarse-Grained Reconfigurable Architectures," PACT, 2008. (temporal mapping for CGRA)
  • M. Hamzeh et al., "REGIMap: Register-aware application mapping on Coarse-Grained Reconfigurable Architectures (CGRAs)," DAC, 2013. (more advanced temporal mapping for CGRA)
  • G. Dasika et al., "PEPSC: A Power-Efficient Processor for Scientific Computing," PACT, 2011.
  • S. Gupta et al., "Bundled Execution of Recurring Traces for Energy-Efficient General Purpose Processing," MICRO, 2011.
  • G. Venkatesh et al., "QSCORES: Trading Dark Silicon for Scalable Energy Efficiency with Quasi-Specific Cores," MICRO, 2011.
  • Y. Huang et al., "Elastic CGRAs," FPGA, 2013.

5 Proposal Writing

6 Schedule and Lecture Slides

#   Date  Topic                                       Presenter       Note
1   3/2   Introduction                                                T0-intro.pptx (1.43MiB, updated 03/02)
2   3/4   Parallel workload                                           Berkeley View (see above); T1-workload.pptx (1.71MiB, updated 03/04)
3   3/9   Stochastic computing                                        T2-SC.pptx (5.58MiB, updated 03/09)
4   3/11  Dark silicon (Esmaeilzadeh 2013)            HSim
5   3/16  Stochastic computing
6   3/18  Stochastic computing
7   3/23  NTC (Dreslinski 2010)                       HLee
8   3/25  Overcoming dark silicon (Turakhia 2013)     SYOh
9   3/30  SC Neural-net (Brown 2001)                  ARahman & HSim
10  4/1   (continuing)
11  4/6   SC processor (Chippa 2014)                  JYun
12  4/8   SC circuit synthesis (Zhao 2015)            ARahman
13  4/13  Machine-learning supercomputer (Chen 2014)  JKwak
14  4/15  ApproxIt (Zhang 2014)                       DNguyen
15  4/27  Reconfigurable computing                                    T3-RC.pptx (3.75MiB, updated 04/28)
16  4/29  Reconfigurable computing
17  5/4   CGRA: architecture (Bougard 2008)           SYOh
18  5/6   CGRA: mapping (Park 2008)                   SYOh
19  5/11  DNN on FPGA (Zhang 2015)                    ARahman
20  5/13  Elastic CGRA (Huang 2013)                   HLee
21  5/18  Loop pipelining in HLS (Dai 2014)           HSim
22  5/20  CGPA (Liu 2014)                             DNguyen
23  5/27  PEPSC for HPC (Dasika 2011)                 JKwak
24  6/1   Proposal progress report (presentation)
25  6/3   FPGAs for Data Centers (Putnam 2014)        JYun
26  6/8   (no class)
27  6/10  (no class)
28  6/17  Proposal due (paper submission)

7 Questions?

Post your questions on the Blackboard - Q&A Forum.

Author: Jongeun Lee

Created: 2015-06-15 Mon 10:00