The main research area is Architectures and Compilers for Embedded Systems (ACES) with special emphasis on:
- Heterogeneous parallel computing
- Reconfigurable computing for computation acceleration
- Compilation for multi-core architectures, especially for energy efficiency and reliability
Internet of things is a great vision where everything embeds a tiny computer so that we can listen to them and even talk to them if they have the capacity to understand and act on our words. Undoubtedly there are many challenges to this, but a crucial one is how to make things very very energy-efficient and error-tolerant. In the Embedded Computing Lab at UNIST we are deeply interested in this question of how to make things efficient and resilient, and are exploring solutions across different boundaries (eg., hardware vs software, digital vs analog vs stochastic) encompassing multiple levels of abstraction from application specification to circuit-level design. We push the limit of energy-efficient computing by designing innovative processor architectures and compilers optimized for today’s and tomorrow’s emerging applications (e.g., computer vision, recognition, synthesis) with a special focus on parallelism, heterogeneity, reconfigurability, and energy-accuracy trade-off.
Check out the front page, featuring interesting research outcomes from our lab.
Heterogeneous Parallel Computing (HPC) is about utilizing a set of very different processors such as CPU and GPU efficiently and with ease. The HPC Research Group in ECL lab is actively addressing the following research questions to help realize better heterogeneous parallel computing platforms and applications.
- communication problem
- between CPU and accelerators (GPU, VLIW, loop accelerators, ASIC, etc.)
- memory organization and management problem
- shared memory? cache? scratch-pad memory?
- domain specific architecture
- for example, computer vision or machine learning
- Fast Shared On-Chip Memory Architecture for Efficient Hybrid Computing with CGRAs, Jongeun Lee, Yeonghun Jeong, and Sungsok Seo, Proc. of Design, Automation and Test in Europe (DATE ’13), March, 2013.
- Software-Managed Automatic Data Sharing for Coarse-Grained Reconfigurable Coprocessors, Toan X. Mai and Jongeun Lee*, Proc. of International Conference on Field-Programmable Technology (FPT ’12), pp. 277-284, December, 2012.
- CRM: Configurable Range Memory for Fast Reconfigurable Computing, Jongkyung Paek, Jongeun Lee*, and Kiyoung Choi, Proc. of Reconfigurable Architecture Workshop (RAW ’11), pp. 158-165, May, 2011.
Multi-core and even many-core processors have been successfully used in other domains. Reconfigurable array processors, for instance, have been actively researched and used as an on-chip accelerator for stream processing applications and embedded processors, due to their extremely low power and high performance execution, compared to general purpose processors or even DSPs (digital signal processors).
However, the main challenge in such accelerator-type reconfigurable processors is compilation — the problem of how to map applications onto the architecture. At the heart of this problem is the 2D placement-and-routing problem, which is traditionally recognized as a CAD problem, which is why this problem is often discussed in the design automation communities. Still the problem needs more research and development efforts (such as mature tool chains) for more wide-spread adoption of the architecture.
The ECL Lab is actively pursuing research on this topic, with a few specific goals in mind. We have two granted projects on this, partially in collaboration with other labs.
Figure 1: An accelerator-type reconfigurable processor architecture (Bougard et al. ’08).
- how to compile the usual C programs (“legacy”) onto coarse-grained reconfigurable architectures?
- can there be good architectural solutions (such as architecture extensions) to make it much easier to map programs to these architectures (“compiler-friendly architectures”)?
- what are the real bottleneck to enhancing performance through these processors and how to address them?
- application level mapping problem
- Compiling Control-Intensive Loops for CGRAs with State-Based Full Predication, Kyuseung Han, Kiyoung Choi, and Jongeun Lee, Proc. of Design, Automation and Test in Europe (DATE ’13), March, 2013.
- Architecture Customization of On-Chip Reconfigurable Accelerators, Jonghee W. Yoon, Jongeun Lee*, Sanghyun Park, Yongjoo Kim, Jinyong Lee, Yunheung Paek, and Doosan Cho, ACM Transactions on Design Automation of Electronic Systems (TODAES), 18(4), pp. 52:1-52:22, ACM, October, 2013.
- Improving Performance of Nested Loops on Reconfigurable Array Processors, Yongjoo Kim, Jongeun Lee*, Toan X. Mai, and Yunheung Paek, ACM Transactions on Architecture and Code Optimization (TACO), 8(4), pp. 32:1-32:23, ACM, January, 2012.
- Exploiting Both Pipelining and Data Parallelism with SIMD Reconfigurable Architecture, Yongjoo Kim, Jongeun Lee*, Jinyong Lee, Toan X. Mai, Ingoo Heo, and Yunheung Paek, Proc. of International Symposium on Applied Reconfigurable Computing (ARC ’12), Lecture Notes in Computer Science, vol. 7199, pp. 40-52, March, 2012.
- High Throughput Data Mapping for Coarse-Grained Reconfigurable Architectures, Yongjoo Kim, Jongeun Lee*, Aviral Shrivastava, Jonghee W. Yoon, Doosan Cho, and Yunheung Paek, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 30(11), pp. 1599-1609, IEEE, November, 2011.
- Memory Access Optimization in Compilation for Coarse Grained Reconfigurable Architectures, Yongjoo Kim, Jongeun Lee*, Aviral Shrivastava, and Yunheung Paek, ACM Transactions on Design Automation of Electronic Systems (TODAES), 16(4), pp. 42:1-42:27, ACM, October, 2011.
Parallel processing, or multi-core compilation, is rather a challenge than a blessing, since it means the era of free ride is over. When the hardware performance kept doubling every 18 months, the same software could be run twice as before by running it on a new hardware. But now it is no longer true, unless the software can somehow exploit the increased parallelism by the new machine.
There are many challenges including how to compile applications, or in particular, how to map code and data onto various heterogeneous as well as homogeneous processor cores, and manage them efficiently. The management must include aspects of not only performance optimization, but of thermal management, energy and power optimization (such as Dynamic Voltage and Frequency Scaling), and reliability (such as soft error resilience). These multi-dimensional, multi-objective problems require innovative ideas and approaches on architecture, compiler, computer-aided design, operating system, and algorithm levels.
In ECL Lab, we are particularly looking at data management problems for distributed memory architectures such as the Cell processor architecture, where simple scratchpad memories are used instead of power-hungry caches to save power.
Figure 1: Sony/Toshiba/IBM Cell.
- Software-based Register File Vulnerability Reduction for Embedded Processors, Jongeun Lee and Aviral Shrivastava, ACM Transactions on Embedded Computing Systems (TECS), 13(1s), pp. 38:1-38:20, ACM, November, 2013.
- Return Data Interleaving for Multi-Channel Embedded CMPs Systems, Fei Hong, Aviral Shrivastava, and Jongeun Lee, IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 20(7), pp. 1351-1354, IEEE, July, 2012.
- Static Analysis of Register File Vulnerability, Jongeun Lee and Aviral Shrivastava, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 30(4), pp. 607-616, IEEE, April, 2011.
- A Compiler-Microarchitecture Hybrid Approach to Soft Error Reduction for Register Files, Jongeun Lee and Aviral Shrivastava, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 29(7), pp. 1018-1027, IEEE Press, July, 2010.
- Cache Vulnerability Equations for Protecting Data in Embedded Processor Caches from Soft Errors, Aviral Shrivastava, Jongeun Lee*, and Reiley Jeyapaul, ACM SIGPLAN Notices (LCTES ’10), 45(4), pp. 143-152, April, 2010.
- A Software-Only Solution to Use Scratch Pads for Stack Data, Aviral Shrivastava, Arun Kannan, and Jongeun Lee*, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 28(11), pp. 1719-1727, November, 2009.
See Research Archive for more.