Chang Won Lee

January 10th, 2011 No comments

Chang Won Lee participated in U-WURF 2010. His wiki page.

Categories: People Tags:

Byung Jun Ahn

January 10th, 2011 No comments

Byung Jun Ahn participated in U-WURF 2010. His wiki page.

Categories: People Tags:

Seminar: CUDA programming (Fri 1/14, E205)

January 3rd, 2011 No comments

Introduction to GPU Super-computing Programming.

This 3+ hour seminar will cover the following topics:

1. What is GPU Programming

2. Scalar Processor vs. Vector Processor

3. Clustering/Clustered Computer programming

4. Vector Programming

5. CUDA

6. OpenCL

When & where: Fri 1/14, 2 ~ 6 pm, E205

Categories: News Tags:

Pure Software Approach to Reducing Transient Faults in Register Files

January 3rd, 2011 No comments

Device miniaturization is causing significant problems for semiconductor reliability. One particularly nasty problem is the transient fault, called transient rather than permanent because such faults occur only temporarily. You may hit an error once and never see it again when you repeat the same operation, so the error is not reproducible. This poses a serious challenge to “testing”, and mitigating the effects of transient errors at runtime is equally challenging. The questions traditionally asked are i) how to detect such errors and ii) how to correct the computation once they are detected.

A very different approach to the same problem is to reduce the rate of such errors, say to 1/100 of the original rate, because errors that happen rarely enough may not be a problem. This can be as easy as recompiling the program. Sounds intriguing? For more detail, please check this out: “A compiler optimization to reduce soft errors in register files,” ACM SIGPLAN Notices, Vol. 44, No. 7, pp. 41-49, by Jongeun Lee and Aviral Shrivastava, 2009.

The register file (RF) is extremely vulnerable to soft errors, and traditional redundancy-based schemes to protect the RF are prohibitive, not only because the RF is often in the timing-critical path of the processor, but also because it is one of the hottest blocks on the chip, so adding any extra circuitry to it is undesirable. Pure software approaches would be ideal in this case, but previous approaches based on program duplication have very significant runtime overheads, and others based on instruction scheduling are only moderately effective due to their local scope. We show that the problem of protecting registers inherently requires inter-procedural analysis, and that intra-procedural optimizations are ineffective. This paper presents a pure compiler approach, based on inter-procedural code analysis, to reduce the vulnerability of registers by temporarily writing live variables to protected memory. We formulate the problem as an integer linear programming problem and also present a very efficient heuristic algorithm. Our experiments demonstrate that our proposed technique can reduce the vulnerability of the RF by 33~37% on average and up to 66%, with a small 2% increase in runtime. In addition, our overhead reduction optimizations can effectively reduce the code size overhead, by more than 40% on average, to a mere 5~6%, as compared to highly optimized binaries.

Figure: Dynamic vs. static view of a program, used to analyze the effect of compilation on transient errors. Transient errors are best defined and understood in the dynamic view (left) of the program, but compilers can only see the static view (right), which is the central challenge of this approach.
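To make the idea in the abstract concrete, here is a minimal sketch of how register vulnerability might be counted on a dynamic trace. It assumes (this definition is our simplification, not taken verbatim from the paper) that a register is vulnerable from each write until the last read before the next write. The function name `vulnerable_cycles`, the trace format, and the cycle numbers are all hypothetical.

```python
def vulnerable_cycles(trace):
    """Sum, over all registers, the cycles between each write and the
    last read that precedes the next write to the same register.

    `trace` is a list of (cycle, op, reg) tuples sorted by cycle,
    where op is "write" or "read". Hypothetical format for illustration.
    """
    last_write = {}  # reg -> cycle of the most recent write
    last_read = {}   # reg -> cycle of the most recent read since that write
    total = 0
    for cycle, op, reg in trace:
        if op == "write":
            # A new write ends the previous vulnerable interval, if any.
            if reg in last_write and reg in last_read:
                total += last_read[reg] - last_write[reg]
            last_write[reg] = cycle
            last_read.pop(reg, None)
        elif op == "read" and reg in last_write:
            last_read[reg] = cycle
    # Close the intervals that are still open at the end of the trace.
    for reg, written_at in last_write.items():
        if reg in last_read:
            total += last_read[reg] - written_at
    return total


# A value kept live in r1 across a long call (illustrative numbers).
baseline = [(0, "write", "r1"), (100, "read", "r1")]

# Same value, but stored to protected memory right after the write
# (the store reads r1) and reloaded just before use (the load writes r1).
spilled = [(0, "write", "r1"), (1, "read", "r1"),
           (99, "write", "r1"), (100, "read", "r1")]

print(vulnerable_cycles(baseline), vulnerable_cycles(spilled))
```

In this toy trace the store and reload shrink the vulnerable window to the few cycles around them, at the cost of two extra instructions. That is the trade-off between vulnerability and overhead that the abstract describes.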

Categories: Publications Tags:

[LaTeX] Last page column equalization

December 16th, 2010 No comments

The best method, which is also recommended by Michael Shell, is to use this:

\enlargethispage{-X.Yin}

somewhere near the top of the first column of the last page, where X.Y is a length in inches that you tune by hand. The last page is effectively shortened by that amount, which lets you balance the two columns.

Categories: Tools and Tips Tags:

HPC and the Excluded Middle | blog@CACM | Communications of the ACM

November 23rd, 2010 No comments

HPC and the Excluded Middle

By Daniel Reed October 24, 2010

I have repeatedly been told by both business leaders and academic researchers that they want “turnkey” HPC solutions that have the simplicity of desktop tools but the power of massively parallel computing. Such desktop tools would allow non-experts to create complex models quickly and easily, evaluate those models in parallel, and correlate the results with experimental and observational data. Unlike ultra-high-performance computing, this is about maximizing human productivity rather than obtaining the largest fraction of possible HPC platform performance. Most often, users will trade hardware performance for simplicity and convenience. This is an opportunity and a challenge, an opportunity to create domain-specific tools with high expressivity and a challenge to translate the output of those tools into efficient, parallel computations.

via HPC and the Excluded Middle | blog@CACM | Communications of the ACM.

Categories: Emerging Topics Tags:

Ten Commandments for Good Teaching

November 23rd, 2010 No comments

By Yale Patt. Good for future instructors as well as good students.

My Ten Commandments for Good Teaching

1. Know the material

2. Want to teach

3. Genuinely respect your students and show it

4. Set the bar high; students will measure up

5. Emphasize understanding; de-emphasize memorization

6. Take responsibility for what is covered

7. Don't even try to cover the material

8. Encourage interruptions; don't be afraid to digress

9. Don't forget those three little words

10. Reserved for future use

via Ten Commandments for Good Teaching.

Categories: Uncategorized Tags:

Paper accepted for DATE 2011 in Grenoble, France

November 21st, 2010 No comments

The HPC lab will present a paper at DATE (Design, Automation and Test in Europe), a premier conference on design automation. The title of the paper is “I2CRF: Incremental Interconnect Customization for Embedded Reconfigurable Fabrics”, and it is about specializing a design to exploit the characteristics of the application domain in the context of reconfigurable computing.

Conference website: http://www.date-conference.com/

Categories: News Tags:

Feds Plot Near Human Robot Docs, Farmers, Troops | News | Communications of the ACM

October 25th, 2010 No comments

Feds Plot Near Human Robot Docs, Farmers, Troops

Maybe robots are the next big thing after five decades of the IT revolution?

via Feds Plot Near Human Robot Docs, Farmers, Troops | News | Communications of the ACM (summary).

via WIRED (original post)

Categories: Uncategorized Tags:

Memory-Aware Mapping for Reconfigurable Architectures

October 13th, 2010 No comments

We presented a first approach to optimizing software for the memory architecture of the target reconfigurable computing system at HiPEAC 2010, held in Pisa, Italy. The motivation for this work is that in multimedia applications the performance bottleneck is often data transfer, not computation per se. Previously we tried only to maximize the computation rate, but when computation and data transfer trade off against each other, as they do in CGRA mapping, it may be better to sacrifice a little computation rate to increase the data transfer rate. We targeted our compiler at an ADRES-like architecture with a slightly simplified local memory subsystem: it is double-buffered and multi-banked, and the banks of the local memory are mapped one-to-one to the load-store units of the reconfigurable architecture.
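The trade-off can be illustrated with a back-of-the-envelope model. This is only a sketch under a simplifying assumption of ours: with a double-buffered local memory, computing on one buffer overlaps with filling the other, so the steady-state time per tile is roughly the larger of the two. The function `tile_time` and all cycle counts below are hypothetical, not measurements from the paper.

```python
def tile_time(compute_cycles, transfer_cycles):
    """Steady-state cycles per tile when computation and data transfer
    overlap through double buffering: the slower of the two sets the pace."""
    return max(compute_cycles, transfer_cycles)


# Hypothetical per-tile cycle counts, for illustration only.
# Mapping A maximizes the computation rate but needs more data transfer.
compute_optimal = tile_time(compute_cycles=10, transfer_cycles=16)
# Mapping B computes a little slower but moves less data.
memory_aware = tile_time(compute_cycles=12, transfer_cycles=11)

print(compute_optimal, memory_aware)
```

In this made-up case the mapping with the slower computation finishes each tile sooner, because data transfer was the bottleneck.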

Coarse-Grained Reconfigurable Arrays (CGRAs) are a very promising platform, providing both high power efficiency, up to 10-100 MOps/mW, and software programmability. However, this cardinal promise of CGRAs critically hinges on the effectiveness of application mapping onto CGRA platforms. While previous solutions have greatly improved the computation speed, they have largely ignored the impact of the local memory architecture on the achievable power and performance. This paper motivates the need for memory-aware application mapping for CGRAs, and proposes an effective solution for application mapping that considers the effects of various memory architecture parameters including the number of banks, local memory size, and the communication bandwidth between the local memory and the external main memory. Our proposed solution achieves 62% reduction in the energy-delay product, which factors into about 47% and 28% reduction in the energy consumption and runtime, respectively, as compared to memory-unaware mapping for realistic local memory architectures. We also show that our scheme scales across a range of applications and memory parameters.
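As a quick sanity check on how the 62% figure factors, the energy-delay product (EDP) is energy multiplied by runtime, so the remaining fractions multiply. The helper name `edp_reduction` is ours; the percentages are the ones quoted in the abstract.

```python
def edp_reduction(energy_reduction, runtime_reduction):
    """Fractional reduction in energy-delay product, given the
    fractional reductions in energy and in runtime."""
    remaining = (1.0 - energy_reduction) * (1.0 - runtime_reduction)
    return 1.0 - remaining


# 47% less energy and 28% less runtime leave 0.53 * 0.72 = 0.3816 of the EDP.
print(round(edp_reduction(0.47, 0.28), 4))
```

The result is about 0.62, which matches the 62% EDP reduction reported above.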

Read the full paper: “Memory-Aware Application Mapping on Coarse-Grained Reconfigurable Arrays,” Lecture Notes in Computer Science (HiPEAC ’10), Vol. 5952, pp. 171-185, by Yongjoo Kim, Jongeun Lee, Aviral Shrivastava, Jonghee W. Yoon and Yunheung Paek, 2010.

Categories: Publications Tags: