Mar 04 2014

This post originally appeared on the Flux7 blog. It discusses the relationship between throughput and latency summarized by Little’s Law. Little’s Law applies to any system with throughput and latency, from memory subsystems to an industrial assembly line. Read the original article below:

Continue reading “Understanding Throughput and Latency Using Little’s Law” »

Apr 01 2013

In the second part of the series we introduced memory management and interrupt handling support provided by virtualization hardware extensions. But effective virtualization solutions need to reach beyond the core to communicate with peripheral devices. In this post we discuss the various techniques used for virtualizing I/O, the problems faced, and the hardware solutions to mitigate these problems.

Continue reading “ARM Virtualization – I/O Virtualization (Part 3)” »

Jul 11 2011

I am sorry for the hiatus. I had some business to take care of, which is why I was unable to write for a few days. I will be writing regularly again. As a comeback post, I decided to create a small quiz on the microprocessor industry. It has a few questions about the recent history of microprocessors. I am hoping that you will enjoy the questions and learn from them at the same time. Let us know how you did in the comments!

Continue reading “Quiz: How well do you know CPUs? (Fixed)” »

Jul 08 2011

Many academic professors talk about parallelism in the context of HDLs. Having learned both Verilog and Pthreads, I have always felt that we can apply some of the lessons learned in hardware (which is inherently parallel) to parallel programming. ParC is based on this insight and is an impressive piece of work. I learned about ParC through Kevin Cameron’s comments on Future Chips. After some (healthy) debate with Kevin, I felt that ParC is a concept we should all be familiar with. I am hoping that Kevin’s insights will trigger some interesting debate.

When people say parallel programming is hard, they are correct, but to say it is a new problem would be wrong. Hardware designers have been turning out parallel implementations of algorithms for decades. Back in the 1980s, designers moved up from designing in gates to using RTL descriptions of circuits with synthesis tools for the hardware description languages (HDLs) Verilog and VHDL. In recent years, other methodologies like assertion-driven verification and formal methods have been added to help get chips working at first silicon.

Continue reading “Guest Post: ParC – Absorbing HDLs into C++” »

Jul 05 2011

Hardware prefetching is an important performance-enhancing feature of today’s microprocessors. It has been shown to improve performance by 10-30% without requiring any programmer effort. However, a programmer can throw away this benefit by making naive design decisions. Avoiding the common pitfalls only requires the programmer to have a 10,000-foot view of how a hardware prefetcher works. Providing this view is the goal of this post.

Continue reading “What programmers need to know about hardware prefetching?” »

Jul 04 2011

Writing parallel code is all about finding parallelism in an algorithm. What limits parallelism are the dependencies among different code portions. Understanding the dependencies in a program early on can help programmers (a) determine the amount of available parallelism, and (b) choose the best parallel programming paradigm for the program. In this post, I try to lay out a taxonomy of dependencies in loops and how they play into parallel programming.

Continue reading “Parallel Programming: On Using Dependence Information” »

Jun 21 2011

It has been a week since the AMD Fusion Developer Forum, and I have been reading about what was said by the AMD, ARM, and Microsoft speakers. While there were a lot of talks, the one that jumps out at me most is from AMD Fellow Phil Rogers. The following are my top three inferences from this talk.

Continue reading “Inferences from AMD Fusion Developer Forum” »

Jun 20 2011

The list of the Top500 fastest computers in the world just came out, and the Japanese K-computer is both the fastest and the most energy-efficient computer at the same time. It is hard to build computers that are both fast and energy-efficient, so I set out to understand what Fujitsu has done right. This quick post is a summary of my investigation. For the very impatient, my crude experience-based analysis says that the special-purpose instructions and highly specialized functional units in the core give them their edge.

Continue reading “Why the K-computer is the fastest and energy-efficient?” »

Jun 19 2011

I wrote a list of ten items all software programmers must know about hardware. Today, I want to provide a small quiz for you to evaluate yourself. Some questions are very simple but they exist to test the fundamentals. Enjoy!

The answers to the self assessment are available here.

Continue reading “Computer Science Self-assessment Quiz” »