Mar 042014

This post originally appeared on the Flux7 blog at: It discusses the relationship between throughput and latency summarized by Little’s Law. Little’s Law has applications on any system with throughput and latency, whether it is memory performance or an industrial assembly line. Read the original article below:

Continue reading “Understanding Throughput and Latency Using Little’s Law” »

Jul 242011

In this post, I describe pipeline parallelism,  a powerful method to extract parallelism from loops which are difficult to parallelize otherwise. Pipeline parallelism is underrated because of a misunderstanding that it only applies to “streaming” workloads. This statement is very vague and misleading. I discuss the uses, trade-offs, and performance characteristics of pipelined workloads.

Continue reading “Parallel Programming: Do you know Pipeline Parallelism?” »

Jul 212011

From the comments on my post which criticized Amdahl’s law, I got some very clear indication that awareness needs to be raised about the different types of serial bottlenecks that exist in modern parallel programs. I want to highlight their differences and explain why different bottlenecks ought to be treated differently from both a theoretical and practical standpoint.

Continue reading “Parallel Programming: Types of serial bottlenecks” »

Jul 082011

Many academic professors talk about parallelism in the context of HDL languages.  Having learned both Verilog and Pthreads, I have always felt that we can use some of the lessons learned in hardware (which is inherently parallel) in Parallel programming. ParC is based on this insight and an impressive piece of work. I learned about ParC through Kevin Cameron’s comments on Future Chips. After some (healthy) debate with Kevin, I felt that ParC is a concept we should all be familiar with. I am hoping that Kevin’s insights will trigger some interesting debate.  

When people say parallel programming is hard they are correct, but to say it is a new problem would be wrong. Hardware designers have been turning out parallel implementations of algorithms for decades. Back in the 1980s designers moved up from designing in gates to using RTL descriptions of circuits with synthesis tools for the hardware description languages (HDLs) Verilog and VHDL. In recent years other methodologies like assertion-driven verification and formal methods have been added to help get chips working at first Silicon.

Continue reading “Guest Post: ParC – Absorbing HDLs into C++” »

Jul 042011

Writing parallel code is all about finding parallelism in an algorithm. What limits parallelism are the dependencies among different code portions. Understanding the dependencies in a program early on can help the programmers (a) determine the amount of available parallelism, and (b) chose  the best parallel programming paradigm for the program. In this post, I try to layout a taxonomy of dependencies in loops and how it plays into parallel programming.

Continue reading “Parallel Programming: On Using Dependence Information” »

Jun 262011

After reading my post on the shortcomings of Amdahl’s law, A reader of Future Chips blog, Bjoern Knafla (@bjoernknafla), on Twitter suggested that I should add a discussion on Gustafson’s law on my blog. Fortunately, I have the honor of meeting Dr. John Gustafson in person when he came to Austin in 2009. The following are my mental notes from my discussion with him.  It is a very simple concept which should be understood by all parallel programmers and computer architects designing multicore machines.

Continue reading “Parallel Programming: Amdahl’s Law or Gustafson’s Law” »

Jun 262011

I received a interesting comment from ParallelAxiom, a parallel programming expert, on my post titled “When Amdahl’s law is inapplicable?” His comment made me re-think my post. I must show an example to hammer my point. Thus, I have added an example to the original post and I am adding this new post just so the RSS subscribers can see this update as well. Please look at the original article first in case you have not read it.

Continue reading “Parallel Programming: An example of “When Amdahl’s law is inapplicable?”” »

Jun 252011

I see a lot of industry and academic folks use the term Amdahl’s law without understanding what it really means. Today I will discuss what Gene Amdahl said in 1967, what has become of it, and how it is often misused.

Update: 6/25/2011: On a suggestion from Bjoern Knafla, I have added a closely related article on Gustafson’s law.

Continue reading “Parallel Programming: When Amdahl’s law is inapplicable?” »

Jun 212011

It has been a week since the AMD Fusion developer forum and I have been reading about what was said and told by AMD, ARM, and Microsoft speakers. While there were a lot of talks, the one that jumps out at me most is from AMD Fellow Phil Rogers. The following is my top three inferences from this talk.

Continue reading “Inferences from AMD Fusion Developer Forum” »

Jun 202011

The list of Top 500 fastest computers in the world just came out and the Japanese K-computer is the fastest and the most energy-efficient computer at the same time. It is hard to build computers that are both fast and energy-efficient so I set out to understand what Fujitsu has done right. This quick post is a summary of my investigation. For the very impatient, my crude experience-based analysis says that the special purpose instructions and highly specialized functional units in the core give them their edge.

Continue reading “Why the K-computer is the fastest and energy-efficient?” »