May 20, 2011

Multi-cores are here, and they are here to stay. Industry trends show that each individual core is likely to become smaller and slower (see my post to understand the reason). Improving the performance of a single program on a multi-core chip requires splitting the program into threads that can run on multiple cores concurrently. In effect, this pushes the problem of finding parallelism in the code onto the programmers. I have noticed that many hardware designers do not understand the challenges of multi-threading (MT), since they have never written MT apps. This post is to show them the tip of this massive iceberg.
Update 5/26/2011: I have also written a case study for parallel programming which may interest you.

Continue reading “What makes parallel programming hard?” »

May 19, 2011

I guess I will continue my rant that many programmers do not understand hardware. I stress that they should. Dear programmers, before you feel offended, please read my previous posts bashing hardware guys for not learning software :-)

I have made my list of top ten items every programmer should know. I do not explain the concepts but provide keywords you can google or find in your favorite book. I will write tutorials here in the future if you guys want.

Update 6/19/2011: I have written a computer science self-assessment for developers to test their knowledge about computer science.

Update 5/31/2011: In response to a couple of comments on this post, I have written a more detailed motivation for this article. It tries to answer why programmers today have a higher motivation to understand hardware.

Continue reading “Ten things every programmer must know about hardware” »

May 19, 2011

One of my recent pet peeves is that our universities are producing graduates who know very little about how computers work. My apprehension was recently confirmed when we interviewed a few people for an internship. They had high GPAs, were from top 10 EE/CS schools, and clearly knew how to write functional code. What they didn’t understand was what happens when they compile or run the program. This could be the root cause of the suboptimal, inefficient code that we see every day. Needless to say, this lack of basic understanding also stifles innovation. I share two stories to show the gravity of the problem.

Continue reading “CS/EE Professors, we want graduates who understand computers!” »

May 19, 2011

I have been asked this question in person and online before. I have seen “experts” argue about this at academic conferences and people debate it on forums. The answer is not a simple yes or no. It requires some analysis. My short answer is: multi-core does not save energy unless you simplify each individual core to make it more energy-efficient. I explain this with an example and provide insights to back my assumptions.

Continue reading “Q & A: Do multicores save energy? Not really.” »

May 18, 2011

Edit: Moin pointed out a small problem so this post was edited on 5/19

A branch instruction can be a big overhead in pipelined processors. At every branch, the processor has to decide which instruction to fetch after the branch. Instead of waiting for the branch to resolve, it tries to predict the branch direction. A correct prediction is a win, but a misprediction requires flushing the pipeline, wasting all instructions fetched after the branch. The deeper the pipeline, the more instructions need to be flushed. Pipelines are deep in today’s processors; hence, writing code with the branch penalty in mind is a plus for performance as well as power.

Branch prediction is an important concept which all programmers must understand for two reasons: (1) branches are very frequent and can be a major performance overhead if the branch predictor mispredicts them, and (2) unlike cache behavior, branch behavior can be influenced from any high-level language. My post presents some background on the importance of branches and presents two very basic techniques to reduce their overhead.
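
To make the cost of a data-dependent branch concrete, here is a minimal sketch in C. It is a generic illustration, not necessarily one of the two techniques from the full post, and the function names (sum_above_branchy, sum_above_branchless, check_equivalent) are made up for this example. The first loop takes a conditional branch on every element, which is hard to predict when the data is random. The second computes the same sum with a mask instead of a branch. Whether it is actually faster depends on the data, the compiler, and the processor; an optimizing compiler may already emit branch-free code for the first loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy version: one data-dependent conditional branch per element.
 * If the values are random relative to the threshold, the branch
 * predictor has little to go on and may mispredict often. */
int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] >= threshold)
            sum += v[i];
    }
    return sum;
}

/* Branchless version: the comparison yields 0 or 1; negating it gives
 * an all-zeros or all-ones mask, so each element is either dropped or
 * kept without a conditional jump on the data. */
int64_t sum_above_branchless(const int32_t *v, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] >= threshold);
        sum += (int64_t)v[i] & mask;
    }
    return sum;
}

/* Sanity check (hypothetical helper): both versions must agree. */
void check_equivalent(const int32_t *v, size_t n, int32_t threshold) {
    assert(sum_above_branchy(v, n, threshold) ==
           sum_above_branchless(v, n, threshold));
}
```

If you try this, time both versions on sorted and on shuffled input. The branchy loop is usually the one that is sensitive to the ordering of the data, because sorting makes the branch outcome predictable.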

Continue reading “Basic techniques to help branch prediction” »

May 17, 2011

Yale Patt, a well-known computer architect, likes to call multi-core “multi-nonsense.” He believes that multi-core was the easy way out chosen by Intel/AMD architects to harness the increasing number of transistors. In his words, “multi-core is a solution looking for a problem.” Others argue that multi-core is in fact a large opportunity for us architects to innovate. I agree more with Yale (with reservations) that it started out as multi-nonsense.

Continue reading “Multicore: Multi-nonsense or Multi-opportunity?” »

May 17, 2011

I write this post because I am often amazed at the naivety of some engineers who believe that the impact of finances on technical decisions stifles innovation. No! Money is what pays for the whole party and must be the most relevant factor. In my opinion, using the dollar metric adds a very important dimension to engineering, making it more interesting and productive.

While technical metrics like performance, power, and programmer effort make for nice fuzzy debates, it is pivotal for every computer guy to understand that “Dollar” is the one metric that rules them all. The other metrics are just sub-metrics derived from the dollar: performance matters because that’s what customers pay for; power matters because lower power lets OEMs use cheaper, smaller batteries and reduces people’s electricity bills; and programmer effort matters because less effort reduces the cost of making software. The best analogy that comes to my mind is that “make money” is our constitution and the other metrics/rules are mere laws that evolve over time.

Continue reading “Dollar — the only metric that matters in computer engineering” »

May 17, 2011

Benchmark selection for architecture studies is always tricky. The ideal benchmark is the actual workload that will run on the computer being designed. However, those workloads are often written after the computer has been designed, which leaves architects guessing at the right workloads. Common benchmarks today include C/C++/Java/Fortran workloads (e.g., TPCC, Djemm, SPEC CPU), but very few architects investigate JavaScript workloads. I assert that studying JavaScript is essential because JavaScript is common, unique, and increasingly powerful. I will discuss these three properties of JavaScript and how studying JavaScript can impact chip design.

Continue reading “Why computer architects MUST benchmark Javascript?” »

May 17, 2011

GPGPU is a new trend and it has triggered many new questions in the minds of young computer scientists. In this post, I cover some of the FAQs on this topic.

What types of problems are better suited to regular multicore and what types are better suited to GPGPU?

GPUs contain some fixed-function and some programmable hardware. While GPUs are trending towards more and more programmable units, today’s GPUs perform some common graphics tasks, like texture sampling and rendering, using special-purpose hardware. In contrast, pixel shading is done using programmable SIMD cores. GPGPU workloads mostly run on the SIMD shader cores.

Continue reading “CPU vs. GPGPU” »

May 16, 2011

This post is inspired by an old discussion I had with Jim Larus at Microsoft Research before ASPLOS 2009. Jim believes that computer architects are building the wrong computers. He thinks that we add features to processors that no programmer cares about, making it necessary for programmers to rely solely on hardware for performance. His poster-child example was that programmers want sequential consistency, but unfortunately all processors are moving farther and farther away from it. While I completely disagreed with him at that moment, I have developed more understanding of his position over the years. In this article, I will go against my own kind and argue that we, computer architects, are indeed using the wrong metrics to design future microprocessor chips.

Continue reading “Are computer architects building the wrong computer hardware?” »