May 30, 2011
 

Two readers (HWM and Amateur Professional) have rightfully called me out: my recent article, “Ten things every programmer must know,” provides only a list of items to study without conveying why those items are worth learning. These readers make a very valid point, and the purpose of this post is to provide the motivation. The short answer is that the importance of efficient code has increased in recent years, and it is hard to write efficient code without knowing hardware.

Update 6/19/2011: I have written a computer science self-assessment that developers can use to test their knowledge.

Why do we need efficient code?

Five years ago it was idealistic to expect the average programmer to understand basic hardware because there was little (or no) motivation: most programmers were coding for desktops and laptops with large, high-frequency cores that became faster every year. In recent years, however, changes in CPU design and the emergence of new platforms have changed mainstream programming. Today’s coders write code for multi-cores, smart phones, and GPUs. Multi-core CPUs require parallel programming, which is more sensitive to hardware configuration than single-threaded code. iPhones, iPads, and Android devices house leaner cores and have limited battery life, which makes efficient coding a must. GPUs are even more sensitive to code quality and oftentimes require the entire algorithm to be hardware-centric. In my opinion, the emergence of these contemporary platforms provides more than enough motivation for programmers to learn basic hardware.

It makes sense financially…

My hypothesis is this: knowing what lies “under the hood” not only helps programmers write better code, it also allows them to understand and debug their code’s behavior faster. Don’t get me wrong, I am not asserting that every programmer should be an expert in hardware. I just believe everyone must have some basic understanding of the car they are driving in order to avoid accidents. I also argue that the amount of information they need is small and can be picked up easily. See my list here if you have not yet seen it. It is my position that learning these concepts will have a net positive impact on coders’ productivity and on the efficiency of the resulting code. In turn, this will lead to better user experiences and longer battery life on client-side platforms. I must also add that more efficient programming of the back-end servers directly translates into dollars these days, the most important metric :-) . A simple example: if I cut my code’s run time by 10%, the bill I pay Amazon EC2 for compute drops by roughly 10%.

Is hardware-aware coding better?

Yes, and I will back up my claim with three simple examples.

1. The importance of blocking a matrix-matrix-multiply kernel is very well understood. I personally find it difficult to motivate blocking without explaining locality, caches, cache replacement, and memory bandwidth. If we explain blocking without the motivation, the programmers are unable to see the entire picture.

When asked to speed up the following kernel, a candidate I was interviewing replied “blocking.”

for (int i = 0; i < N; i++)
    A[i] = B[i];

For reference: This kernel cannot benefit from blocking because blocking is about re-use in the cache and there is no re-use in this kernel.

I think the root cause is that we are teaching programming concepts like blocking without teaching the motivation behind them.
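To make the contrast concrete, here is a minimal sketch in C of a naive and a blocked matrix multiply next to the copy kernel from the interview. The function names and the tile size TILE are my own assumptions for illustration, not tuned values; the right tile size depends on the cache being targeted.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tile edge. Pick it so that three TILE x TILE tiles of doubles
   fit comfortably in the cache level being targeted. */
#define TILE 32

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Naive C += A * B for n x n row-major matrices. The k loop walks B with a
   stride of n doubles, and every row of A sweeps all of B again, so large
   matrices are refetched from memory repeatedly. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    assert(n == 0 || (A && B && C));
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Blocked C += A * B: same arithmetic, reordered so each tile of B is
   reused by up to TILE rows of A while it is still in the cache. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    assert(n == 0 || (A && B && C));
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_size(ii + TILE, n);
                size_t k_end = min_size(kk + TILE, n);
                size_t j_end = min_size(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

/* The copy kernel from the interview: every element is read once and
   written once, so there is no reuse for tiling to capture. */
void copy_kernel(size_t n, const double *B, double *A)
{
    for (size_t i = 0; i < n; i++)
        A[i] = B[i];
}
```

Both multiply routines compute the same result; only the order of memory accesses differs. Whether the blocked version is actually faster on a given machine is something to measure, not assume.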

2. It is well known that striding through a 2D array in the wrong order can kill spatial locality in the cache, stress the TLBs, and throw off hardware prefetching, thus rendering a program up to 10x slower, not to mention the extra energy consumed in the memory. A programmer who doesn’t understand hardware is more likely to make this mistake and will never even know what they have done wrong. Expert readers may think that everyone understands this concept, but that is not true; e.g., the art benchmark in the SPEC CPU 2000 suite accesses a two-dimensional array in the wrong order in its inner loop. Similarly, a programmer who does not understand hardware may not see any difference between arrays of structs and structs of arrays, which has been shown to be a major performance factor in GPU programming (GPU Computing Gems Emerald Edition (Applications of GPU Computing Series)). One can argue that it’s the compiler’s job to fix the above; however, compilers have their own handicaps when the uninformed programmer “tricks” them with misleading information.
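A small sketch in C shows both pitfalls side by side. All names here (sum_row_order, ParticleAoS, and so on) are hypothetical and chosen for illustration only; the point is that each pair of functions returns the same answer while touching memory in a very different pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Row-order walk of a row-major array: consecutive iterations touch
   adjacent addresses, so every byte of a fetched cache line is used. */
long sum_row_order(const int *grid, size_t rows, size_t cols)
{
    assert(grid != NULL || rows * cols == 0);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += grid[r * cols + c];
    return total;
}

/* Column-order walk of the same array: consecutive iterations are
   cols * sizeof(int) bytes apart, so each access may land on a new
   cache line (and, for large arrays, a new page). */
long sum_col_order(const int *grid, size_t rows, size_t cols)
{
    assert(grid != NULL || rows * cols == 0);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += grid[r * cols + c];
    return total;
}

/* Array of structs: the mass fields are 16 bytes apart, so a loop that
   reads only mass also drags x, y, and z through the cache. */
struct ParticleAoS { float x, y, z, mass; };

float total_mass_aos(const struct ParticleAoS *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p[i].mass;
    return total;
}

/* Struct of arrays: each field is stored contiguously, so the same loop
   reads one dense array of floats. */
struct ParticlesSoA { float *x, *y, *z, *mass; };

float total_mass_soa(const struct ParticlesSoA *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p->mass[i];
    return total;
}
```

An optimizing compiler may interchange the loops in simple cases like sum_col_order, but it generally cannot change the data layout of a struct for you, which is one reason the layout decision stays with the programmer.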

3. Branch prediction, predication, and the motivation behind conditional moves can only be appreciated if one understands pipelined processors and branch predictors. A general rule I often hear is that branches in the inner loop should be avoided. This is an incomplete statement on its own. Many branches are easy for the hardware to predict, and leaving such branches in the code is often better and more practical than adding redundant computation to remove them (see this paper). It is difficult for programmers to identify easy-to-predict branches without understanding branch prediction.

By the way, virtual functions and switch/case statements fall in this category as well. Programmers use them both without realizing that they are often translated into indirect branch instructions. These indirect jumps are much harder to predict than direct jumps, and most lean cores do not employ the expensive predictors needed to handle indirect jumps effectively.
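The trade-off between keeping a branch and removing it can be sketched with two versions of the same loop. The function names are hypothetical, and the compiler is free to turn either form into the other (for example with a conditional move), so treat this as an illustration of the idea and not a guarantee about the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy version: one conditional branch per element. This is cheap when
   the data makes the branch predictable, e.g. sorted input or input where
   almost every element falls on the same side of the threshold. */
int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold)
{
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] > threshold)
            sum += v[i];
    return sum;
}

/* Branch-free version: the comparison yields 0 or 1 and the add always
   executes. There is nothing to mispredict, but the multiply and add are
   paid even for elements that contribute nothing. */
int64_t sum_above_branchless(const int32_t *v, size_t n, int32_t threshold)
{
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t keep = (v[i] > threshold);  /* 0 or 1 */
        sum += keep * v[i];
    }
    return sum;
}
```

On random input the branch-free form tends to win because roughly half the branches would be mispredicted; on predictable input the branchy form does less work per element. Which case applies is exactly what a programmer cannot judge without knowing how branch predictors behave.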

The above examples were all first-order single-thread examples. My example on parallel programming demonstrates how coherence and cache line sizes also become first-order effects when writing code for multi-core.

Conclusion

I want to reiterate my point that programmers must learn basic hardware concepts. These simple concepts, most of which programmers can learn within hours, can potentially improve the overall quality of the software being written. I once again call on professors to ensure that software programmers are at least taught basic hardware, and vice versa.

Comments?

  10 Responses to “Quick Post: Why do programmers need to understand hardware?”

  1. thank you, very good article!
    I am also looking forward to Scott Meyers’ book “Fastware” – http://www.aristeia.com/Fastware/ – there is hope it will become a standard book and be used for teaching.

  2. I am interviewing a lot of Associate C/C++ programmers lately and I am really shocked by their lack of understanding of even the most basic things, like memory heap vs. stack or concepts of memory alignment.

    I hear your call but I doubt the CS majors are being taught the basics about compilers/HW. It seems that all the abstraction in modern programming languages is responsible for this.

    As a side note, until a couple of years ago I too did not see any major benefit of knowing exactly what’s going on under the hood. But I have been programming GUI applications back then, so I may be excused :)

    I am now dreaming of an integrated way to write and develop distributed, multi-threaded, multi-process applications; as such, my interest has shifted 180 degrees.

    Love your blog. Please keep up the good work.

    • Hey,

      Welcome back and thanks for your kind comments. I really appreciate your encouragement.

      I agree with you that CS students are not being taught important stuff. Not sure if you read my note to CS/EE professors and the ten items I think every programmer should know. Please let me know if you want to add/remove stuff from the list. I am thinking about sending that list to the CS professors I know at the major schools to see their reaction.

      Aater

      • An issue with your request is that what you are asking to be taught is not especially easy and there is a good chance that many profs don’t already know the concepts you are asking for themselves. If people “just” want to be web developers there are very real limits to how much you can teach them before they throw in the towel.

        As always, it’s a case of knowing more is almost always more helpful, but what are you going to throw off an existing course to make room for the pieces you want to add?

        • “An issue with your request is that what you are asking to be taught is not especially easy and there is a good chance that many profs don’t already know the concepts you are asking for themselves”

          I think I disagree. Inspired by your post, I looked up a “random” sample of schools. Almost every CS or EE department has some people who understand hardware at this level. However, such courses are not mandatory in most cases. I am just trying to change the latter.

          I agree that you can’t teach people who are unwilling to learn because they do not see the motivation. Not much we can do about that. I still feel that even web developers need some of this information if they want to be good, e.g., my friends working at Google know more about hardware than most hardware guys.

  3. [...] PowerVR GPUs are popular among handheld computer vendors today: official technology overview. Future AMD Fusion architecture overview. FutureChips blog, programmers may find some useful optimization hints there (e.g. starting here). [...]

  4. Hi there.
    Nicely written article, but I have a very basic question for you.
    Why do we need software at all when we can program everything in hardware? Think and answer.

  5. [...] all of my code is doing what I want. I never have to think about it and it will always be perfect. This guy says it better than I can. I wouldn’t trust a psychologist who couldn’t locate the brain in a [...]

