May 16, 2011

This post is inspired by an old discussion I had with Jim Larus of Microsoft Research before ASPLOS 2009. Jim believes that computer architects are building the wrong computers. He thinks that we add features to processors that no programmer cares about, which forces programmers to rely solely on hardware for performance. His poster-child example was sequential consistency: programmers want it, yet processors keep moving farther and farther away from it. While I completely disagreed with him at the time, I have come to understand his position better over the years. In this article, I will go against my own kind and argue that we, computer architects, are indeed using the wrong metrics to design future microprocessor chips.

In my post on factors that influence computer design, I showed the triangle of performance, power, and programmer effort. These factors, while broadly understood, are often not applied in their essence. Most architects design CPUs holding programmer effort constant: we use existing benchmarks with no changes. Thus, if the code was hammered to fit the shoehorn we designed last time, our next design can only be a better shoehorn; we will never invent the "wheel," per se. Don't agree with me? Take a look at any of our top conferences, like ISCA or MICRO. The papers published there take pride in the fact that they used SPEC or PARSEC workloads unmodified. Yes, that makes for controlled scientific experiments, which are necessary for social conformance, but such studies miss an important variable: programmer effort. SPEC workloads were written for high-performance out-of-order cores because that was the name of the game back when they were written. The software pays no attention to instruction scheduling or other optimizations, because it was written in an era when power was less relevant and programmers relied on the compiler and hardware to extract performance. If I use SPEC to design my computer, I am likely to end up with a heavyweight out-of-order core again, because the workloads were written to be sub-optimal. If the same workloads had been written to run on DSPs, their algorithms would have been very different; e.g., H.264 can easily be optimized 100x for simple in-order cores.

My message: we as a community need to start appreciating where we are wrong and fix it. I understand that it's difficult to measure programmer effort quantitatively, which makes us architects stay away from it. I ask: Is it necessary to quantify everything into a bar chart? Can we not argue about certain things qualitatively? Yes, it's "non-scientific," but perhaps it can lead to some scientific metrics down the road. It may sound fluffy, but never forget that industry does use this fluffy metric to make real decisions, because it cannot afford to ignore this factor. I will conclude with an appeal: architecture papers must at least discuss how their new idea will impact programmer effort, and how the idea itself would fare if the workloads were in fact changed.

I will close with a message from Doug Carmean’s talk at UT Austin on 3/23/2011, “Computer architecture is the same as Hardware/Software co-design.”


  20 Responses to “Are computer architects building the wrong computer hardware?”

  1. The most important factor in serial processing is, of course, processor IPC (instructions per clock) versus clock speed. The main problem is that, for quite some time, IPC did not advance and appeared to have reached a plateau until very recently. Clock speed advancements also hit a virtual wall of process development and heat versus current. The focus then shifted to core parallelism as a "band aid" to counter this stalemate, in much the same way Hyperthreading was first used as a band aid to offset the long pipeline of the Pentium 4 and NetBurst processors.

    Many tasks can be accelerated by running in parallel, but many cannot. As this has become clearer over the last few years, more breakthroughs in IPC and clock speed have come through, pushing serial performance up another solid notch. It's a balancing act, unfortunately. Intel's recent advance in Sandy Bridge is a GREAT example of increasing IPC per core by rearranging the decode stages for more efficiency and caching only what has to be cached, not keeping multiple working copies of the same data in the pipeline. This allowed a better internal reallocation of processing resources, boosting single-threaded performance by anywhere from nothing at all up to the 30% commonly quoted in the media as SnB's IPC advantage over the previous Nehalem-based architecture.

    Single-thread performance has begun marching forward again, albeit more slowly and more cautiously than during the march to NetBurst, and rightly so. Because of the large, multi-year push for parallelism in software development, there have been a significant number of advances in multithreaded software, meaning that investment is still very valid. But Intel, and to a lesser extent AMD as well, are shifting focus back towards ensuring that serial processing doesn't stagnate as it had for some time.

    I therefore believe that there may be no need to push for faster serial performance when that goal is already becoming the balanced focus it needs to be. AMD's Bulldozer/Llano concept of a hybrid SMT/SMP approach is meant to balance these parallel and serial loads with an architecture that directly addresses the trade-off. Intel's excellent progress in this area with Sandy Bridge gives me real hope that they're on the ball with this thinking as well.

    • Interesting thought. I would say that Intel and AMD push for single-thread performance for two reasons.

      1. They know how to
      2. Most apps remain single threaded

      So your point is well taken that we no longer need to "push" single-thread performance since it's already on everyone's radar. SMT/SMP is clearly a win, and it does provide this serial/parallel balance.

      I would add one side point to that … parallel processing is also going in the direction of SIMD (the whole GPGPU thing). Thus, chips should most certainly include more SIMD than before, because that can cater to the embarrassingly parallel workloads. On-die GPUs are interesting but not enough, because the latency of sending work to the GPU means that you only ship tasks that are large enough to amortize the overhead of sending them. My next rave: closer SIMD units! (You will see a post soon :)

      Side note: I just published an article titled Multi-core or multi-nonsense. Funny how you call multi-core a band-aid. You will find the article interesting.
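
The offload-latency point above can be made concrete with a back-of-the-envelope model. This is only a sketch under assumed simplifications: a fixed per-launch overhead plus a constant per-item cost on each device. The function names and the numbers in the usage note are hypothetical and do not come from any measurement.

```python
def break_even_items(overhead_s: float, cpu_s_per_item: float, gpu_s_per_item: float) -> float:
    """Number of work items at which offloading costs the same as staying on the CPU.

    Assumed (hypothetical) cost model:
        cpu_time(n) = n * cpu_s_per_item
        gpu_time(n) = overhead_s + n * gpu_s_per_item
    """
    if gpu_s_per_item >= cpu_s_per_item:
        # The accelerator is no faster per item, so the overhead is never recovered.
        return float("inf")
    return overhead_s / (cpu_s_per_item - gpu_s_per_item)


def should_offload(n_items: int, overhead_s: float,
                   cpu_s_per_item: float, gpu_s_per_item: float) -> bool:
    """True only when the task is large enough to amortize the offload overhead."""
    return n_items > break_even_items(overhead_s, cpu_s_per_item, gpu_s_per_item)
```

With a made-up overhead of 10 time units, 2 units per item on the CPU, and 1 unit per item on the accelerator, the break-even point is 10 items, so only larger tasks are worth shipping. In this model, shrinking the overhead, which is what closer SIMD units would do, lowers that threshold.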

  2. Terrific work! This is the type of information that should be shared around the web. Shame on the search engines for not positioning this post higher!

    • Hi Mike,

      I really feel encouraged after reading your comment. Thanks.

      As for the search engines, I agree but I am hoping it will improve with time as we get more visitors and they link to our site from their sites. We do need a lot more collaboration between different parts of CS/EE to improve both sides and remove the inefficiencies. Perhaps the word is best spread peer to peer. I will continue to write so please keep your feedback coming.


  3. Thanks for the useful information.
