This post is inspired by an old discussion I had with Jim Larus at Microsoft Research before ASPLOS 2009. Jim believes that computer architects are building the wrong computers. He thinks that we add features to processors that no programmer cares about, forcing programmers to rely solely on hardware for performance. His poster-child example was sequential consistency: programmers want it, yet processors keep moving farther and farther away from it. While I completely disagreed with him at the time, I have developed more sympathy for his position over the years. In this article, I will go against my own kind and argue that we, computer architects, are indeed using the wrong metrics to design future microprocessor chips.
In my post on factors that influence computer design, I showed the triangle of performance, power, and programmer effort. These factors, while broadly understood, are rarely applied in earnest. Most architects design CPUs holding programmer effort constant: we use existing benchmarks with no changes. Thus, if the code was hammered to fit the shoehorn we designed last time, our next design can only be a better shoehorn; we will never invent the “wheel,” per se. Don’t agree with me? Take a look at any of our top conferences like ISCA, MICRO, etc. The papers published there take pride in the fact that they used SPEC or PARSEC workloads untouched. Yes, that makes for controlled scientific experiments and is necessary for community acceptance, but these studies hold an important variable fixed: programmer effort. SPEC workloads were written for high-performance out-of-order cores because that was the name of the game at the time. The software pays no attention to instruction scheduling or optimization because it was written in an era when power mattered less and programmers relied on the compiler and hardware to extract performance. If I use SPEC to design my computer, I am likely to end up with a heavyweight out-of-order core again, because the workloads were written sub-optimally, counting on the hardware to pick up the slack. Had the same workloads been written to run on DSPs, their algorithms would have been very different; H.264, for example, can easily be optimized 100x for simple in-order cores.
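To make the workload-bias argument concrete, here is a small, hypothetical illustration (my own sketch, not a SPEC kernel): the same reduction written with two very different amounts of programmer effort. Which core looks “best” in a study depends heavily on which version the benchmark suite happens to ship.

```c
/* Hypothetical illustration (not from any benchmark suite): the same
 * reduction written two ways.  The source that a study holds fixed
 * largely determines which hardware that study will reward. */

#include <stddef.h>

/* Version 1: naive.  A single accumulator creates one long dependence
 * chain through acc; the code leaves all latency hiding to the
 * hardware, which rewards a big out-of-order core. */
float sum_naive(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Version 2: the programmer spends effort.  Four independent
 * accumulators break the dependence chain, so even a simple in-order
 * pipeline can keep several adds in flight without any hardware
 * scheduling help.  (Reassociating the sum can change the result
 * slightly in floating point.) */
float sum_unrolled(const float *x, size_t n) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i + 0];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    for (; i < n; i++)   /* remainder */
        a0 += x[i];
    return (a0 + a1) + (a2 + a3);
}
```

If the suite ships the naive version, a study that never touches the source will conclude the out-of-order core earns its power budget; if it ships the unrolled version, a much simpler in-order core suddenly looks competitive, and the programmer effort that made the difference never shows up in any bar chart.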
My message: we as a community need to start appreciating where we are wrong and fix it. I understand that it’s difficult to measure programmer effort quantitatively, which is why we architects stay away from it. I ask: is it necessary to quantify everything into a bar chart? Can we not argue about certain things qualitatively? Yes, it’s “non-scientific,” but perhaps it can lead to some scientific metrics down the road. It may sound fluffy, but never forget that industry actually uses this fluffy metric to make real decisions, because it cannot ignore this factor. I will conclude with an appeal: architecture papers should at least discuss how their proposed idea will impact programmer effort, and how their results would change if the workloads themselves were rewritten.
I will close with a message from Doug Carmean’s talk at UT Austin on 3/23/2011, “Computer architecture is the same as Hardware/Software co-design.”