Multi-cores are here, and they are here to stay. Industry trends show that each individual core is likely to become smaller and slower (see my post to understand the reason). Improving the performance of a single program on a multi-core chip requires that the program be split into threads that can run on multiple cores concurrently. In effect, this pushes the problem of finding parallelism in the code onto the programmers. I have noticed that many hardware designers do not understand the challenges of multi-threaded (MT) programming, since they have never written MT apps. This post is meant to show them the tip of this massive iceberg.
Update 5/26/2011: I have also written a case study on parallel programming that may interest you.