It has been a week since the AMD Fusion developer forum, and I have been reading about what the AMD, ARM, and Microsoft speakers said. While there were a lot of talks, the one that jumps out at me most is from AMD Fellow Phil Rogers. The following are my top three inferences from this talk.
1. Accelerators are taking over the world. The words GPU, APU, and video were mentioned twice as many times as ‘CPU’ (yes, I tallied). To see what I mean, look at this slide of an AMD chip. Can you even spot the CPUs? My rough measurements say that the CPUs make up less than 15% of this chip.
2. According to AMD, we are past the multi-core era and entering the era of heterogeneous cores. Really? Isn’t it true that the penetration of parallel programming is still low and that we still have trouble keeping multi-cores busy? If heterogeneous computing is the next step, then I am afraid we are getting ahead of ourselves. I would go back and say that we are designing the wrong chips. We first need to design multi-cores so that they are easy to program. Shifting the burden to programmers is not getting it done.
3. The gap between CPU and GPU is closing. GPUs are expected to have virtual memory, exception handling, recursion, context switches, pre-emption, etc. As GPUs inherit more and more CPU features, who needs CPUs? It is noteworthy that a GPU with all of the above features can indeed operate as a CPU. When running single-threaded code, the chip can clock-gate N-1 of its N vector lanes and keep just a single lane on. I guess we will still need aggressive cores to tackle Amdahl’s law, but the need for them is diminishing.
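To put a number on the Amdahl’s law remark, here is a minimal sketch of the textbook formula. The `serial_boost` parameter is my own hypothetical addition: it stands in for an aggressive core that runs the serial part faster than a baseline lane. Nothing here comes from the talk or models any real AMD chip.

```python
def amdahl_speedup(parallel_fraction, n_units, serial_boost=1.0):
    """Speedup over a single baseline core under Amdahl's law.

    parallel_fraction: share of the baseline run time that parallelizes perfectly.
    n_units: number of baseline-speed units (cores or lanes) running the parallel part.
    serial_boost: hypothetical factor by which an aggressive core speeds up the
        serial part (1.0 means no aggressive core).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_units < 1 or serial_boost <= 0:
        raise ValueError("n_units must be >= 1 and serial_boost must be > 0")
    serial_time = (1.0 - parallel_fraction) / serial_boost
    parallel_time = parallel_fraction / n_units
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Compare 64 plain lanes against 64 lanes plus a 2x faster serial core.
    for p in (0.50, 0.90, 0.99):
        plain = amdahl_speedup(p, 64)
        boosted = amdahl_speedup(p, 64, serial_boost=2.0)
        print(f"p={p:.2f}  64 lanes: {plain:6.2f}x  with 2x serial core: {boosted:6.2f}x")
```

With 90% parallel code the speedup stays below 10x no matter how many lanes you add, which is why a fast core for the serial part still earns its area.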
In the future …
Phil’s talk is indeed very intriguing. The lesson I learned is that heterogeneous cores will be upon developers whether they like it or not. I do think these architectural advancements (or cop-outs) must be accompanied by advancements in compiler technology to find parallelism. Remember, newer frameworks solve only part of the problem. We will need more help from the compiler and run-time system in solving the biggest challenge of functional asymmetry: choosing the best core for each type of code. I suggest a better profile-based compilation technique that can tell the programmer which core runs a given piece of code best.
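As a rough illustration of what such profile-based advice could look like, here is a minimal sketch that takes run-time samples for each code region on each core type and suggests the core with the lowest median time. The function name, the profile layout, and the sample numbers are all hypothetical; a real tool would have to collect the samples itself and weigh things like data-transfer cost and power.

```python
from statistics import median


def suggest_cores(profile):
    """Map each code region to the core type with the lowest median run time.

    profile: {region_name: {core_type: [run times in seconds, ...]}}
    Regions with no usable samples are left out of the result.
    """
    suggestions = {}
    for region, timings in profile.items():
        # Median is less sensitive than the mean to one-off slow runs.
        medians = {core: median(samples)
                   for core, samples in timings.items() if samples}
        if not medians:
            continue
        suggestions[region] = min(medians, key=medians.get)
    return suggestions


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any real chip.
    example_profile = {
        "parse_input": {"cpu": [0.010, 0.012, 0.011], "gpu": [0.090, 0.085, 0.100]},
        "blur_image": {"cpu": [0.800, 0.790, 0.820], "gpu": [0.040, 0.050, 0.045]},
    }
    for region, core in suggest_cores(example_profile).items():
        print(f"{region}: run on {core}")
```

The point of the sketch is the shape of the feedback: the programmer gets a per-region suggestion instead of having to guess where each piece of code belongs.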