January 24, 2011

Disruptive Technology


disrupt:
 to interrupt the normal course or unity of


With somewhat regular frequency, I get into discussions about whether GPUs are a "disruptive technology" in HPC (I hang out with nerds). Often this is in response to articles like this. As someone who now covers HPC for NVIDIA, but was working for LLNL through most of the last decade, I have some decidedly strong opinions on this topic.

The disruptive part is the upset in the top500 performance curve. If you plot the up-and-to-the-right performance of the top500 list, IBM BlueGene/L was above the curve as it existed at the time -- same outlay of cash/power/time, lots more performance. Because it was the first massively parallel processing (MPP) machine based on embedded processors, the assumption was that subsequent improvements in PPC (or other) embedded processors would yield a series of increasingly fast MPP systems. This has been mostly true, though the follow-on machines have not been large enough to capture the number 1 position.

Going back further in time, I don't think I ever heard the Earth Simulator referred to as disruptive technology. Though it captured and held the "world's fastest computer" title by a larger margin and for a longer time than BG/L would, most viewed it as the last gasp of vector technology. Distributed memory clusters like IBM SP had already won the mind-share at most sites, and though Earth Simulator was able to remain at #1 for over two years, it was pretty clear that its dominance would come to an end. More vector machines were built, but HPC codes continued to move from vector to distributed memory.

Getting back to the question: with BlueGene/Q poised to retake the top500 crown, are GPUs a temporary disturbance of embedded MPP's march to dominance, or a permanent shift?

My answer is that disruption isn't a zero-sum game, and embedded MPP doesn't have to fail for GPUs to succeed. GPUs provide the "same slope, higher intercept" disruption of the top500 curve that embedded MPP did. Like embedded MPP, advances in the underlying technology will continue to provide GPU computing with performance improvements for years to come. And like embedded MPP, GPUs are leveraging a robust consumer market to achieve those advancements.
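To make "same slope, higher intercept" concrete, here is a minimal sketch that treats the performance curve as a straight line on a log scale. Every number in it is an illustrative assumption (a doubling per unit of time and a one-time 10x jump), not measured top500 data, and the function names are hypothetical.

```python
import math


def log10_perf(year, slope, intercept):
    """Trend line on a log scale: log10(performance) = slope * year + intercept."""
    return slope * year + intercept


# Illustrative values only, NOT measured top500 data.
SLOPE = math.log10(2.0)   # assumption: performance doubles once per unit of time
BASE_INTERCEPT = 0.0      # assumption: the existing trend line
SHIFTED_INTERCEPT = 1.0   # assumption: a one-time 10x jump from the new technology


def speedup(year):
    """Ratio of the shifted line to the base line at a given time."""
    gap = (log10_perf(year, SLOPE, SHIFTED_INTERCEPT)
           - log10_perf(year, SLOPE, BASE_INTERCEPT))
    return 10 ** gap


def time_ahead(slope, base_intercept, shifted_intercept):
    """How far ahead of the base trend the shifted line sits, in units of time."""
    return (shifted_intercept - base_intercept) / slope


if __name__ == "__main__":
    for year in (0, 5, 10):
        print(year, speedup(year))
    print(time_ahead(SLOPE, BASE_INTERCEPT, SHIFTED_INTERCEPT))
```

Because the slopes match, the ratio between the two lines never changes: the new technology does not grow faster, it simply stays a fixed amount of time ahead of the old trend. A steeper slope would be a different kind of disruption, with a gap that widens every year.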

Where GPUs hold a significant advantage is barrier to entry. There are millions of CUDA-capable GPUs in systems today, and hundreds of universities teaching CUDA. Moreover, GPU clusters can be constructed entirely of common off-the-shelf hardware and software, putting the cost within reach of individual researchers and small teams. It's instructive to remember that Nebulae and Tianhe-1A were designed and built in months, while most of the embedded MPP designs have taken years to go from PowerPoint to power-on.

There is room for both. In the upper range of the top500, embedded MPP will continue to leverage specialized networks and OS stacks to achieve performance through high node counts. At the same time, GPUs are already delivering systems with over 1TF of peak double-precision per node on everything from desktops to petaflop clusters. Both have been disruptive, and either could grow to dominate HPC in years to come. I've adjusted my career path accordingly for the next five years or so.
