Blue Gene

The first computer in the Blue Gene series, Blue Gene/L, developed through a partnership with Lawrence Livermore National Laboratory, cost US$100 million and was designed to scale to speeds in the hundreds of TFLOPS, with a theoretical peak performance of 360 TFLOPS. This is almost ten times as fast as the Earth Simulator, the fastest supercomputer in the world before Blue Gene. In June 2004, two Blue Gene/L prototypes entered the TOP500 supercomputer list at the #4 and #8 positions.

On September 29, 2004, IBM announced that a Blue Gene/L prototype at IBM Rochester (Minnesota) had overtaken NEC's Earth Simulator as the fastest computer in the world, reaching 36.01 TFLOPS against the Earth Simulator's 35.86 TFLOPS. The machine later reached a speed of 70.72 TFLOPS.
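To put these figures in proportion, the short calculation below works out the prototype's margin over the Earth Simulator and the size of the later improvement. It uses only the speeds quoted above; the variable names are illustrative.

```python
# Speeds quoted in the text, in TFLOPS.
EARTH_SIMULATOR_TFLOPS = 35.86
PROTOTYPE_TFLOPS = 36.01   # Blue Gene/L prototype, September 2004
LATER_TFLOPS = 70.72       # same machine, later result

# Margin by which the prototype beat the Earth Simulator, in percent.
margin_pct = (PROTOTYPE_TFLOPS / EARTH_SIMULATOR_TFLOPS - 1) * 100

# How much faster the later result was than each earlier figure.
later_vs_prototype = LATER_TFLOPS / PROTOTYPE_TFLOPS
later_vs_earth_simulator = LATER_TFLOPS / EARTH_SIMULATOR_TFLOPS

print(f"Margin over Earth Simulator: {margin_pct:.2f}%")
print(f"Later result vs. prototype: {later_vs_prototype:.2f}x")
print(f"Later result vs. Earth Simulator: {later_vs_earth_simulator:.2f}x")
```

The initial lead was under half a percent, while the later result was nearly double both earlier figures.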

IBM chose Linux as the main operating system for its "Blue Gene" family of supercomputers, a major endorsement of the operating system and the open-source computing model it represents. The decision came, in part, from the growing size and strength of the open-source community: thousands of developers around the world participate in the evolution of Linux, whereas creating a new OS inside IBM would have required a massive engineering effort.

Major features

The Blue Gene/L supercomputer was unique in the following aspects:

Trading the speed of processors for lower power consumption. Blue Gene/L used low-frequency, low-power embedded PowerPC cores with floating-point accelerators. While the performance of each chip was relatively low, the system could achieve a better performance-to-energy ratio for applications that could use larger numbers of nodes.

Dual processors per node with two working modes: co-processor mode, where one processor handles computation and the other handles communication; and virtual-node mode, where both processors are available to run user code and share both the computation and the communication load.

System-on-a-chip design. All node components were embedded on one chip, with the exception of 512 MB external DRAM.

A large number of nodes (scalable in increments of 1024 up to at least 65,536).

Three-dimensional torus interconnect with auxiliary networks for global communications (broadcast and reductions), I/O, and management.

Lightweight OS per node for minimum system overhead (system noise).
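The three-dimensional torus interconnect named above can be illustrated with a minimal sketch. In a torus, each node links to its two nearest neighbors along every axis, and the links wrap around at the edges, so every node has exactly six neighbors. The 64 x 32 x 32 shape below is an assumption chosen for illustration because it gives the 65,536 nodes mentioned in the text; the function names are hypothetical and do not correspond to any Blue Gene software interface.

```python
# Assumed torus shape for illustration: 64 * 32 * 32 = 65,536 nodes.
DIMS = (64, 32, 32)
NODE_COUNT = DIMS[0] * DIMS[1] * DIMS[2]


def torus_neighbors(coord, dims=DIMS):
    """Return the six nearest neighbors of `coord` in a 3D torus of shape `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # The modulo implements the wraparound link at each edge.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims=DIMS):
    """Minimum number of nearest-neighbor hops between nodes `a` and `b`."""
    hops = 0
    for x, y, d in zip(a, b, dims):
        direct = abs(x - y)
        # Going the other way around the ring may be shorter.
        hops += min(direct, d - direct)
    return hops


if __name__ == "__main__":
    print(NODE_COUNT)
    print(torus_neighbors((0, 0, 0)))
    print(torus_hops((0, 0, 0), (63, 31, 31)))
```

Because of the wraparound links, the node at the opposite corner of the grid, (63, 31, 31), is only three hops from (0, 0, 0), and under the assumed shape no pair of nodes is more than 32 + 16 + 16 = 64 hops apart.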