It’s a commonplace of today’s technology that we are able to obtain more and more capability in smaller and smaller packages. I can carry in my shirt pocket, in a USB drive about the size of a pack of chewing gum, more storage than the first System/360 computer I worked on ever had. A less obvious result of the same advances in technology is that we can build much bigger systems than would have been imaginable even a couple of decades ago. (We use some of these systems — Google and Facebook come to mind — all the time, but most of us never see them physically, or have occasion to think about the totality of the system.)
Now, according to an article at Technology Review, IBM Research in Almaden, CA, is in the process of building a disk drive array at least ten times bigger than any in existence. The new, as yet unnamed, system, which is being built for an unnamed client, will use 200,000 individual drives to achieve a capacity of 120 petabytes, or 1.2 × 10^17 bytes. To put it another way, it’s 120 million gigabytes.
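The arithmetic behind those figures is easy to verify; the per-drive capacity below is my own back-of-the-envelope inference from the two numbers in the article, not something IBM has stated:

```python
# Sanity-check the capacity figures quoted in the article.
PETABYTE = 10**15
capacity_bytes = 120 * PETABYTE             # 120 PB

print(capacity_bytes)                       # 1.2 × 10^17 bytes
print(capacity_bytes // 10**9)              # 120,000,000 — i.e., 120 million gigabytes

drives = 200_000
per_drive_gb = capacity_bytes / drives / 10**9
print(per_drive_gb)                         # implied average of 600 GB per drive
```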
The giant data container is expected to store around one trillion files and should provide the space needed to allow more powerful simulations of complex systems, like those used to model weather and climate.
Other potential application areas are seismic processing in the oil/gas industry, and computational chemistry. Although the initial uses for this array will almost certainly be in supercomputing environments, the technology could be adapted for more conventional cloud computing systems.
The hardware for the system is water-cooled, to accommodate higher equipment density. With so many individual devices, dealing with component failures is a necessity. The system uses fairly standard data redundancy techniques (as in RAID systems) to allow data from a failed drive to be rebuilt in spare space, but its software is designed to minimize the effect on overall system throughput.
IBM uses the standard tactic of storing multiple copies of data on different disks, but it employs new refinements that allow a supercomputer to keep working at almost full speed even when a drive breaks down.
When a lone disk dies, the system pulls data from other drives and writes it to the disk’s replacement slowly, so the supercomputer can continue working. If more failures occur among nearby drives, the rebuilding process speeds up to avoid the possibility that yet another failure occurs and wipes out some data permanently.
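The escalating-rebuild behavior described above can be sketched as a simple policy function. This is a minimal illustration of the idea, not IBM’s actual algorithm; the rate values, the escalation factor, and the notion of a “redundancy group” here are all assumptions made for the example:

```python
def rebuild_rate(failed_in_group: int,
                 base_rate_mb_s: float = 50.0,
                 max_rate_mb_s: float = 800.0) -> float:
    """Pick a rebuild speed based on how exposed a redundancy group is.

    With a single failed drive, rebuild slowly so the supercomputer keeps
    nearly full I/O throughput. Each additional failure in the same group
    raises the risk that one more failure wipes out data permanently, so
    the rebuild rate escalates sharply. (All rates are illustrative
    assumptions, not GPFS parameters.)
    """
    if failed_in_group <= 0:
        return 0.0
    if failed_in_group == 1:
        return base_rate_mb_s                       # gentle background rebuild
    # Escalate aggressively as redundancy erodes, up to a ceiling.
    return min(max_rate_mb_s, base_rate_mb_s * 4 ** (failed_in_group - 1))
```

The key design point is the asymmetry: a lone failure is cheap to tolerate, so the system trades rebuild time for application throughput; correlated failures invert that trade-off.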
The new system uses IBM’s General Parallel File System (GPFS), a distributed, highly scalable file system originally developed for high-performance computing. GPFS provides faster data access by, for example, “striping” parts of a file across multiple devices. IBM says that current GPFS implementations have I/O rates of ~100 GB/sec. GPFS also incorporates features to reduce the overhead associated with maintaining the file system.
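Striping, as mentioned, spreads consecutive blocks of a file round-robin across devices so that a large sequential read can be serviced by many drives in parallel. A minimal sketch of the address mapping (the block size and device count here are made-up values, not GPFS’s actual defaults):

```python
BLOCK_SIZE = 1 << 20   # 1 MiB stripe unit (illustrative assumption)
NUM_DEVICES = 8        # devices in the stripe group (illustrative assumption)

def locate(offset: int) -> tuple[int, int]:
    """Map a byte offset within a striped file to (device index, offset on device)."""
    block = offset // BLOCK_SIZE
    device = block % NUM_DEVICES          # round-robin: block i lives on device i mod N
    stripe_row = block // NUM_DEVICES     # how many full stripes precede this block
    return device, stripe_row * BLOCK_SIZE + offset % BLOCK_SIZE
```

For example, with these parameters, bytes 0 through 8 MiB of a file land one block apiece on devices 0 through 7, and the ninth block wraps back to device 0 — which is why eight drives can serve one file’s sequential read at close to eight drives’ combined bandwidth.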
Last month a team from IBM used GPFS to index 10 billion files in 43 minutes, effortlessly breaking the previous record of one billion files scanned in three hours.
(The IBM GPFS pages linked above have more information and links to documentation on GPFS.) IBM Research says it is working on a new generation of GPFS systems that will be even faster.
IBM Research – Almaden is working with IBM’s product divisions to extend GPFS to support a new 2011–2012 generation of supercomputers featuring up to 16,000 nodes and 500,000 processor cores. Such a system must be capable of achieving I/O rates of several terabytes per second to a single file, creating 30,000 to 40,000 files per second, and holding up to a trillion files (to create a trillion files, just create 30,000 files per second continuously for a year).
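That parenthetical arithmetic checks out, at least to a first approximation:

```python
# 30,000 file creations per second, sustained for one (non-leap) year.
files_per_second = 30_000
seconds_per_year = 365 * 24 * 3600      # 31,536,000 seconds

total = files_per_second * seconds_per_year
print(total)                            # 946,080,000,000 — roughly 0.95 trillion files
```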
This is pretty amazing stuff.