China’s New Supercomputer is Tops

June 17, 2013

Today, at the opening session of the 2013 International Supercomputing Conference in Leipzig, Germany, the latest update (the 41st edition) to the Top 500 list of supercomputers was announced, and a new Chinese system, the Tianhe-2, has taken first place honors.  The system achieved performance of 33.86 petaflop/s (3.386 × 10¹⁶ floating point operations per second) on the LINPACK benchmark; the Tianhe-2 (in English, Milky Way-2) will be deployed at the National Supercomputer Center in Guangzhou, China, by the end of the year.  The system has 16,000 nodes, each with multiple Intel processors, for a total of 3,120,000 processor cores.

The Titan system at Oak Ridge National Laboratory, ranked number 1 in the November, 2012 list, and the Sequoia system at Lawrence Livermore National Laboratory, previously ranked number 2, have both moved down one place, to number 2 and number 3, respectively.  The two systems are still noteworthy as being among the most energy-efficient in use: Titan delivers 2,143 megaflops/watt, and Sequoia 2,031.6 megaflops/watt.

The total capacity of the list has continued to grow quickly.

The last system on the newest list was listed at position 322 in the previous TOP500 just six months ago.  The total combined performance of all 500 systems has grown to 223 petaflop/sec, compared to 162 petaflop/sec six months ago and 123 petaflop/sec one year ago.

You can view the top ten systems, and the complete list, at this page.


A Supercomputer ARM Race?

May 28, 2013

The PC World site has a report of an interesting presentation made at the EDAworkshop13 in Dresden, Germany, this month, on possible future trends in the high-performance computing [HPC] market.  The work, by a team of researchers from the Barcelona Supercomputing Center in Spain, suggests that we may soon see a shift in HPC architecture, away from the commodity x86 chips common today, and toward the simpler processors (e.g., those from ARM) used in smart phones and other mobile devices.

Looking at historical trends and performance benchmarks, a team of researchers in Spain have concluded that smartphone chips could one day replace the more expensive and power-hungry x86 processors used in most of the world’s top supercomputers.

The presentation material is available here [PDF].  (Although PC World calls it “a paper”, it is a set of presentation slides.)

As the team points out, significant architectural shifts have occurred before in the HPC market.  Originally, most supercomputers employed special purpose vector processors, which could operate on multiple data items simultaneously.  (The machines built by Cray Research are prime examples of this approach.)  The first Top 500 list, published in June 1993, was dominated by vector architectures  — notice how many systems are from Cray, or from Thinking Machines, another vendor of similar systems.  These systems tended to be voracious consumers of electricity; many of them required special facilities, like cooling with chilled water.

Within a few years, though, the approach had begun to change.  A lively market had developed in personal UNIX workstations, using RISC processors, provided by vendors such as Sun Microsystems, IBM, and HP.  (In the early 1990s, our firm, and many others in the financial industry, used these machines extensively.)  The resulting availability of commodity CPUs made building HPC systems from those processors economically attractive.  They were not quite as fast as the vector processors, but they were a lot cheaper.  Slightly later, a similar transition, also motivated by economics, took place away from RISC processors and toward the x86 processors used in the by-then ubiquitous PC.

Top 500 Processor Architectures

The researchers point out that current mobile processors have some limitations for this new role:

  • The CPUs are mostly 32-bit designs, limiting the amount of usable memory
  • Most lack support for error-correcting memory
  • Most use non-standard I/O interfaces
  • Their thermal engineering does not necessarily accommodate continuous full-power operation

But, as they also point out, these are implementation decisions made for business reasons, not insurmountable technical problems.  They predict that newer designs will be offered that will remove these limitations.

This seems to me a reasonable prediction.  Using simpler components in parallel has often been a sensible alternative to building more powerful, complex systems.  Even back in the RISC workstation days, in the early 1990s, we were running large simulation problems at night, using our network of 100+ Sun workstations as a massively parallel computer.  The trend in the Top 500 lists is clear; we have even seen a small supercomputer built using Raspberry Pi computers and Legos.  Nature seems to favor this approach, too; our individual neurons are not particularly powerful, but we have a lot of them.


Weather Forecasts: Improving

May 25, 2013

Although there are a lot of different sources from which you can get a weather forecast, those forecasts all come from one of a few sources: national weather services that run large, numerical weather prediction models on their computer systems.  Two of the major suppliers are the US National Weather Service’s [NWS] Global Forecast System [GFS] (the source for most US forecasts), and the European Centre for Medium-Range Weather Forecasts [ECMWF], located in Reading, England.  Over the last few years, there has been a growing feeling that the US effort was not keeping up with the progress being made at ECMWF.  The criticism became considerably more pointed in the aftermath of last year’s Hurricane Sandy.  Initial forecasts from the GFS projected that the storm would head away from the US East Coast into the open Atlantic.  The ECMWF models correctly predicted that Sandy would make a left turn, and strike the coast in the New Jersey / New York region.

According to a story in Monday’s Washington Post, and a post on the paper’s “Capital Weather Gang” blog, at least one good thing will come out of this rather embarrassing forecasting error.  It’s anticipated that the NWS will get additional appropriated funds to allow the computers and the models they run to be updated.

Congress has approved large parts of NOAA’s spending plan under the Disaster Relief Appropriations Act of 2013 that will direct $23.7 million (or $25 million before sequestration), a “Sandy supplemental,” to the NWS for forecasting equipment and computer infrastructure.

This should go a long way toward addressing one of the most pressing needs for the GFS: more computing horsepower.

Computer power is vital to modern weather forecasting, most of which is done using mathematical models of the Earth’s climatic systems.  These models represent various weather features, such as winds, heat transfer, solar radiation, and relative humidity, using a system of partial differential equations.  (A fundamental set of these is called the primitive equations.)  The equations typically describe functions that are very far from linear; also, except for a few special cases, the equations do not have analytic solutions, but must be solved by numerical methods.

The standard techniques for numerical solution of equations of this type involve approximating the differential equations with difference equations on a grid of points.  This is somewhat analogous to approximating a curve by using a number of line segments; as we increase the number of segments and decrease their length, the approximation gets closer to the true value.  Similarly, in weather models, increasing the resolution of the grid (that is, decreasing the distance between points) allows better modeling of smaller-scale phenomena.  But increasing the resolution means that correspondingly more data must be processed and more sets of equations solved, all of which takes computer power.  Numerical weather prediction, although it had been worked on for some years, really only began to be practical in the 1950s, with the advent of digital computers, and the early weather models had to incorporate sizable simplifications to be at all practical.  (It is not too useful to have a forecasting model, no matter how accurate, that requires more than 24 hours to produce a forecast for tomorrow.)
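The grid-based approach described above can be sketched in a few lines of code.  The toy example below (my own illustration, not code from any actual forecast model) solves the one-dimensional linear advection equation with a first-order upwind finite-difference scheme; real models apply the same replace-derivatives-with-differences idea to the full primitive equations in three dimensions.

```python
# Toy grid-based integration: 1-D linear advection, du/dt + c*du/dx = 0,
# stepped forward with a first-order upwind finite-difference scheme.
# All parameter values are invented for the illustration.

def advect(u, c=1.0, dx=1.0, dt=0.5, steps=10):
    """Advance the grid values `u` by `steps` time steps (periodic grid)."""
    u = list(u)
    for _ in range(steps):
        # upwind difference: each point looks "upstream" (to the left);
        # index -1 wraps around, giving a periodic boundary
        u = [u[i] - c * dt / dx * (u[i] - u[i - 1]) for i in range(len(u))]
    return u

# a single bump of "weather" moving to the right across the grid
initial = [0.0] * 20
initial[2] = 1.0
final = advect(initial, steps=10)
peak = max(range(len(final)), key=lambda i: final[i])
print(peak)  # the bump has moved downstream, from cell 2 to cell 7
```

Note that halving the grid spacing also requires (for stability) halving the time step, so the computational work grows much faster than the resolution, which is exactly the cost pressure described above.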

The computation problem is made worse by the problems inherent in data acquisition.  For this type of numerical analysis, the three-dimensional grid would ideally consist of evenly spaced points, covering the surface of the Earth and extending upwards into the atmosphere.  Clearly, this ideal is unlikely to be achieved in practice; getting observations from the center of Antarctica, or the mid-Pacific Ocean, is not terribly convenient.  There are also ordinary measurement errors to deal with, of course.  This means that a good deal of data pre-processing and massaging is required, in addition to running the model itself, adding even more to the computing resources needed.

Many observers point to the GFS’s limited computer power as one of the chief weaknesses in the US effort.  (For example, see this blog post by Cliff Mass, Professor of Atmospheric Sciences at the University of Washington, or this post by Richard Rood, Professor at the University of Michigan in the Department of Atmospheric, Oceanic and Space Sciences.)   The processing speed of the current GFS system is rated at 213 teraflops (1 teraflop = 1 × 10¹² floating point operations per second); the current ECMWF system is rated at 754 teraflops (and is listed as number 38 in the most recent Top 500 supercomputer list, released in November 2012 — the GFS system does not make the top 100).

The projected improvements to the GFS system will raise its capacity to approximately 2600 teraflops; in terms of the most recent Top 500 list, that would put it between 8th and 9th places.  (Over the same period, the ECMWF system is projected to speed up to about 2200 teraflops.)   This will enable the resolution of the GFS to be increased.

The NWS projects the Sandy supplemental funds will help enhance the horizontal resolution of the GFS model by around a factor of 3 by FY2015, enough to rival the ECMWF.

There are also plans to make other improvements in the model’s physics, and in its associated data acquisition and processing systems.

These improvements are worth having.  The projected $25 million cost is a very small percentage of the total Federal budget (about $3.6 trillion for fiscal 2012).  As we are reminded all too often, extreme weather events can come with a very large price tag, especially when they are unexpected.  Better forecasts have the potential to save money and lives.


First Petaflop Computer to be Retired

March 31, 2013

I’ve posted notes here about the Top500 project, which publishes a semi-annual list of the world’s fastest computer systems, most recently following the last update to the list, in November 2012.

An article at Ars Technica reports that the IBM Roadrunner system, located at the US Department of Energy’s Los Alamos National Laboratory, will be decommissioned and, ultimately, dismantled.  The Roadrunner was the first system whose performance exceeded a petaflop (1 petaflop = 1 × 10¹⁵ floating point operations per second).  It held the number one position on the Top 500 list from June, 2008 through June, 2009; it was still ranked number two in November, 2009.  The Roadrunner system contained 122,400 processor cores in 296 racks, covering about 6,000 square feet.  It was one of the first supercomputer systems to use a hybrid processing architecture, employing both IBM PowerXCell 8i CPUs and AMD Opteron dual-core processors.

The system is being retired, not because it is too slow, but because its appetite for electricity is too big.   In the November 2012 Top 500 list, Roadrunner is ranked at number 22, delivering 1.042 petaflops and consuming 2,345 kilowatts of electricity.  The system ranked as number 21, a bit faster at 1.043 petaflops, required less than half the power, at 1,177 kilowatts.
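The performance-per-watt figures implicit in that comparison are easy to check; this back-of-envelope calculation uses only the numbers quoted above from the November 2012 list.

```python
# Convert a Top 500 entry's quoted performance (petaflop/s) and power
# draw (kilowatts) into megaflops per watt.

def mflops_per_watt(petaflops, kilowatts):
    # 1 petaflop/s = 1e9 megaflop/s; 1 kW = 1e3 W
    return petaflops * 1e9 / (kilowatts * 1e3)

roadrunner = mflops_per_watt(1.042, 2345)   # ranked number 22
neighbor   = mflops_per_watt(1.043, 1177)   # ranked number 21
print(round(roadrunner), round(neighbor))   # roughly 444 vs. 886
```

At nearly the same speed, the number 21 system delivers about twice the computation per watt, which is the whole story of Roadrunner's retirement.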

It will be interesting to see how the list shapes up in June, the next regular update.


Supercomputing Reaches New Heights

December 22, 2012

I’ve written here before about the semi-annual Top 500 ranking of the world’s supercomputer installations, based on their performance on a computational benchmark.  The Phys.org site has a report of a new system that, while it does not qualify for inclusion in the Top 500 list, has a distinction of its own: it is located at an elevation of 5,000 meters  (16,400 feet) in the Andes in northern Chile, making it the highest system in existence.

The system is installed at the site of the Atacama Large Millimeter/submillimeter Array (ALMA) telescope, the most elaborate ground-based telescope in history.  ALMA, an international astronomy facility, is a partnership of Europe, North America and East Asia in cooperation with the Republic of Chile.  The giant telescope’s main array uses 50 dish antennas, each 12 meters (39.3 feet) in diameter, separated by as much as 16 kilometers (10 miles).  There is also a smaller array of four 12-meter and twelve 7-meter (23 feet) antennas.  ALMA functions as an interferometer, which means that the signals from all the antennas in use must be processed together in order to be useful.

The computing system, called the ALMA Correlator, contains 134 million processors, and can handle data from up to 64 antennas simultaneously.  In doing this, it performs approximately 17 quadrillion (1.7 × 10¹⁶) operations per second.  Because it is a specialized system, it is not directly comparable to the supercomputers in the Top 500 list (which are ranked on the basis of the LINPACK benchmark).  Nonetheless, the per-operation time is of the same order as that of the TITAN system, which is currently ranked number one in the Top 500, at 1.76 × 10¹⁶ floating point operations per second.  (The European Southern Observatory has published an announcement of the ALMA Correlator with more details.)
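The core operation of a correlator can be illustrated with a toy sketch: for each pair of antennas, multiply the two sampled signals together at a range of trial time lags and accumulate the products; the lag with the largest accumulated product reveals the relative delay between the antennas.  The signals and the 3-sample delay below are invented for illustration, and this ignores everything that makes the real ALMA Correlator hard (data rates, channelization, 64-antenna pairing).

```python
# Toy correlator: accumulate products of two antenna signals at each
# trial lag, then pick the lag with the largest accumulated product.

def cross_correlate(a, b, max_lag):
    """Return {lag: sum of a[i] * b[i + lag]} for each trial lag."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(len(a)):
            j = i + lag
            if 0 <= j < len(b):
                total += a[i] * b[j]
        out[lag] = total
    return out

# antenna B sees the same wavefront 3 samples later than antenna A
signal = [0.0, 1.0, 0.5, -1.0, 0.3, 0.8, -0.6, 0.2]
a = signal + [0.0] * 3
b = [0.0] * 3 + signal
corr = cross_correlate(a, b, max_lag=5)
print(max(corr, key=corr.get))  # the peak lag recovers the 3-sample delay
```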

The radiation wavelengths (millimeter and sub-millimeter) that ALMA studies come from some of the coldest objects in the universe.  Because these wavelengths are significantly absorbed by water vapor, the observatory is located at one of the highest and driest places on earth, the high plateau at Chajnantor, in northern Chile.

Apart from the logistical difficulties involved in building an observatory in such a remote place, the high altitude and correspondingly thin atmosphere create other problems.  Because the air is so thin, the air flow needed to cool the system is approximately twice that which would be needed at sea level.  Standard hard disk drives rely on “floating” the read/write heads above the platters on an air cushion; that doesn’t work at this altitude, so the system must be diskless.  Human performance is affected, too; a photo accompanying the article shows a technician working on the machine and wearing a supplemental oxygen supply.  (I have never worked at 16,000 feet, but I can say from personal experience that walking 50 yards at a 10,000 foot elevation is a noticeable effort.)  The site is also in a zone of regular seismic activity, so the system must be able to withstand earthquake vibrations.
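The “twice the airflow” figure can be sanity-checked with the standard isothermal barometric approximation; the 8,500 m scale height, and the assumption that cooling airflow scales inversely with air density, are my own simplifications rather than anything from the article.

```python
import math

# Back-of-envelope check: to carry away the same heat, volumetric
# airflow scales roughly inversely with air density, and density falls
# off roughly exponentially with altitude (isothermal approximation).

def density_ratio(h_meters, scale_height=8500.0):
    """Air density at altitude h, relative to sea level."""
    return math.exp(-h_meters / scale_height)

ratio = 1.0 / density_ratio(5000)   # airflow multiplier at the ALMA site
print(round(ratio, 1))              # about 1.8, i.e. roughly twice
```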

The ALMA observatory is scheduled to be completed in late 2013, but it has already begun making some observations.  This is fascinating science; in effect, it gives us a “time machine” with which we can observe some of the earliest, and most distant, objects in the universe.


Official: Titan is Tops

November 12, 2012

This morning, the semi-annual Top500 list of the world’s fastest supercomputers was released.  As expected, the newly-operational Titan system at Oak Ridge National Laboratory [ORNL] took the top spot, beating out the Sequoia system at Lawrence Livermore National Laboratory.  Titan achieved performance of 17.59 petaFLOPS (1.759 × 10¹⁶ floating point operations per second) on the LINPACK benchmark.  This measured speed is less than the potential maximum speed of the system, which is rated at 27 petaFLOPS.

It’s a bit puzzling that the Top500 announcement lists the Titan system as having 560,640 processor cores, a number considerably different from the 299,008 cited in the original press release from ORNL.  The Titan system also uses NVIDIA Tesla K20 graphics processors to boost floating-point performance.  It occupies 4,352 square feet of floor space, and draws 8.2 megawatts of electricity.  (Don’t look for a battery-powered mobile version anytime soon.)

As has been true for quite some time, the operating system usage on these machines is rather different from that in the desktop market.

OS Family    Percent of Systems
Linux              93.8
Unix                4.0
Mixed               1.4
Windows             0.6
BSD Unix            0.2

One thing that is striking about these figures is that Linux has almost entirely replaced Unix in this market; in November 2000, 85.4 percent of the Top500 systems ran Unix, while 10.8 percent ran Linux.  Also, obviously, this is one OS market segment where Microsoft has almost no presence.

You can see the complete list here.


Oak Ridge Powers Up Titan

October 29, 2012

The US Department of Energy’s Oak Ridge National Laboratory [ORNL] announced the start-up of a new supercomputer, called Titan, today.   The new machine, which is likely to displace the Sequoia computer at Lawrence Livermore National Laboratory as the world’s fastest supercomputer, has been in the works for two years.

Performance of these systems is ranked based on their speed in floating-point operations per second (FLOP/s), measured on the LINPACK benchmark, which involves the solution of a dense system of linear equations.  The Sequoia, ranked fastest in the world in June of this year, achieved over 16 petaflops (1.6 × 10¹⁶ flops); the new Titan system is rated at 27 petaflops (2.7 × 10¹⁶ flops).
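Solving a dense n-by-n linear system, as the LINPACK benchmark does, takes roughly (2/3)·n³ floating point operations, which makes it easy to estimate (in a highly idealized way, assuming the machine sustained its measured rate for the whole run) how long a benchmark run would take; the 10-million-unknown problem size below is chosen only for illustration.

```python
# Idealized estimate of a LINPACK run: a dense n-by-n solve costs
# about (2/3) * n**3 floating point operations.

def linpack_flops(n):
    return (2.0 / 3.0) * n**3

def seconds_to_solve(n, flops_per_sec):
    return linpack_flops(n) / flops_per_sec

# a dense system with 10 million unknowns at Titan's measured rate
t = seconds_to_solve(10_000_000, 17.59e15)
print(round(t / 3600, 1))  # run time in hours -- roughly half a day
```

This is why the benchmark runs behind each Top 500 entry take hours even on the fastest machines in the world.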

The Titan system uses a hybrid architecture that includes both conventional 16-core Opteron 6274 CPUs, and NVIDIA Tesla K20 GPUs.  It has a total of 18,688 compute nodes, each containing a GPU and a CPU, for a total of 299,008 CPU cores.  The system also has more than 700 terabytes (7 × 10¹⁴ bytes) of memory.  (Apparently 640KB is no longer enough for anyone.)  The hybrid architecture results in better energy efficiency; Titan gets about ten times the performance of its predecessor at ORNL, the Jaguar, at less than 30% more electricity consumption.  It is, however, rather large, requiring 4,352 square feet of floor space.

James Hack, Director of ORNL’s National Center for Computational Sciences, said “Titan will allow scientists to simulate physical systems more realistically and in far greater detail. The improvements in simulation fidelity will accelerate progress in a wide range of research areas such as alternative energy and energy efficiency, the identification and development of novel and useful materials and the opportunity for more advanced climate projections.”

The ORNL press release is here.

