Rounding and Other Errors, Part 2

November 10, 2009

In my previous post on this subject, I talked about some of the potential pitfalls of computer calculations: inappropriate use of integer arithmetic, real rounding error, and representational limitations. Now I want to discuss, briefly, two remaining problem areas: loss of significance, and perverse formulations, which are particularly perilous in floating-point arithmetic.  There are two caveats I have to mention first: both of these areas are inter-related, as I think will become apparent; and to even attempt to cover them completely is far beyond the scope of this article.  Nonetheless, I’ll try to give at least an overview of the potential hazards.

4.  Loss of Significance

Despite the fact that programming languages may make floating-point arithmetic look like ordinary arithmetic (and FORTRAN even called the floating data type REAL), floating-point calculations do not necessarily follow the rules of arithmetic as we learned them for hand calculation.  There are a few different formats used for floating-point numbers, but they all have the general form:

N EEE...EEE SSS...SSS

where each letter represents a bit. N is the sign bit, generally 0 for plus and 1 for minus.  The EEE… bits represent the exponent of the number (analogous to the power of 10 in scientific notation), and the SSS… bits represent the significant figures of the value (this is sometimes called the significand).  In a floating-point number in IEEE 754 double binary format, there is one sign bit, eleven exponent bits, and 52 significand bits, for a total length of 64 bits.
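To make the layout concrete, here is a minimal sketch that pulls the three bit fields out of a double. It assumes Python's float is an IEEE 754 double (true on essentially every current platform); the function name double_fields is made up for this illustration, and the exponent bias of 1023 is one of those storage details that the prose skips over.

```python
import struct

def double_fields(x):
    """Split a float (assumed IEEE 754 double) into its sign, exponent, and significand bit fields."""
    # Reinterpret the 8 bytes of the double as one 64-bit unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # N: 1 bit
    exponent = (bits >> 52) & 0x7FF        # EEE...: 11 bits, stored with a bias of 1023
    significand = bits & ((1 << 52) - 1)   # SSS...: the 52 explicitly stored bits
    return sign, exponent, significand

# -6.5 is -1.625 x 2^2, so the sign bit is 1 and the unbiased exponent is 2.
sign, exponent, significand = double_fields(-6.5)
print(sign, exponent - 1023, hex(significand))
```

The significand field holds only the fractional part of 1.625; the leading 1 is implied rather than stored, which is why 52 stored bits give 53 bits of precision.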

There are various details of exactly how the number is stored that we don’t need to worry about here.  The key things to note are that the number is stored with finite precision, and that every floating-point value, contrary to FORTRAN’s implication, is a rational number.  In other words, over any given interval of real numbers, the set of possible floating-point values is a “grainy” subset of the rational numbers; no irrational number can possibly be represented exactly.
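The "every value is a rational number" point can be checked directly. This is a small sketch using Python's standard fractions module, again assuming that a Python float is an IEEE 754 double; it shows that even the innocent-looking 0.1 is stored as a nearby fraction whose denominator is a power of two.

```python
from fractions import Fraction

x = 0.1
exact = Fraction(x)          # the exact rational value actually stored for 0.1
print(exact)                 # 3602879701896397/36028797018963968

# The stored value is close to, but not equal to, one tenth.
print(exact == Fraction(1, 10))      # False

# The denominator is a power of two, as it must be for a binary format.
d = exact.denominator
print(d & (d - 1) == 0)              # True
```

Since 1/10 has no finite binary expansion, the "grain" of the format forces a substitution before any arithmetic has even been done.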

The finite precision of machine arithmetic means that some problems will require more precision to calculate accurately than the hardware can deliver.  For example, if we try the subtraction:

1,000,000,000,000,000,000
- 999,999,999,999,999,999

the correct answer is obviously 1, but we are unlikely to get exactly that, even using double arithmetic: a double carries only about 16 significant decimal digits, and these operands need 18 or more.  We don’t worry about this kind of difficulty in manual computation, since it is clearly visible, but when compounded at gigaFLOP speeds, the resulting errors can be, um, embarrassing.
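The subtraction above is easy to try. The sketch below assumes Python, where integers are exact (arbitrary precision) and floats are IEEE 754 doubles, so the same two operands can be subtracted both ways; the variable names are only for illustration.

```python
big = 1_000_000_000_000_000_000       # 10**18, held exactly as a Python int
almost = 999_999_999_999_999_999      # one less, also exact as an int

# Integer arithmetic gets the obvious answer.
print(big - almost)                   # 1

# In double precision, neighbouring values near 10**18 are 128 apart,
# so `almost` rounds to the same double as `big` and the difference vanishes.
print(float(big) - float(almost))     # 0.0
```

The error here is not in the subtraction itself, which is exact; it happened earlier, when the second operand was converted to a double.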

On a more subtle, and dangerous, note, the finite precision of machine arithmetic also means that some equations are true mathematically, but not necessarily computationally.  For example, both of the following equations are true:

(x + y) (x – y) = x² − y²

sin² θ + cos² θ = 1

but you would be well advised not to depend on exact equality for the correct operation of your program.  (There are software packages, such as Mathematica, which can handle these correctly, because they have embedded “knowledge” of the underlying mathematics.)  This leads us naturally to the next topic.
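One common way to avoid depending on exact equality is to compare within a tolerance. The following is a rough sketch using math.isclose from the Python standard library; the tolerance of 1e-12 and the choice of test angles are arbitrary assumptions for illustration, not recommended values.

```python
import math

def identity_exact(theta):
    """Test sin^2 + cos^2 == 1 with exact floating-point equality (fragile)."""
    return math.sin(theta) ** 2 + math.cos(theta) ** 2 == 1.0

def identity_close(theta, rel_tol=1e-12):
    """Test the same identity, allowing for a small relative rounding error."""
    value = math.sin(theta) ** 2 + math.cos(theta) ** 2
    return math.isclose(value, 1.0, rel_tol=rel_tol)

angles = [k * 0.1 for k in range(1, 1001)]
exact_failures = sum(not identity_exact(t) for t in angles)
close_failures = sum(not identity_close(t) for t in angles)

# The exact test may fail for some angles; the tolerant test should not.
print(exact_failures, close_failures)
```

How loose the tolerance should be depends on the calculation; a value that suits one rounding step can be far too tight after a long chain of operations.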

5.  Perverse Formulations

One problem that has cropped up repeatedly in all kinds of software for calculations is what I am calling “perverse formulation”.  This usually occurs when a software developer gets the formula for doing some computation out of a textbook or handbook.  If the formula is one that was originally intended for manual calculation, the results can be embarrassing.  For example, early versions of the Lotus 1-2-3 spreadsheet had a built-in function for calculating the standard deviation of a set of numbers.  However, when one used a sample of just three numbers:

1,000,000    1,000,001    999,999

the function returned a negative number, which is mathematically impossible by definition of the standard deviation.  (The correct answer is 1.)   The definition of the sample standard deviation is:

Definition of Sample Standard Deviation:

s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}

where \bar{x} is the sample mean, and n is the number of observations.  Note that computing the standard deviation using this “recipe” will require two passes through the data; the first pass is needed to compute the mean \bar{x}, and the second to compute the summation.  This is tedious for manual calculation, so there is an alternative formula that is mathematically equivalent:

Calculation Formula for Standard Deviation:

s = \sqrt{\frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n-1)}}

Although this calculation is mathematically equivalent, it is not at all computationally equivalent when performed in finite precision.  If the dispersion of the sample is small relative to the magnitude of the mean, the second formula will require the subtraction of two large, nearly equal values, which is a classic prescription for losing significance.  (This was in fact the source of the Lotus 1-2-3 problem I mentioned earlier.)  The formula for the roots of a quadratic equation that you learned in Algebra I has a similar problem.
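The difference between the two approaches can be seen in a few lines. This is a sketch under stated assumptions, not a reconstruction of what Lotus 1-2-3 actually did: the one-pass function uses the common textbook form (n Σx² − (Σx)²) / (n(n − 1)), the function names are invented for the example, and the sample is shifted up to around 10^9 so that the effect shows up in double precision. Both functions return the variance (the square of the standard deviation), so that a negative result would not cause a square-root error.

```python
def variance_two_pass(data):
    """Sample variance from the definition: compute the mean first, then sum squared deviations."""
    n = len(data)
    total = 0.0
    for x in data:
        total += x
    mean = total / n
    squares = 0.0
    for x in data:
        squares += (x - mean) ** 2
    return squares / (n - 1)

def variance_one_pass(data):
    """Sample variance from the one-pass 'calculation' formula (numerically fragile)."""
    n = len(data)
    total = 0.0
    total_sq = 0.0
    for x in data:
        total += x
        total_sq += x * x
    # Subtracts two large, nearly equal quantities: the classic way to lose significance.
    return (n * total_sq - total * total) / (n * (n - 1))

small = [1_000_000.0, 1_000_001.0, 999_999.0]
large = [1_000_000_000.0, 1_000_000_001.0, 999_999_999.0]

print(variance_two_pass(small), variance_one_pass(small))   # 1.0 1.0
print(variance_two_pass(large), variance_one_pass(large))   # 1.0 0.0
```

With the larger values, each square is near 10^18, where adjacent doubles are 128 apart, so the "+1" terms that carry all of the information about the spread are rounded away before the subtraction ever happens. The two-pass version subtracts the mean first, so it only ever squares small numbers.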

An in-depth discussion of this would take a book; in fact, there are several on my shelves.  A good starting point for further information, which contains many worked examples, is Numerical Recipes in C (2nd edition), by Press, Teukolsky, Vetterling, and Flannery, ISBN 0-521-43108-5.  (I believe there are also FORTRAN and Pascal “flavors” of the book.)  I hope at least that I have made you a bit wary of copying formulas without thinking.

A Closing Note

The large standard deviation formulas above were pasted in as graphics (you don’t want to know how).  Although WordPress is generally an excellent blogging platform, and supports embedded LaTeX math code, I could not manage to get it to recognize those formulas as formulas.  Still working on it; if I gain any insights, I’ll let you know.


Smart as … a Pig

November 10, 2009

I’ve written here a couple of times about some of the recent research that seems to show that animals, dogs in particular, have more intelligence than they are sometimes given credit for.  Now, it seems, it is the pigs’ turn.   The New York Times has a report on recent research carried out at the University of Cambridge that showed pigs were able to learn to use images in mirrors to learn about their environment.

Although it is not yet clear that pigs can recognize their own individual images in a mirror, as some apes and dolphins can, they do seem to be able to relate images in the mirror to their surroundings, and use those images to discover new information.  The researchers first gave a group of pigs a chance to get accustomed to the mirror:

They began by exposing seven 4-to-8-week-old pigs to five-hour stints with a mirror and recording their reactions. The pigs were fascinated, pointing their snouts toward the mirror, hesitating, vocalizing, edging closer, walking up and nuzzling the surface, looking at their image from different angles, looking behind the mirror.

Then, they placed food in a place that was accessible, but not directly visible, to the pigs; the food could, however, be seen in the mirror.  They compared the reactions of the pigs that had had a chance to learn about the mirror with a control group that had not:

On spotting the virtual food in the mirror, the experienced pigs turned away and within an average of 23 seconds had found the food. But the naïve pigs took the reflection for reality and sought in vain to find the bowl by rooting around behind the mirror.

The researchers attribute some of this to the fact that pigs, like dogs and some apes, live in social groups, even in the wild.  The genome of the domestic pig has recently been sequenced, and (not surprisingly) has very long segments in common with humans, despite the divergence of the two species about 100 million years ago.  Pigs have been used for some pharmaceutical testing for some time, because of their similarity to us in many ways.  (And, of course, we have all been bombarded with news of how the “swine flu” migrated from pigs to humans.)   They share some other traits with us, too.  Dr. Lawrence Schook, of the University of Illinois at Urbana-Champaign, is quoted in the Times article:

Pig hearts are like our hearts, he said, pigs metabolize drugs as we do, their teeth resemble our teeth, and their habits can, too. “I look at the pig as a great animal model for human lifestyle diseases,” he said. “Pigs like to lie around, they like to drink if given the chance, they’ll smoke and watch TV.”

Maybe the media ratings industry  isn’t telling us who all of the members of their surveyed TV audience are; it could explain a lot.


Apple Security Update 2009-006

November 10, 2009

Apple has released a large batch of updates for Mac OS X versions 10.5 (Leopard) and 10.6 (Snow Leopard).  Many of these address serious security vulnerabilities, and I recommend applying the patches as soon as possible.  More information is available in the Apple support document; a broad range of software components is affected.  Both desktop and server versions of OS X have patches.

These should be available through the normal Software Update mechanism, or from the Apple Download site.

Update Wednesday, November 11, 10:44

Brian Krebs at the Washington Post has an article on these updates at his “Security Fix” blog.


Microsoft Security Updates, November 2009

November 10, 2009

As expected, Microsoft today released six security bulletins to resolve a number of vulnerabilities for Windows and Microsoft Office components.  All versions of Windows, except for the recently released Windows 7, are affected by at least one Critical vulnerability.  The Microsoft Office vulnerabilities, which are rated Important, affect both Windows and Mac versions of the software, as well as the viewers for Excel and Word.  These updates should be available via the usual Windows Update mechanism.  Complete information is in the Security Bulletin Summary; it also includes download links.

According to Microsoft, the Windows patches will definitely require a system reboot, and the Office patches may require one.  I recommend that you install these, especially the Windows patches, as soon as possible.

Update, Tuesday, November 10, 16:35

As they customarily do, the folks at the SANS Internet Storm Center have posted their overview and evaluation of the impact of this month’s bulletins.

Update Wednesday, November 11, 10:48

Brian Krebs at the Washington Post has an article on his blog, “Security Fix”, about this month’s patches.  He points out one thing that I overlooked: one of the vulnerabilities [MS09-065], in the way Windows handles “embedded font” files in Web pages, can be exploited to implement “drive-by” attacks.  That is, the user’s machine can be compromised merely by viewing a toxic Web site with Internet Explorer.  Two morals: get those patches applied ASAP; and using Internet Explorer is dangerous to your system’s health.

