## Statistical Twisters

May 22, 2013

During yesterday evening’s ABC World News program, which was largely taken up with coverage of the tornado disaster in and around Moore, OK, there was a segment on a 90+ year old resident who had lost her house to a tornado for the second time. (The first time was in May 1999, when a similarly strong twister hit Moore.) There was then a statement, which caught my attention, that the odds against this happening were “100 trillion to 1”.

Now, those are pretty long odds.  One hundred trillion is 100 × 10¹²; by way of comparison, it is about twenty times the estimated age of the universe, since the Big Bang, measured in days.  If the odds are true, we are talking about a really rare phenomenon.
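The comparison above is easy to check. Here is a minimal sketch, assuming an age of the universe of roughly 13.8 billion years and 365.25 days per year; both figures are my assumptions and are not given in the text above.

```python
# Assumed figures (not from the post itself): the universe is taken to be
# about 13.8 billion years old, and a year is taken as 365.25 days.
AGE_OF_UNIVERSE_YEARS = 13.8e9
DAYS_PER_YEAR = 365.25

age_in_days = AGE_OF_UNIVERSE_YEARS * DAYS_PER_YEAR
one_hundred_trillion = 100e12  # 100 x 10^12

ratio = one_hundred_trillion / age_in_days

print(f"Age of the universe: about {age_in_days:.2e} days")
print(f"100 trillion is about {ratio:.1f} times that")
```

Under those assumptions the ratio comes out just under twenty.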

Thinking about the question this morning, I decided to double-check the report — perhaps I had just misunderstood the number that was being quoted.  I found a report on the ABC News site, which actually made the whole odds business more questionable:

> A recent tornado probability study, published by Weather Decision Technologies, predicted the odds of an E-F4 or stronger tornado hitting a house at one in 10,000.
>
> That same study put the odds of that same house getting hit twice at one in 100 trillion.

It is almost impossible to imagine how both these probability assessments could be correct, or even reasonable guesses. If the odds against the house being hit once are one in 10,000 (probability 0.0001), then, if tornado hits are independent, the probability of a house being hit twice is (0.0001)², or odds of 1 in 100 million. That would make the quoted odds (1 in 100 trillion) off by a factor of one million. Of course, if tornado hits are not independent, then my calculations are inappropriate. But for the numbers to work as quoted, the first hit would have to, in effect, provide truly enormous protection against a second hit. (If the odds against the first are one in 10,000, then the odds against the second, given the first, must be one in 10 billion to produce cumulative odds of one in 100 trillion.)
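The arithmetic can be spelled out in a few lines. This is only a sketch of the calculation: it treats "odds of one in N" as a probability of 1/N, as the paragraph above does, and the variable names are mine.

```python
from fractions import Fraction

# The two figures quoted in the ABC report, read as probabilities.
p_single = Fraction(1, 10_000)               # one hit: 1 in 10,000
p_twice_quoted = Fraction(1, 100 * 10**12)   # two hits: 1 in 100 trillion

# If hits are independent, the probability of two hits is the square.
p_twice_independent = p_single ** 2

# How far apart are the quoted figure and the independent-hits figure?
discrepancy = p_twice_independent / p_twice_quoted

# If hits are NOT independent, the quoted figures imply this conditional
# probability of a second hit, given that a first hit has occurred.
p_second_given_first = p_twice_quoted / p_single

print(f"Two independent hits: 1 in {p_twice_independent.denominator:,}")
print(f"Quoted figure is smaller by a factor of {int(discrepancy):,}")
print(f"Implied second hit, given a first: 1 in {p_second_given_first.denominator:,}")
```

Using `Fraction` keeps the values exact, so the factor of one million is not blurred by floating-point rounding.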

Now, I don’t actually believe that tornado hits are independent.  Tornadoes certainly do not occur uniformly across the world, or even across the United States.  The NOAA Storm Prediction Center’s Tornado FAQ Site has a map highlighting “tornado alley”, the area where most significant tornadoes occur.  Although a tornado may, in principle, occur almost anywhere, you are considerably more likely to encounter one in Kansas or Oklahoma than you are in northern Maine or the upper peninsula of Michigan.

This question of independence is directly relevant to the news segment I mentioned at the beginning; it turns out that the unfortunate lady who has lost two houses built the second one on the same site as the first one, destroyed in 1999.  If the odds are affected at all by location (as they seem to be, at least “in the large”), then this was not, perhaps, the best possible choice.

I’ve griped before about the widespread ignorance of journalists and others when it comes to statistical information.  I have tried to find a copy of the  “Tornado Probability Study” mentioned in the quote above, so far without success.  I’ll keep trying, and report on anything I discover.  If I’m missing something, I’d like to know; if the probabilities are just made up, I’d like to know that, too.

## Detecting Election Fraud

October 3, 2012

As I’m sure readers know, this is a presidential election year in the United States.  (There are also elections for members of the House of Representatives and for some Senate seats.)   A side attraction in this year’s festivities is an ongoing political and legal tussle over the attempts, in some states, to impose more stringent identification requirements for those wishing to vote.  Proponents of these measures argue that they are necessary to prevent election fraud.   Actual evidence that this occurs, at least in the form (impersonation) that these measures would address, is in remarkably short supply.  Nonetheless, it does prompt the question: how can election fraud be detected?

The Proceedings of the National Academy of Sciences recently published a paper [abstract, full PDF download available] that takes an interesting new approach to this question. The authors (Peter Klimek, Yuri Yegorov, Rudolf Hanel, and Stefan Thurner) write:

> Democratic societies are built around the principle of free and fair elections, and that each citizen’s vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies statistical consequences for the polling results, which can be used to identify election irregularities.

There have, of course, been previous studies that used statistical methods to try to uncover election fraud.  (I wrote about an analysis of Iranian election results, back in 2009, that used Benford’s Law, and similar techniques.)  The authors of the current paper argue that these generally have two drawbacks.

• They can provide a strong suggestion of fraud; however, since there is no theory of how particular types of fraud (e.g., ballot box stuffing) should change the results, they are far from conclusive.
• The results may vary depending on the degree to which the election data are aggregated (for example, the size of the voting precincts).

To address these concerns, they develop a parameterized model for the distribution of several election variables (e.g., voter turnout).  This allows them to predict the effect that ballot stuffing should have on the results.  In particular, they find that the distribution of results should have higher kurtosis† in a fraudulent election.   When tested against real-world election data, the model seems to work well across a range of aggregation levels.
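To get a feel for why stuffing might show up in the kurtosis, here is a toy simulation. It is emphatically not the authors' model: the turnout distribution, the share of stuffed precincts, and every parameter value below are assumptions I made up for illustration.

```python
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

def clamp(x):
    """Keep a simulated turnout within [0, 1]."""
    return min(1.0, max(0.0, x))

rng = random.Random(2012)   # fixed seed so the run is repeatable
N_PRECINCTS = 20_000        # hypothetical number of precincts
STUFFED_FRACTION = 0.05     # hypothetical share of precincts with stuffing

# "Clean" election: turnout per precinct is roughly normal around 60%.
clean = [clamp(rng.gauss(0.60, 0.05)) for _ in range(N_PRECINCTS)]

# "Stuffed" election: in a small share of precincts, reported turnout
# is pushed up to nearly 100%.
stuffed = [
    clamp(rng.gauss(0.95, 0.02)) if rng.random() < STUFFED_FRACTION else t
    for t in clean
]

clean_kurt = excess_kurtosis(clean)
stuffed_kurt = excess_kurtosis(stuffed)

print(f"Excess kurtosis, clean turnout:   {clean_kurt:.2f}")
print(f"Excess kurtosis, stuffed turnout: {stuffed_kurt:.2f}")
```

In this toy setup the clean turnout has excess kurtosis near zero, while the small cluster of near-100% precincts produces a heavy tail and a clearly positive value.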

This is a fascinating area of research.  The success of techniques of this kind (which have also been used to help spot financial fraud) depends, at least in part, on people’s lack of intuition about seemingly random phenomena, and on their general inability to construct a convincing fake.

____

† Kurtosis is a statistical measure of the shape of a distribution, in particular the degree to which it is “peaked” around the mean. (The word comes from the Greek κυρτός, meaning “bulging”.) The most common measure, based on the fourth moment of the distribution, is “excess kurtosis” relative to the normal (Gaussian) distribution. A distribution with positive excess kurtosis has a narrower, more acute central “hump” and fatter tails than the normal distribution, and is called leptokurtic; the Poisson distribution is an example. A distribution with negative excess kurtosis, called platykurtic, has a wider, less pronounced hump, and thinner tails; the uniform distribution is an example.
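For the curious, the fourth-moment definition is short enough to compute directly. The sketch below evaluates the excess kurtosis of a Poisson distribution from its probability mass function and of a uniform distribution from an evenly spaced grid. The Poisson mean of 4, the truncation point, and the grid size are arbitrary choices of mine.

```python
import math

def excess_kurtosis(values, weights):
    """Excess kurtosis of a weighted distribution: m4 / m2^2 - 3."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    m2 = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    m4 = sum(w * (v - mean) ** 4 for v, w in zip(values, weights)) / total
    return m4 / m2 ** 2 - 3.0

# Poisson distribution with mean 4 (an arbitrary choice), truncated at
# k = 60, beyond which the remaining probability is negligible.
LAM = 4.0
ks = list(range(61))
poisson_pmf = [math.exp(-LAM) * LAM ** k / math.factorial(k) for k in ks]
poisson_kurt = excess_kurtosis(ks, poisson_pmf)

# Uniform distribution on [0, 1], approximated by equally weighted,
# evenly spaced points.
grid = [i / 10_000 for i in range(10_001)]
uniform_kurt = excess_kurtosis(grid, [1.0] * len(grid))

print(f"Poisson (mean 4): {poisson_kurt:+.3f}")   # positive: leptokurtic
print(f"Uniform on [0,1]: {uniform_kurt:+.3f}")   # negative: platykurtic
```

The Poisson value is 1/λ in general, so it shrinks toward zero as the mean grows; the uniform value is −1.2 regardless of the interval.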