## It’s the Counting that Counts

In some elections, it’s not the voting that counts, it’s the counting that counts.  — Anon.

Unless you have spent the last week or so in a cave, or marooned on an island, you have heard of the protests and controversy surrounding the recent election in Iran.  Although as a matter of policy I don’t post intentionally political comments here, I was interested to see two reports from people who have analyzed the reported results, using purely statistical methods, to see what, if anything, can be discovered.

The first, and more thorough, analysis was done by Professor Walter R. Mebane, Jr., of the University of Michigan.  (He is a professor of political science and statistics, and has done significant research into techniques for detecting election fraud.)  He has put together an analysis [PDF], originally published on June 15 and subsequently augmented, in which he subjects the officially reported results to two types of statistical tests:

The first category of tests is based on the distribution of the digits (0-9) in the election results.  These tests rely on Benford’s Law, which describes the distribution of the digits in many actual sets of data.  Contrary to what you might believe, the distribution of the first digit (which will be 1-9) is not uniform but logarithmic: a leading digit d occurs with probability log10(1 + 1/d), so that 1 leads about 30% of the time and 9 less than 5% of the time, as shown in the chart below:

[Chart: Distribution of Initial Digits by Benford's Law]
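For readers who would rather see the numbers than the chart, here is a minimal Python sketch of the first-digit distribution that Benford's Law predicts. The function name `benford_first_digit` is my own, purely for illustration; it is not taken from Prof. Mebane's paper.

```python
import math


def benford_first_digit(d: int) -> float:
    """Probability that the leading digit equals d (1-9) under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("a leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    # Print the predicted share of each possible leading digit.
    for d in range(1, 10):
        print(f"{d}: {benford_first_digit(d):.3f}")
```

The nine probabilities add up to 1, and they fall steadily from roughly 0.30 for a leading 1 to roughly 0.05 for a leading 9.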

As one progresses from the left-most (most significant) digit to the right, the distribution of digits becomes closer and closer to a uniform distribution.  The test is based on the observation that people who are making up numbers usually end up making them “too random” — they don’t follow Benford’s Law, and they tend to avoid some patterns, such as the same digit twice in a row, or consecutive digits, that should appear occasionally in truly random data.
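The drift toward uniformity can be checked directly. The sketch below uses the standard generalization of Benford's Law to later digit positions; the function names `benford_digit` and `spread` are hypothetical helpers of mine, not anything from the paper.

```python
import math


def benford_digit(position: int, d: int) -> float:
    """Probability that the digit at `position` (1 = leading digit) equals d."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    if position == 1:
        if not 1 <= d <= 9:
            raise ValueError("a leading digit must be between 1 and 9")
        return math.log10(1 + 1 / d)
    if not 0 <= d <= 9:
        raise ValueError("a digit must be between 0 and 9")
    # Sum over every possible prefix k made of the preceding (position - 1) digits.
    lo = 10 ** (position - 2)
    hi = 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))


def spread(position: int) -> float:
    """Gap between the most and least likely digit at a given position."""
    digits = range(1, 10) if position == 1 else range(0, 10)
    probs = [benford_digit(position, d) for d in digits]
    return max(probs) - min(probs)


if __name__ == "__main__":
    for position in (1, 2, 3, 4):
        print(f"position {position}: spread = {spread(position):.5f}")
```

The spread shrinks quickly with each step to the right, which is why tests on the trailing digits can reasonably treat a uniform distribution as the baseline.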

Prof. Mebane’s second test looks at the pattern of data “outliers” compared to the overall election results.  If the results are legitimate, these should not exhibit any particular pattern.

I will not try to go through all the details of the analysis, since the paper is available and you can eliminate the middleman.  But I think it is worth repeating the summary result:

> In general, combining the first-stage 2005 and 2009 data conveys the impression that while natural political processes contributed significantly to the election outcome, outcomes in many towns were produced by very different processes.

In short, although there is no conclusive evidence of fraud, some of the results are distinctly suspect.

The second analysis is reported in the Washington Post, and was carried out by Bernd Beber and Alexandra Scacco, both PhD candidates in political science at Columbia University.  They also analyze the officially reported vote totals, in this case focusing on the low-order (least significant) digits of the reported numbers, to examine the degree of deviation from an expected uniform distribution.  (Recall that earlier we said that, as one moves from left to right in the number, the distribution of digits should become more uniform.)  It is worth noting their rationale for the test, which also applies to Prof. Mebane’s analysis:

> Why would fraudulent numbers look any different? The reason is that humans are bad at making up numbers. Cognitive psychologists have found that study participants in lab experiments asked to write sequences of random digits will tend to select some digits more frequently than others.

Another way of saying this is that people try too hard to make the numbers “look right,” an error often compounded by their misunderstanding of what really looks right.  Beber and Scacco find that this analysis also indicates that some of the results are suspect:

> Each of these two tests provides strong evidence that the numbers released by Iran’s Ministry of the Interior were manipulated. But taken together, they leave very little room for reasonable doubt. The probability that a fair election would produce both too few non-adjacent digits and the suspicious deviations in last-digit frequencies described earlier is less than .005. In other words, a bet that the numbers are clean is a one in two-hundred long shot.
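To give a feel for what the two tests in that quotation might involve, here is a rough Python sketch. It is not the authors' code: the function names are mine, the chi-square comparison against a uniform distribution is my own choice of test, and I am assuming that “non-adjacent” means the last two digits differ by more than one (so that repeated digits and neighbors such as 23 or 32 do not count). The article may well define these details differently.

```python
from collections import Counter

# Critical value of the chi-square distribution with 9 degrees of freedom
# at the 5% significance level (ten digits, so 10 - 1 = 9).
CHI2_CRITICAL_DF9_05 = 16.919


def last_digit_chi2(vote_counts):
    """Chi-square statistic of the last digits against a uniform distribution."""
    digits = [abs(n) % 10 for n in vote_counts]
    if not digits:
        raise ValueError("need at least one vote count")
    expected = len(digits) / 10  # each digit should appear equally often
    freq = Counter(digits)
    return sum((freq.get(d, 0) - expected) ** 2 / expected for d in range(10))


def non_adjacent_fraction(vote_counts):
    """Share of counts whose last two digits differ by more than one.

    Assumption: "adjacent" means a difference of exactly one, with no
    wrap-around between 0 and 9; identical digits are also excluded.
    """
    pairs = [(n // 10 % 10, n % 10) for n in vote_counts if n >= 10]
    if not pairs:
        raise ValueError("need at least one count with two or more digits")
    return sum(1 for a, b in pairs if abs(a - b) > 1) / len(pairs)


if __name__ == "__main__":
    # Hypothetical data: every two-digit ending appears exactly once.
    sample = list(range(100, 200))
    stat = last_digit_chi2(sample)
    print(f"chi-square = {stat:.3f}, suspicious = {stat > CHI2_CRITICAL_DF9_05}")
    print(f"non-adjacent share = {non_adjacent_fraction(sample):.2f}")
```

With perfectly uniform endings the chi-square statistic is zero, and under my definition of adjacency 72 of the 100 possible two-digit endings are non-adjacent. A real analysis would compare the observed values against these baselines and report a proper p-value instead of a single cutoff.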

I can’t vouch for the accuracy of their statistics, since this is a news story, not a technical paper, and the details of the data and methodology are not reported.  But again, it appears that at least some of the results are fishy.

I can’t say that I am particularly surprised by this result, but it is interesting that there is at least some objective evidence that supports the claims of election fraud.   Perhaps the fact that so many politicians are mathematically illiterate does have its positive aspects.

##### Update, Monday 6/22, 20:46

There have now been some more specific allegations of voting “irregularities” reported.  According to an article in the New York Times, the opposition candidates claim that in a number of areas, the number of votes recorded significantly exceeded the number of registered voters.   The official response was not exactly reassuring:

“Statistics provided by the candidates, who claim more than 100 percent of those eligible have cast their ballot in 80 to 170 cities are not accurate — the incident has happened in only 50 cities,” Mr. Kadkhodaei said.

Only 50 cities — well, no problem then.