As I’m sure readers know, this is a presidential election year in the United States. (There are also elections for members of the House of Representatives and for some Senate seats.) A side attraction in this year’s festivities is an ongoing political and legal tussle over the attempts, in some states, to impose more stringent identification requirements for those wishing to vote. Proponents of these measures argue that they are necessary to prevent election fraud. Actual evidence that such fraud occurs, at least in the form (impersonation) that these measures would address, is in remarkably short supply. Nonetheless, the dispute does prompt the question: how can election fraud be detected?
The Proceedings of the National Academy of Sciences recently published a paper [abstract, full PDF download available] that takes an interesting new approach to this question. The authors (Peter Klimek, Yuri Yegorov, Rudolf Hanel, and Stefan Thurner) write:
“Democratic societies are built around the principle of free and fair elections, and that each citizen’s vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies statistical consequences for the polling results, which can be used to identify election irregularities.”
There have, of course, been previous studies that used statistical methods to try to uncover election fraud. (I wrote about an analysis of Iranian election results, back in 2009, that used Benford’s Law and similar techniques.) The authors of the current paper argue that these earlier methods generally have two drawbacks.
- They can provide a strong suggestion of fraud; however, since there is no theory of how particular types of fraud (e.g., ballot box stuffing) should change the results, they are far from conclusive.
- The results may vary depending on the degree to which the election data are aggregated (for example, the size of the voting precincts).
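For readers who have not met the digit-based tests mentioned above, here is a minimal sketch of a generic first-digit Benford check. It is my own illustration, not the method used in the Iranian analysis or in the paper; the function names are hypothetical, and the powers of two are just a convenient sequence known to follow Benford’s Law, standing in for real vote counts.

```python
import math
from collections import Counter

def benford_expected(d):
    """Benford's Law: expected share of numbers whose first digit is d (1..9)."""
    return math.log10(1 + 1 / d)

def first_digit(n):
    """Leading decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])

def benford_chi_square(counts):
    """Pearson chi-square statistic comparing first digits with Benford's Law."""
    digits = [first_digit(c) for c in counts if c != 0]
    observed = Counter(digits)
    total = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = total * benford_expected(d)
        stat += (observed[d] - expected) ** 2 / expected
    return stat

# Stand-in data: powers of two follow Benford's Law closely.
benford_like = [2 ** k for k in range(1000)]
# Counter-example: 1..999 has equally frequent first digits.
flat_digits = list(range(1, 1000))

print("powers of two:", benford_chi_square(benford_like))
print("1..999:       ", benford_chi_square(flat_digits))
```

With eight degrees of freedom, a statistic above roughly 15.5 would be unusual at the 5% level. Even then, a large value only says that the digits look odd; it says nothing about which kind of manipulation, if any, produced them, which is the first drawback listed above.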
To address these concerns, they develop a parameterized model for the distribution of several election variables (e.g., voter turnout). This allows them to predict the effect that ballot stuffing should have on the results. In particular, they find that the distribution of results should have higher kurtosis† in a fraudulent election. When tested against real-world election data, the model seems to work well across a range of aggregation levels.
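To get a feel for why ballot stuffing might fatten the tails of the results, here is a toy simulation. It is emphatically not the authors’ model: the normal distributions, the parameter values, and names such as `fraud_prob` and `stuff_frac` are all my own assumptions, chosen only to illustrate the idea. Each precinct gets an honest turnout and an honest vote share for the winner; in a small fraction of precincts, some of the non-voters’ ballots are added to the winner’s pile.

```python
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth central moment / variance^2, minus 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def simulate_winner_shares(n_precincts, fraud_prob, stuff_frac, seed=0):
    """Winner's vote share per precinct under a toy ballot-stuffing model.

    fraud_prob: chance that a given precinct is stuffed (assumed parameter).
    stuff_frac: fraction of non-voters' ballots added for the winner (assumed).
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(n_precincts):
        turnout = clip(rng.gauss(0.60, 0.05), 0.05, 0.95)  # honest turnout
        share = clip(rng.gauss(0.50, 0.05), 0.0, 1.0)      # honest winner share
        if rng.random() < fraud_prob:
            stuffed = stuff_frac * (1.0 - turnout)         # extra ballots, all for the winner
            share = (share * turnout + stuffed) / (turnout + stuffed)
        shares.append(share)
    return shares

honest = simulate_winner_shares(20000, fraud_prob=0.0, stuff_frac=0.5)
rigged = simulate_winner_shares(20000, fraud_prob=0.05, stuff_frac=0.5)

print("excess kurtosis, honest:", excess_kurtosis(honest))
print("excess kurtosis, rigged:", excess_kurtosis(rigged))
```

In this sketch the honest shares are close to normal, so their excess kurtosis sits near zero, while the handful of stuffed precincts forms a cluster of outliers that pushes it up. The paper’s model is richer than this, and works with a fitted parameterization rather than hand-picked numbers.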
This is a fascinating area of research. The success of techniques of this kind (which have also been used to help spot financial fraud) depends, at least in part, on people’s lack of intuition about seemingly random phenomena, and on their general inability to construct a convincing fake.
____
† Kurtosis is a statistical measure of the shape of a distribution, in particular the degree to which it is “peaked” around the mean. (The word comes from the Greek κυρτός, meaning “bulging”.) The most common measure, based on the fourth moment of the distribution, is “excess kurtosis” relative to the normal (Gaussian) distribution. A distribution with positive excess kurtosis has a narrower, more acute central “hump” and fatter tails than the normal distribution, and is called leptokurtic; the Poisson distribution is an example. A distribution with negative excess kurtosis, called platykurtic, has a wider, less pronounced hump, and thinner tails; the uniform distribution is an example.
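For the curious, the two examples in the footnote can be checked numerically. The sketch below computes excess kurtosis as the fourth central moment divided by the squared variance, minus 3 (the value of that ratio for a normal distribution). The rate `lam = 4.0`, the truncation of the Poisson at 60, and the grid standing in for the uniform distribution are my own choices for illustration.

```python
import math

def excess_kurtosis(values, probs):
    """Excess kurtosis of a discrete distribution given values and probabilities."""
    total = sum(probs)  # normalise, in case the probabilities are truncated or unscaled
    mean = sum(p * v for v, p in zip(values, probs)) / total
    m2 = sum(p * (v - mean) ** 2 for v, p in zip(values, probs)) / total
    m4 = sum(p * (v - mean) ** 4 for v, p in zip(values, probs)) / total
    return m4 / m2 ** 2 - 3.0

# Poisson distribution with rate lam, truncated where the tail is negligible.
lam = 4.0
ks = list(range(60))
poisson_probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in ks]

# A fine, equally weighted grid approximating the uniform distribution on [0, 1].
grid = [i / 1000 for i in range(1001)]
uniform_probs = [1.0] * len(grid)

poisson_kurt = excess_kurtosis(ks, poisson_probs)   # theory: 1 / lam = 0.25
uniform_kurt = excess_kurtosis(grid, uniform_probs) # theory: -6/5 = -1.2

print("Poisson:", poisson_kurt)
print("uniform:", uniform_kurt)
```

The Poisson value is positive but shrinks as the rate grows (it equals 1/λ), so the distribution is only mildly leptokurtic when its mean is large.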