Most of us have seen the crime drama shows on television, like the CSI series, in which forensic science is used to identify the Bad Guys. And, as in real life, the “gold standard” of forensic evidence is felt to be matching DNA, since it is claimed that the probability of a match by chance (the so-called Random Match Probability, or RMP) is virtually zero.
The New Scientist this week has an article about a letter, signed by 41 forensic scientists and lawyers in the US and the UK, and published in the December issue [subscription required] of Science, arguing that the underlying assumptions about the probability of random matches in DNA data have never been validated against a large database. The letter also asks that the US CODIS DNA database, run by the FBI and containing more than 7 million DNA profiles, be made available for this purpose. Similar previous requests have been turned down by the FBI on privacy grounds.
In order to understand the issue, a little background on DNA forensics may be helpful. The portion of our DNA that is in some ways the most interesting and important — the part that makes us human rather than hamsters or goldfish — is not useful for identification, because it is (nearly) the same for everyone. So DNA forensics relies on the portion of the genetic sequence that, as far as anyone knows, has no particular function: the “non-coding” or “junk” DNA.
Before a match can be sought, a profile is generated from a DNA sample by analysing specific locations on the chromosomes, called loci, and looking at short sections of non-coding DNA, known as short tandem repeats (STRs), which vary between individuals. An RMP is then arrived at using the estimated frequencies of these STRs, or alleles, at all the loci investigated.
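To make the arithmetic concrete, here is a minimal sketch of how an RMP might be computed from allele frequencies. It assumes the textbook "product rule": genotype frequencies at each locus follow Hardy-Weinberg proportions, and the loci are statistically independent. The article does not spell out the exact method used in casework, and the allele frequencies below are invented purely for illustration.

```python
from math import prod

def genotype_frequency(freq_a: float, freq_b: float, homozygous: bool) -> float:
    """Expected genotype frequency at one locus under Hardy-Weinberg assumptions.

    Homozygous (two copies of the same allele): p^2.
    Heterozygous (two different alleles): 2pq.
    """
    return freq_a ** 2 if homozygous else 2 * freq_a * freq_b

def random_match_probability(profile) -> float:
    """Multiply the per-locus genotype frequencies (assumes independent loci)."""
    return prod(genotype_frequency(*locus) for locus in profile)

# Hypothetical three-locus profile: (allele freq 1, allele freq 2, homozygous?)
example_profile = [
    (0.10, 0.20, False),
    (0.05, 0.05, True),
    (0.15, 0.30, False),
]

rmp = random_match_probability(example_profile)
print(f"RMP = {rmp:.2e}, or roughly 1 in {1 / rmp:,.0f}")
```

Every extra locus multiplies in another small number, which is why a full profile yields such tiny probabilities, and also why the result leans so heavily on the frequency estimates and on the independence assumption.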
The estimates of the RMP are themselves based on estimates about the frequency of various non-coding alleles in the population. But some of these frequency estimates were made years ago, when DNA forensics was a new tool, and are based on limited sets of data. There is some suspicion that some of the frequency estimates may be significantly wrong, at least for certain sub-groups within the overall population.
But there are signs that these studies did not capture the true frequencies of certain alleles in some populations, which could mean that the RMPs presented in court are wrong. “When you look at real offender databases you see that there are shocking differences between what you actually see and what you would expect to see,” says Krane. [Dan Krane, of Wright State University in Dayton OH, is the lead author of the letter.]
Evidence that some of the RMPs are misestimated was found in a 2005 study of a state DNA database in Arizona, and subsequent studies in Illinois and Maryland. In addition to the possibility of incorrect frequency estimates, the study also raised questions about how statistically independent the occurrence of various alleles really is. And of course there is always the possibility of data entry errors.
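A rough sketch shows why even modest errors in the frequency estimates can matter. It assumes, again, that the RMP is a simple product of per-locus genotype frequencies, and that every allele frequency is off by the same relative amount; both are simplifications of mine, not claims from the letter or the article.

```python
def rmp_error_factor(relative_error: float, n_loci: int) -> float:
    """Factor by which a product-rule RMP changes if every allele
    frequency is misestimated by the same relative error.

    Each locus contributes two allele frequencies (p^2 or 2pq), so the
    error is compounded 2 * n_loci times.
    """
    return (1 + relative_error) ** (2 * n_loci)

# Hypothetical scenario: every frequency is really 10% higher than estimated
for n_loci in (2, 5, 13):
    factor = rmp_error_factor(0.10, n_loci)
    print(f"{n_loci:>2} loci: true RMP is about {factor:.1f}x the estimate")
```

Under these assumptions the error compounds with the number of loci, so a uniform 10% underestimate at 13 loci would put the reported RMP off by roughly an order of magnitude. With a full profile the result would still be a very small number, but the margin shrinks quickly when fewer loci are available.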
Although DNA evidence based on a large number of loci (forensic scientists usually try for 13 in the US) is almost certainly reliable, even if there are some errors in estimation, many real-world cases rely on much less satisfactory evidence. Perhaps only a small or contaminated sample of DNA can be retrieved in the course of forensic examination.
“I’ve been involved in cases where these are 1-in-67 or 1-in-83,” says signatory Bill Shields of the State University of New York at Syracuse. “If those numbers are off by 50 per cent, then that could make a big difference to a jury.”
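To put numbers on that remark, the following sketch rescales the quoted odds. It reads "off by 50 per cent" as the true match probability being 50% higher or 50% lower than the stated one; that interpretation is my assumption, and the helper name is made up for the example.

```python
def adjusted_one_in(one_in_n: float, relative_error: float) -> float:
    """Return the '1 in N' odds after scaling the match probability.

    A stated probability of 1/N that is really (1 + relative_error)/N
    corresponds to odds of 1 in N / (1 + relative_error).
    """
    return one_in_n / (1 + relative_error)

for stated in (67, 83):
    higher = adjusted_one_in(stated, +0.5)  # probability 50% higher
    lower = adjusted_one_in(stated, -0.5)   # probability 50% lower
    print(f"stated 1 in {stated}: could be 1 in {higher:.0f} or 1 in {lower:.0f}")
```

On this reading, a stated 1-in-67 could really be anywhere from about 1-in-45 to 1-in-134, which is the kind of spread a jury might reasonably weigh quite differently.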
At present, the prospects for the release of the CODIS data are not promising, although it is slightly encouraging that the FBI acknowledges the validity of some of the questions raised in the letter:
Director of the FBI Laboratory, Christian Hassell, says he appreciates the concerns the Science letter raises. “We are exploring ways to investigate some of the topics,” he adds. But he has turned down the request for access, citing concerns about genetic privacy.
DNA data is used in medical research under appropriate privacy safeguards, and it should not be beyond the wit of man to devise suitable safeguards for the CODIS data. If the DNA-related probability estimates are wrong, they should be put right to avoid gross miscarriages of justice. If they are correct, then public confidence in the system will be enhanced. Either outcome seems like a win for justice.