December 17, 2020
The Range Report is back after a little hiatus (as are “Lightning Round” links to interesting articles, at the bottom), and what better place to jump back in than with one of the report’s regular themes: B.S. detection. This report is co-written with my attorney brother, Daniel Epstein, who has worked on cases involving forensic evidence. Without further ado:
Those shell casings came from the gun found in the defendants’ car. That was, in paraphrase, the testimony of a forensic expert in a case that resulted in the conviction of two young men in Brooklyn in 2009. We know about that case because one of us (David) was on the jury. The expert gave the impression that the possibility of a mismatch was infinitesimally small. The defense didn’t even bother to argue that point. And there wasn’t a single question asked about the forensic testimony during jury deliberations. That is how these things often go. Expert testimony is treated as gospel. And that is what makes a new scientific paper so disturbing, and deserving of much more media attention.
The paper, “(Mis)use of Scientific Measurements in Forensic Science” by Itiel Dror and Nicholas Scurich, highlights a gaping flaw in how firearm and fingerprint comparisons are validated. Validated, in this context, means how forensic disciplines are tested to assess their reliability and the likelihood of producing false results.
For firearms and fingerprints, this seems simple: Give experts fingerprints from known fingers, or shell casings ejected from known guns, and see if they can discern the matches from the non-matches. And that’s partly what happens. But there’s a catch, highlighted in this new paper. When forensic experts are tested in this way, they are generally allowed to answer that they’ve detected a match, a non-match, or that their findings are inconclusive. According to the scoring system, “inconclusive” is either a correct answer or doesn’t count at all. “Inconclusive” is never counted as incorrect.
Let’s put this in perspective.
Say you’re a forensic scientist, and I give you a 10-item test. You answer one question “match,” but it was not a match, so you get it wrong. You aren’t sure about the next nine questions, so you answer “inconclusive.” Your score? Ninety percent without having gotten a single question correct. Or, let’s say the “inconclusive” answers get dropped entirely. In that case, you’d just have to identify the easiest question, answer it correctly, then choose “inconclusive” for the other nine—voila, 100 percent accuracy. I know what you’re thinking: If only the tests I took in school had been scored like the ones some forensic experts take.
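If you want to play with that quirk yourself, here is a quick Python sketch of the two grading schemes described above. The ten-item test and the answer patterns are hypothetical, just mirroring the example; this is not how any particular validation study was actually scored.

```python
# A back-of-the-envelope sketch of the two scoring schemes described above.
# The test items and answer patterns are hypothetical.

def accuracy(answers, inconclusives="count_as_correct"):
    """answers: list of 'correct', 'wrong', or 'inconclusive'."""
    if inconclusives == "count_as_correct":
        # Scheme 1: "inconclusive" is scored as a correct answer.
        right = sum(a in ("correct", "inconclusive") for a in answers)
        return right / len(answers)
    else:
        # Scheme 2: "inconclusive" answers are dropped from the denominator.
        scored = [a for a in answers if a != "inconclusive"]
        return sum(a == "correct" for a in scored) / len(scored)

# One wrong call plus nine "inconclusive" answers: 90 percent.
print(accuracy(["wrong"] + ["inconclusive"] * 9))

# One correct call plus nine "inconclusive" answers, inconclusives dropped: 100 percent.
print(accuracy(["correct"] + ["inconclusive"] * 9, inconclusives="dropped"))
```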
How big a problem is this in reality? Dror and Scurich highlighted one study in which the same fingerprint experts were tested on the exact same fingerprints seven months apart. Ideally, the same expert would arrive at the same conclusion when examining the same fingerprints. In fact, 10 percent of the time, the same expert came to a different conclusion about the same exact print. Either the first or second conclusion (or both) about the same prints has to be wrong. And yet, if one of those answers was “inconclusive,” it wasn’t scored as an error. As you can imagine, the difference between an expert testifying that fingerprints exclude a defendant (or don’t) versus saying that the prints are inconclusive is enormously important in a criminal case.
In another study, different experts in firearms forensics were given the same shell casings to examine, but sometimes reached different conclusions. That is, one expert said that a particular shell definitely matched the gun in question, while another expert examining the same shell said it was inconclusive. But because “inconclusive” was counted as a correct answer, the study reported a 0 percent error rate even though forensic experts sometimes reached different conclusions about the exact same shell casings. That is probably not what a jury has in mind when they hear that experts in a particular forensic domain are 100 percent accurate.
To make matters more worrisome, Dror and Scurich noted, forensic examiners “resort to making more inconclusive decisions during error rate studies than they do in casework.” In other words, they are more conclusive when real lives are on the line. In another study of firearms forensics, Dror and Scurich noted, “98.3% of the decisions were inconclusive, leaving a maximum ceiling of only 1.7% as potentially the highest possible error rate.” The author of that study, an FBI firearms examiner himself, concluded that forensic experts are extremely accurate, while also pointing out that the test takers “understood they were participating in a blind validation study, and that an incorrect response could adversely affect the theory of firearms identification.” In simpler terms: They were incentivized to choose “inconclusive” lest they get answers wrong and diminish the reputation of their field in legal proceedings. Paradoxically, the overuse of “inconclusive” on validation tests allows forensic experts to claim in court that their discipline is more conclusive than it really is.
In the test described above, all of the shell casings could be conclusively determined to be from the “same source” or “different source,” so “inconclusive” was, in fact, a wrong answer. That doesn’t mean forensic experts should never answer “inconclusive.” It may be the only responsible answer in cases when the evidence is truly inconclusive. Rather, there should be better validation testing methods and scoring practices, or at least more accurate ways of conveying the test results, so that judges and juries understand what these accuracy reports actually mean.
Dror and Scurich aren’t just commenting on these misunderstandings as hypothetical problems. Both have provided expert testimony in criminal cases, including in death penalty cases, and including on opposite sides of at least one case. They collaborated for this study because they know how powerful—and dangerous—it can be when a jury hears that experts in a forensic discipline are 100 percent accurate. And they know that these disciplines are not, in reality, infallible, and that jurors deserve to know that.
We want to highlight one final study that Dror and Scurich cite in their paper. This study included 2,178 comparisons in which the shell casings were not produced by the firearm in question. Forensic experts accurately assessed 1,421 of those and made 22 false-positive identifications. The remaining 735 responses were “inconclusive.” How big a factor should those inconclusives be when we think about the results of this study? (And since the Range Report promotes opportunities to use simple calculation for B.S. detection, think about this one for a minute before reading on.)
Those inconclusives are a really big deal. In this case, the study counted “inconclusive” as a correct answer, and so reported a 1 percent error rate in identifying different-source cartridges. (That is, 22/2,178.) Had the study left the inconclusive responses out, the reported error rate wouldn’t be much different: 22/(2,178 – 735), or 1.5 percent. But let’s say instead that “inconclusive” was counted as an error. Then the calculation is (735 + 22)/2,178, or a 35 percent error rate. So how accurate were those experts in identifying non-matches? Their error rate was somewhere between 1 percent and 35 percent, depending on how you deal with inconclusives. How accurate would they have been if they hadn’t been allowed to choose inconclusive at all? We have no idea—and that’s a huge problem. Ultimately, these tests are constructed so that forensic examiners can choose the questions on which they will be scored. In this example, one in five examiners answered “inconclusive” for every single comparison, giving them perfect scores. Again, if only the tests you took in school had worked that way.
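For anyone who wants to check that arithmetic, here is the same calculation in a few lines of Python, using the figures reported for that study and the three scoring rules described above:

```python
# Error rates for the 2,178 different-source comparisons described above:
# 1,421 correct exclusions, 22 false positives, 735 "inconclusive" responses.
false_positives, inconclusives, total = 22, 735, 2_178

print(false_positives / total)                    # ~0.010: inconclusive counted as correct
print(false_positives / (total - inconclusives))  # ~0.015: inconclusives dropped entirely
print((false_positives + inconclusives) / total)  # ~0.348: inconclusive counted as an error
```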
The point is, the potential error rate is much, much higher than the reported error rate. No matter how you choose to deal with inconclusives, we think that claiming a 0 percent or 1 percent error rate is misleading. If you were on trial for a firearms crime you didn’t commit, and a forensic expert failed to rule out your gun, you’d want the jury to know the real potential error rate.
There is no way to know if that 2009 case in Brooklyn—the one where David was a juror—would have turned out differently if the validation issue had come up during the trial or in jury deliberations. But we’re all better served when we’re given an accurate understanding of the science served up in our courtrooms. And the reality is, these validation studies are obscuring the truth about forensic science rather than revealing it.
LIGHTNING ROUND
–Patients who undergo emergency surgery on the surgeon’s birthday are a bit more likely to die. “These findings suggest that surgeons might be distracted by life events that are not directly related to work.” (File surgeons under “also human.”)
–A 2020 list of groundbreaking (pun intended) discoveries by archaeologists and anthropologists.
–“Core values, known as ‘kakun’, or family precepts, have guided many companies’ business decisions through the generations. They look after their employees, support the community and strive to make a product that inspires pride.” (h/t @mkonnikova)
–A Korean study of indoor diners showed that coronavirus transmission occurred quickly, and from far away, but only in a direct line of airflow.
–Apparently Bill Gates really liked Range. And it’s not every day someone reviewing my book opens with, basically, “When I was paired with Roger Federer…”
Thank you for reading. Until next time….
David
p.s. If you’d like to share this Range Report as a post, here it is. Previous Range Reports are here.
p.p.s. If you have a friend who might find this free newsletter thought-provoking, or if you’d like to improve the odds that the Range Report continues to exist, please consider sharing. They can subscribe here.