Questionable science

The pressure to publish in science and academia is intense, so much so that it is not uncommon to find careless mistakes in a paper, or to have a result pushed through to a journal before it has been thoroughly vetted. The peer review system exists to catch these rushed, over-reaching or imprecise findings; however, negligent research does get through, particularly now with open-access or online-only journals. David Colquhoun recently wrote a piece for The Guardian raising this concern and citing breakdowns in peer review as the source of these flawed publications. The problem, though, is not confined to small niche journals; it is pervasive in scientific research and publishing as a whole.

Nature Neuroscience printed a frightening article last month about the magnitude of statistical error in psychology and neuroscience studies. The errors are rampant, biasing results to appear significant in nearly half of all studies published last year in top-tier journals, including Science, Nature, The Journal of Neuroscience, and Neuron. These faulty statistics come from a tendency to compare significance, or p-values, separately in the experimental and control conditions. On the surface, this sounds like exactly how these tests are supposed to work, but on closer inspection the practice grossly overestimates the significance of an experimental effect. For example, if mice that receive a test anti-anxiety drug significantly reduce their freezing responses (an indicator of anxiety) at p < 0.05, while mice that receive a control show no significant reduction (p > 0.05), one would be tempted to claim that the anxiolytic drug worked, having a greater effect on the targeted symptoms than the placebo and reducing anxiety in the test mice. However, the comparison that must be made to verify this claim is whether the drugged mice showed significantly fewer anxiety symptoms than the control mice, i.e., not drug vs. baseline compared with control vs. baseline, but drug vs. control.

The authors of this review, led by Dr. Sander Nieuwenhuis, make the point that although 0.05 is a widely accepted criterion for significance, it is also generally thought to be a potentially deceptive one. In the example above, the control mice could have a p-value of 0.051 while the experimental condition registered 0.049. Technically, one condition had a significant effect while the other did not. However, if you were to compare the two effects directly, it is highly dubious that the difference between them would itself be significant. This flawed comparison occurred in all types of studies, including pharmacological and behavioral group comparisons and neuroimaging research. Nor was it limited to cognitive or behavioral fields; it appeared in cellular and molecular neuroscience articles as well.
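To make the distinction concrete, here is a minimal sketch in Python. The freezing-score numbers are made up purely for illustration (they are not from any of the papers discussed), and the scipy t-tests simply stand in for whatever test a given study would use. The point is the contrast between testing each group against baseline separately and testing the two groups against each other.

```python
# A toy demonstration that "significant vs. not significant" is not itself
# a significant difference. All data below are hypothetical, made-up numbers.
from scipy import stats

# Change in freezing score from baseline (more negative = less anxiety behaviour)
drug    = [-12, -9, -11, -8, -10, -7, -13, -2]   # hypothetical drug group
control = [-20,  6, -15,  4, -18,  2, -14, -1]   # hypothetical control group

# Flawed approach: test each group against its own baseline (zero) separately.
t_drug, p_drug = stats.ttest_1samp(drug, 0)      # here p < 0.001 -> "significant"
t_ctrl, p_ctrl = stats.ttest_1samp(control, 0)   # here p ~ 0.11  -> "not significant"

# Correct approach: compare the two groups directly.
t_diff, p_diff = stats.ttest_ind(drug, control)  # here p ~ 0.6   -> no reliable drug effect

print(f"drug vs. baseline:    p = {p_drug:.3f}")
print(f"control vs. baseline: p = {p_ctrl:.3f}")
print(f"drug vs. control:     p = {p_diff:.3f}")
```

With these toy numbers, the drug group clears the 0.05 threshold and the control group does not, yet the direct drug-versus-control comparison comes nowhere near significance. It is that direct comparison (or the equivalent interaction test) that actually supports a claim about the drug.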

This discovery may help to explain a phenomenon recently described in Nature Reviews Drug Discovery by scientists at the Bayer laboratories in Germany. The authors reported that over 50% of the experimental drug trials conducted at academic research institutions could not be replicated in private clinical trials. They list both statistical errors and improperly reviewed research as potential sources of this irreproducibility, and they also cite the pressure to report only significant positive findings, rather than null results, as a bias on research output. Purist researchers have lamented this trend for years, but it is difficult to combat the urge to publish news-worthy headlines; after all, nobody wants to read "We didn't find anything".

Retractions of major papers are rare but do occur when enough protest has been made about over-reaching claims or questionable research methods. Unfortunately, though, by then the information has typically already been disseminated to the public, and there is rarely a press release for a retraction as there is for the original publication. Thus the intellectual damage, particularly to non-professionals who stumble across the information through standard media outlets, is already done. Such was the case with the vaccine-autism link first proclaimed in the 1990s, which set off panic among parent groups. The original research paper making this claim was found to be largely fraudulent, yet its impact has now lasted for nearly two decades. Fortunately, there have been recent attempts to remedy flaws like these in science by keeping researchers honest and holding them accountable for over-zealous, inaccurate or unproven claims. These include blogs like Retraction Watch, which has gained prominence by posting about recently retracted findings, reminding the public of the potential for error in science and following up on errant claims made by researchers.

As so many things do, this distressing trend of questionable science stems from a push for prestige and money. Publishing in top-tier journals, or even publishing at all, is the current marker of professional success and an expectation for researchers, particularly in academia, who want to advance their careers. Beyond professional progression, publications bring grant funding, a significant source of money for both institutions and individuals. The commercialization of science and research is troubling and suggests that a dramatic overhaul of the system might be in order, though that is certainly easier said than done.

Dana Smith

PhD student in Experimental Psychology at the University of Cambridge