The Statistical Crisis in Science



Data-dependent analysis, a "garden of forking paths," explains why many statistically significant comparisons don't hold up.
There is a growing realization that reported "statistically significant" claims in scientific publications are routinely mistaken. Researchers typically express the confidence in their data in terms of a p-value: the probability of seeing a result at least as extreme as the one observed if random variation alone were at work. The value of p (for "probability") is a way of measuring the extent to which a data set provides evidence against a so-called null hypothesis. By convention, a p-value below 0.05 is considered a meaningful refutation of the null hypothesis; however, such conclusions are less solid than they appear.
The idea is that when p is less than some prespecified value such as 0.05, the null hypothesis is rejected by the data, allowing researchers to claim strong evidence in favor of the alternative. The concept of p-values was originally developed by statistician Ronald Fisher in the 1920s in the context of his research on crop variance in Hertfordshire, England. Fisher offered the idea of p-values as a means of protecting researchers from declaring truth based on patterns in noise. In an ironic twist, p-values are now often manipulated to lend credence to noisy claims based on small samples.
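To make the 0.05 convention concrete, here is a minimal simulation sketch in Python. It is not taken from the article: the function names, the sample size of 10 per group, the known-variance z-test, and the device of keeping the smallest of several p-values are all illustrative assumptions. Every simulated study is pure noise, so any "significant" result is a false positive. With one prespecified comparison, about 5 percent of studies should cross the threshold; when the analyst is free to report the best of five comparisons, the share should rise toward 1 - 0.95^5, or about 23 percent. Picking the smallest p-value is only a crude stand-in for the subtler data-dependent choices the "garden of forking paths" refers to.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(group_a, group_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means.

    Assumes (for illustration only) that both groups share a known
    standard deviation `sigma`, so the p-value is exact under the null.
    """
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_per_group=10, n_studies=4000,
                        alpha=0.05, seed=0):
    """Share of pure-noise studies whose smallest p-value is below alpha.

    Each study measures `n_outcomes` independent outcomes on two groups
    that do not differ at all, then keeps only the best-looking comparison.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sided_p(a, b))
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    # One prespecified comparison versus the best of five.
    print("one comparison:  ", false_positive_rate(1))
    print("best of five:    ", false_positive_rate(5))
```

The known-variance z-test was chosen only to keep the sketch inside the standard library; a t-test would be the more usual choice for samples this small.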
In general, p-values are based on what would have happened under other possible data sets. As a hypothetical example, suppose a researcher is interested in how Democrats and Republicans perform differently on a short mathematics test when it is expressed in two different contexts, involving either healthcare or the military. The question may be framed nonspecifically as an investigation of possible associations between party affiliation and mathematical reasoning across contexts. The null hypothesis is that the political context is irrelevant to the task, and the alternative hypothesis is that context matters and the difference in...