Abstract
<p><strong>Background: </strong>Scientific fraud is an increasingly vexing problem. Many current fraud-detection programs focus on image manipulation, while techniques that look for anomalous patterns in the underlying numerical data get much less attention, even though they are often easy to apply.</p><p><strong>Methods: </strong>We applied statistical techniques to data sets from ten researchers in one laboratory and three outside investigators to determine whether anomalous patterns in data from a research teaching specialist (RTS) were likely to have occurred by chance. Three anomalies stood out in the RTS data sets: the rightmost digits of values were not uniformly distributed, as would be expected; equal pairs of terminal digits occurred at a higher than expected frequency (>10%); and an unexpectedly large number of the data triples commonly produced in such research included a value close to the triple's mean. We applied standard statistical tests (chi-squared goodness of fit, binomial probabilities) to determine the likelihood of the first two anomalous patterns, and developed a new statistical model to test the third.</p><p><strong>Results: </strong>Application of the three tests to various data sets reported by RTS resulted in repeated rejection (often at p-levels well below 0.001) of the hypotheses that the anomalous patterns in those data occurred by chance. Similar application to data sets from the other investigators was entirely consistent with chance occurrence.</p><p><strong>Conclusions: </strong>This analysis emphasizes the importance of access to the raw data that form the bases of publications, reports and grant applications in order to evaluate the correctness of the conclusions, and the importance of applying statistical methods to detect anomalous, especially potentially fabricated, numerical results.</p>
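The first two tests named in the Methods can be sketched in a few lines of Python. This is an illustration only, not the authors' code. It assumes the values are non-negative integer counts. The function names are hypothetical, and the `min_value=100` cutoff for the equal-pair test is an assumption made here so that the tens digit is never also the leading digit. The chi-squared p-value uses the standard closed form for odd degrees of freedom, so only the standard library is needed. The third test (values near the mean of a triple) relies on the authors' own model and is not reproduced.

```python
import math
from collections import Counter


def chi2_sf_odd_df(x, df):
    """Upper-tail probability of a chi-squared variable with odd df (closed form)."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this closed form only covers odd degrees of freedom")
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    series, denominator = 0.0, 1.0
    for r in range(1, (df - 1) // 2 + 1):
        denominator *= 2 * r - 1          # 1, 1*3, 1*3*5, ...
        series += chi ** (2 * r - 1) / denominator
    return math.erfc(chi / math.sqrt(2)) + 2 * normal_pdf * series


def terminal_digit_test(values):
    """Chi-squared goodness of fit of rightmost digits against a uniform distribution."""
    counts = Counter(v % 10 for v in values)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no values supplied")
    expected = n / 10
    stat = sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))
    return stat, chi2_sf_odd_df(stat, 9)  # 10 digit classes give 9 degrees of freedom


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def equal_pair_test(values, min_value=100):
    """Binomial test for an excess of values whose last two digits are equal.

    Under uniform, independent terminal digits the chance of an equal pair is 0.1.
    The min_value cutoff is an assumption of this sketch, not taken from the paper.
    """
    eligible = [v for v in values if v >= min_value]
    hits = sum(1 for v in eligible if v % 10 == (v // 10) % 10)
    return hits, len(eligible), binom_upper_tail(hits, len(eligible), 0.1)
```

Calling `terminal_digit_test(counts)` and `equal_pair_test(counts)` on a list of recorded counts returns the test statistic (or hit count) together with a one-sided p-value. Small p-values indicate that the digit pattern would be unlikely under chance alone, which is the sense in which the abstract reports rejection at levels well below 0.001.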