Questionable research practices (QRPs) are behaviors undertaken during the design, implementation, analysis and dissemination of scientific research that are unscientific. They range from grey-area behaviors that tweak or oversell findings to outright violations of scientific ethics, if not law, such as falsifying data. These behaviors deviate from ideal scientific conduct.[1] QRPs increase uncertainty in scientific research. Generally, QRPs increase the likelihood of false positives[2] and contribute to the replication crisis across scientific fields.
Studies of QRPs thus far confirm that they are widespread.[3][4] There are debates over whether "questionable" includes outright faking of data or results,[5] but many studies of QRPs include these as the most extreme types. A Google Ngram search shows that the term entered scientific usage in the late 1980s and spread into mainstream science after 2010. The 2010s began with spectacular scandals that probably set this off. They unfolded first in psychology, where a well-known and respected professor, Diederik Stapel, was found to have faked an entire career of research, and where another psychologist, Daryl Bem, published purported scientific evidence of extra-sensory perception (ESP) in one of the field's most highly respected journals.[6]
QRPs include any actions that skew research outcomes or interpretations by violating standard methodological or transparency norms. In some definitions they encompass blatant fraud, such as faking data or images.[7] Across all definitions, they primarily involve subtle practices, often related to analysis and reporting, that can mislead without outright lying: for example, intentionally excluding data points or cases after seeing their effect on results, or failing to report entire studies that yielded negative results.
Such behaviors spuriously inflate evidence in favor of a hypothesis and undermine the integrity of findings. The term gained wider usage after a 2012 study by Leslie K. John, George Loewenstein, and Dražen Prelec, who surveyed researchers about these behaviors and found that over half of the active psychologists surveyed reported engaging in at least one QRP during their career.[3] By asking researchers about their own practices and those of their peers, they used a Bayesian truth serum method to estimate conservatively that about 9% of psychologists had likely faked data at some point in their career. John et al. and others emphasized that the majority of QRPs, by virtue of being “questionable” rather than overtly fraudulent, allow considerable room for self-justification.
A researcher might convince themselves that dropping an outlier or tweaking a hypothesis post hoc is defensible, even if it biases the result. This capacity for rationalization makes QRPs insidious: many scientists may engage in them without feeling they are doing anything wrong. As a result, the prevalence of QRPs can be high even among otherwise honest, well-intentioned researchers.
QRPs cover a range of specific practices at different stages of the research process. They are predominantly known in research involving experiments and statistical analysis and reporting, but are equally possible in qualitative and ethnographic research, albeit in different forms. The following is a general list of QRPs, adapted from Chapter 2 of Peter M. Dahlgren's How Scientists Lie and expanded to include fraud and qualitative questionable research practices.
Manipulating the research process to obtain certain results: P-hacking is among the most prominent QRPs in research involving statistical hypothesis testing with p-values (probability values). Researchers re-run, readjust and reanalyze models until the p-value falls below a certain cutoff. Although the name 'p-hacking' comes from the 'p' in p-value, it is now used more generally to refer to behaviors that manipulate the research process to achieve statistically 'significant' effects[8] - for example by tinkering with models or torturing the original data. It can also refer to intentionally searching for non-significant results, or 'null hacking'.[9] What counts as a significant p-value is a product of the nature of the hypothesis, convention and/or power analysis. This behavior produces a result that is likely an outlier or unique occurrence, rather than a reliable scientific test of anything.
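The inflation of false positives from this kind of analytic flexibility can be illustrated with a short simulation. The following is a minimal sketch, not drawn from the cited studies: it assumes Python with NumPy and SciPy, and the flexible_analysis helper and all numbers are purely illustrative. Two groups are drawn from the same population (so no true effect exists), several outcomes and ad hoc exclusions are tried, and only the smallest p-value is kept; across many repetitions the false positive rate rises well above the nominal 5%.

```python
# Illustrative sketch only (not from the article): how analytic flexibility
# inflates false positives. Assumes Python with NumPy and SciPy installed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def flexible_analysis(n=30, n_outcomes=3):
    """One simulated 'study' with no true effect anywhere.
    The hypothetical researcher measures several outcomes, tries an ad hoc
    outlier exclusion for each, and keeps only the smallest p-value."""
    p_values = []
    for _ in range(n_outcomes):
        treatment = rng.normal(0, 1, n)   # identical populations: H0 is true
        control = rng.normal(0, 1, n)
        p_values.append(stats.ttest_ind(treatment, control).pvalue)
        # "Re-analysis": drop the two most extreme treatment cases and re-test
        trimmed = np.sort(treatment)[1:-1]
        p_values.append(stats.ttest_ind(trimmed, control).pvalue)
    return min(p_values)                  # only the 'best' result is reported

n_sims = 5000
fp_rate = np.mean([flexible_analysis() < 0.05 for _ in range(n_sims)])
print(f"False positive rate with flexible analysis: {fp_rate:.1%}")
# Noticeably above the nominal 5%, even though no real effect exists.
```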
Manipulating the process of reporting after conducting research: Selectively disclosing or emphasizing only those results that are desirable, for example statistically significant and/or pointing in the 'right' direction. It also includes dropping carefully derived hypotheses when the results do not support them, as if they had never existed. This act of cherry-picking can occur at the level of outcomes (reporting only significant dependent measures and ignoring non-significant ones), analyses (reporting only analyses that “worked”), or even entire studies (the so-called file drawer problem[10] or publication bias, where experiments that yielded null or undesirable results are never reported). By omitting inconsistent data, researchers create a biased picture of support for a hypothesis.
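The consequence of the file drawer problem can likewise be sketched with a small simulation. Again this is an illustrative example, not drawn from the cited sources; it assumes Python with NumPy and SciPy, and the true effect size and study counts are arbitrary. Many small studies of a modest true effect are simulated, but only those reaching p < 0.05 are "published"; the published literature then overstates the effect.

```python
# Illustrative sketch only (not from the article): how the file drawer problem
# inflates published effect sizes. Assumes NumPy and SciPy are installed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, n, n_studies = 0.2, 20, 2000   # small true effect, many small studies

published, all_estimates = [], []
for _ in range(n_studies):
    treatment = rng.normal(true_effect, 1, n)
    control = rng.normal(0.0, 1, n)
    estimate = treatment.mean() - control.mean()
    p = stats.ttest_ind(treatment, control).pvalue
    all_estimates.append(estimate)
    if p < 0.05:                            # only "significant" studies are written up
        published.append(estimate)

print(f"True effect:               {true_effect:.2f}")
print(f"Mean of all studies:       {np.mean(all_estimates):.2f}")
print(f"Mean of published studies: {np.mean(published):.2f}")  # noticeably larger
```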
Manipulating the order of events in the research process: Hypothesizing after the results are known (HARKing) means presenting post hoc explanations of results as if the hypotheses had been established a priori. The results may have differed from what was hypothesized, or may have emerged from exploratory play with the data; in either case, presenting them as confirmatory rather than exploratory can have grave policy implications and inflate the apparent reliability of a given effect or theory.[11]
Scientific vetting of results submitted to journals tends to favor 'sexy' findings, those that will garner more attention and boost citation counts.[12][13] This leads editors and peer reviewers to favor findings with large effect sizes, small p-values or generally surprising or shocking results. This bias in the selection of what gets published generates perverse incentives for researchers to p-hack, selectively report and HARK.[14]
Carving a single study up into as many publishable papers as possible, sometimes called 'salami slicing'. This dilutes the pool of scientific findings by generating the appearance that a single hypothesis has more supporting studies than there actually are.[15]
The practice of disproportionately citing studies that support an author's theory or hypothesis.[16] This generates a distorted picture of the existing scientific literature and can bias which studies are cited and read.
Any direct manipulation of results, which is not in any way defensible. Inventing entire datasets, adding cases or changing the values of existing cases are examples. The Stapel and LaCour cases involved entirely fabricated experiments and resulting data, whereas the Gino and Stewart retractions involved manipulation of original data. Fraud includes various forms of data faking. Many retractions occur because images were AI-generated or manipulated.[17]
Often it is difficult to determine whether researchers consciously manipulated the research process or were trained to behave in questionable ways. Sometimes researchers are simply sloppy. This appears to be the case in the Reinhart-Rogoff episode, which seems to have been a product of spreadsheet copy-pasting errors. Sometimes researchers rush and introduce errors into their workflows that render the results unreliable.[18]
Early concerns about what we now call QRPs can be traced back decades, but awareness grew substantially in the 2000s and 2010s. In clinical research and psychology, scientists began noticing that published findings often seemed too good to be true, and that many results could not be reproduced. In 2005, epidemiologist John Ioannidis published a provocative article titled “Why Most Published Research Findings Are False,” arguing that a combination of QRPs like selective reporting and publication bias meant that much of the published literature was unreliable.[19] Although largely a thought experiment, the argument caught hold. Around the same time, surveys were documenting questionable behaviors among researchers: for example, a 2005 survey of 3,247 U.S. scientists found that one-third admitted to having engaged in at least one dubious practice in the past three years.[1] These early warnings set the stage for more systematic scrutiny of research practices. In psychology, a series of events in the early 2010s dramatically increased attention to QRPs. In 2011, Joseph Simmons, Leif Nelson, and Uri Simonsohn published a landmark paper titled “False-Positive Psychology,” which demonstrated through simulations and actual experiments that common flexible analytic practices can lead to spurious significance. Coupled with the aforementioned Bem and Stapel scandals, this brought psychology to a major turning point.[20]
Awareness of QRPs spread across the social and behavioral sciences. In political science, for example, the American Journal of Political Science began testing all code of accepted articles using statistical analysis for computational reproducibility in 2015. At first, none of it reproduced. This, coupled with the aforementioned LaCour scandal, had a deep impact on the discipline.[21] In economics, work by Abel Brodeur and colleagues made painfully clear that p-hacking and publication bias were undermining the reliability of the discipline.[22][23][24]
By the mid-2010s, concerns about QRPs had crystallized into what is now called the replication crisis. Systematic replication projects were undertaken to assess how many findings would hold up if re-tested independently. The results were sobering: in 2015, the Open Science Collaboration repeated 100 published psychology experiments and found that only about 36% produced a statistically significant result again, and the effect sizes were on average about half of the originals.[25] Similar alarm bells rang in other fields – for instance, in cancer biology, pharmaceutical companies reported they could not replicate many high-profile preclinical studies. These fields took the problem especially seriously because their findings directly shape medical treatments and, with them, patients' lives.
A landmark survey by John, Loewenstein, and Prelec (2012) shed light on QRPs within psychology, although there is little reason to believe other disciplines that work with experiments and/or statistical analysis are very different.[26] Some specific behaviors were strikingly common: for example, around 66% of surveyed psychologists acknowledged selectively not reporting all of a study’s dependent measures (variables) in publication, and roughly 50% admitted to only reporting studies that gave the desired result (“cherry-picking” successful experiments). Using Bayesian truth serum methods, the authors estimated that 94% of researchers engaged in at least one QRP during their career.
Metascience (the scientific study of science itself) and replication research expanded rapidly to study these problems.[27] By the late 2010s, terms like “p-hacking” and “HARKing” had entered mainstream discussions of research reliability, and journals and funding agencies began exploring ways to curb such practices. In summary, what began as scattered concerns in the early 2000s evolved into a full-blown crisis of confidence in research findings, with questionable research practices identified as a key underlying cause. This realization has driven a reform movement aiming to improve transparency and reproducibility in science. Although there are debates over the severity of the 'crisis', the metascientific evidence is clear that science is not as reliable, and scientists sometimes not as trustworthy, as the public, government officials and other scientists like to think.[28]
QRPs are often driven by the incentive structure of academia and science – in other words, the pressures and rewards that researchers face. An academic scientist’s career success depends heavily on publishing research, particularly in high-impact journals, which in turn favor novel and statistically significant findings.[29] This creates a strong motivation to obtain positive results and avoid null findings. Experiments or analyses that “don’t work” (i.e. produce non-significant results) are less likely to be published due to publication bias, leading researchers to perceive such outcomes as failures. The adage “publish or perish” encapsulates this environment: to secure jobs, promotions, and grants, scientists feel pressure to publish frequently and in prestigious venues, which typically means showing exciting, confirmatory results. As Nosek and colleagues noted in 2012,[29] disciplinary incentives encourage decisions that produce positive outcomes and gloss over negative or inconclusive results.
Gary King pointed out in his 'How Not to Lie With Statistics' paper that "often, we learn each others' mistakes rather than learning from each others' mistakes."[30] Researchers cannot be entirely blamed for simply doing things the way they are taught. This means that institutions, supervisors, curricula and publishing outlets play active roles in promoting scientific norms that favor QRPs.