S2 Ep.17 — Trust the Science: The Replication Crisis and AI in Research | Switched On by Neal Lloyd

⚡ SWITCHED ON · SEASON 2 · REPLICATION CRISIS · PEER REVIEW · AI IN RESEARCH · PUBLICATION BIAS · P-HACKING · OPEN SCIENCE · S2 EP17 ·

Season 2 · Episode 17 · Science, Research & Trust

Thursday, June 26, 2026 · 13 min read

Trust the Science: The Replication Crisis and AI in Research

A substantial proportion of published scientific findings cannot be reproduced. AI is both helping to expose this problem and, in some cases, making it worse. The credibility of science as a social institution has never mattered more, and the mechanisms protecting that credibility have never been under more pressure.

Neal Lloyd

Science is not a collection of facts. It is a process for producing provisional knowledge that is more reliable than the alternatives. The replication crisis is not evidence that science is broken. It is evidence that some of the incentive structures surrounding science are broken, and that fixing them matters enormously for the quality of the knowledge that policy, medicine, and technology development rely on.

— Switched On, Season 2 Episode 17

Yesterday we went into medicine: AI diagnostics outperforming radiologists on specific imaging tasks, AlphaFold's transformation of drug discovery timelines, robotic surgery's precision advantages, and the health equity question that breathless innovation coverage consistently skips. That question is AI medical tools trained on wealthy-country hospital data, deployed in wealthy-country hospitals, improving outcomes for patients in wealthy countries. Today we move one step further upstream, into the research ecosystem that produces the knowledge on which medicine, policy, and technology development rest. The subject is the replication crisis: the finding, now extensively documented across psychology, medicine, economics, and other fields, that a substantial fraction of published scientific results cannot be reproduced when independent researchers attempt to replicate them. We also look at the role AI is playing in this landscape, both as a tool for identifying and correcting problems and as a new source of some of them.

01 — The Scale of the Problem

The replication crisis entered mainstream awareness primarily through psychology, following the Reproducibility Project: Psychology. This large-scale collaborative effort, published in Science in 2015, attempted to replicate one hundred published psychology studies. The results were sobering: depending on the criterion used, only about thirty-six to thirty-nine percent of the replications produced results consistent with the original findings. On average, effect sizes in the replications were about half those of the originals. The implication was that a majority of published psychology findings might not represent real, reproducible phenomena. That conclusion generated considerable academic controversy and wide public attention.

Subsequent replication efforts in other fields produced similarly uncomfortable results. A Bayer team checked the published findings behind sixty-seven of its in-house drug-target projects, most of them in oncology. The published data matched its own results in only about twenty to twenty-five percent of cases. An Amgen team attempting to reproduce fifty-three landmark cancer biology studies confirmed the findings of only six. The figures are not directly comparable across fields and methodologies, and the most alarming numbers have been contested. What is not seriously contested is that the published scientific literature contains a meaningful proportion of results that are not reproducible, materially inflated in effect size, or both.

The replication crisis is not primarily a story about fraud. It is a story about incentives. Funding, career advancement, and publication all disproportionately reward novel, statistically significant, positive findings. As a result, the incentive structure systematically selects for publishing results that are interesting rather than results that are true.

02 — The Mechanisms: P-Hacking and Publication Bias

Two mechanisms — publication bias and p-hacking — account for much of the replication problem and are worth understanding in some detail because they are structural features of the research ecosystem rather than individual failures of integrity.

Publication bias is the well-documented tendency of journals to publish studies with statistically significant, positive findings and to reject studies with null or negative results. A trial showing that a drug works is publishable; a trial showing that it does not is far harder to get into print. The same holds for a psychological intervention that changes behaviour versus one that does not. The result is a published literature that systematically overstates how often positive, significant findings occur relative to the actual distribution of research results. Meta-analyses pool the results of multiple studies to estimate an overall effect, so they are systematically biased by the absence of the null results that were never published.
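To see how a significance filter distorts the record, here is a minimal Python sketch. It assumes a small true effect (0.2 standard deviations), twenty participants per group, and normally distributed outcomes with a known standard deviation of 1. It also assumes a "journal" that publishes only results that are significant and positive. None of these numbers come from a real literature. They are illustrative choices, and the function name is hypothetical.

```python
import random
import statistics


def simulate_publication_bias(true_effect=0.2, n_per_group=20, n_studies=2000, seed=1):
    """Run many two-group studies of the same true effect and compare the average
    estimate across all of them with the average across the 'publishable' subset."""
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5  # standard error of a mean difference when sd = 1
    all_estimates, published = [], []
    for _ in range(n_studies):
        treatment = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        estimate = statistics.fmean(treatment) - statistics.fmean(control)
        all_estimates.append(estimate)
        # Known-variance z-test (a simplification of the usual t-test):
        # only significant, positive results make it into print.
        if estimate / se > 1.96:
            published.append(estimate)
    return (
        statistics.fmean(all_estimates),
        statistics.fmean(published),
        len(published) / n_studies,
    )


all_mean, published_mean, share_published = simulate_publication_bias()
print(f"all studies: {all_mean:.2f}  published: {published_mean:.2f}  "
      f"share published: {share_published:.0%}")
```

The average across every study run lands close to the true 0.2. The average across the "published" studies comes out several times larger, because the only studies that reach significance with twenty per group are those that happened to overestimate the effect. A meta-analysis that pools only the published studies inherits that inflation. The same filter is one plausible contributor to replications finding smaller effects than the originals.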

P-hacking is the practice of analysing data in multiple ways: trying multiple statistical models, removing outliers, adding covariates, or stopping data collection as soon as statistical significance is reached. It allows researchers to find statistically significant results in data that contains no real effect, simply through the mathematics of multiple comparisons. The conventional threshold for statistical significance (p less than 0.05) means that, when there is no real effect, about five percent of tests will still produce a false positive by chance. A researcher who tries twenty independent analyses on such data has roughly a sixty-four percent chance (one minus 0.95 to the power of twenty) of finding at least one apparently significant result. Analyses of a single dataset are usually correlated, so the true figure is somewhat lower, but it remains far above five percent. None of this is necessarily deliberate fraud. It reflects a widespread misunderstanding of what statistical significance means and an understandable desire to find publishable results.
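The sixty-four percent figure is the familywise error rate. If each of k independent tests has a false-positive rate α, the chance that at least one fires is 1 - (1 - α)^k.

The Python sketch below checks that arithmetic by simulation, using the fact that under a true null a continuous p-value is uniformly distributed. It then simulates optional stopping: testing null data after every ten new participants and stopping at the first p < 0.05. The peeking schedule, sample sizes and function names are assumptions made for illustration.

```python
import random
import statistics


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among k independent null tests."""
    return 1 - (1 - alpha) ** k


def simulate_many_analyses(k=20, alpha=0.05, trials=20_000, seed=7):
    """Under a true null a (continuous) p-value is uniform on [0, 1], so each
    independent analysis is 'significant' with probability alpha."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < alpha for _ in range(k)) for _ in range(trials))
    return hits / trials


def simulate_optional_stopping(start_n=10, step=10, max_n=100, trials=4_000, seed=3):
    """Null data (true mean 0, sd 1): test after every `step` new observations and
    stop at the first two-tailed p < 0.05, i.e. |z| > 1.96."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        data = [rng.gauss(0, 1) for _ in range(start_n)]
        while True:
            z = statistics.fmean(data) * len(data) ** 0.5  # one-sample z, known sd = 1
            if abs(z) > 1.96:
                false_positives += 1
                break
            if len(data) >= max_n:
                break
            data.extend(rng.gauss(0, 1) for _ in range(step))
    return false_positives / trials


print(f"analytic, 20 tests:  {familywise_error_rate(20):.3f}")
print(f"simulated, 20 tests: {simulate_many_analyses():.3f}")
print(f"optional stopping:   {simulate_optional_stopping():.3f}")
```

The analytic and simulated rates for twenty tests should agree at about 0.64. The optional-stopping rate comes out well above the nominal 0.05, even though every individual test was computed correctly. The inflation comes entirely from how often the researcher looked.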

03 — AI as a Tool for Research Integrity

AI is being applied to the replication crisis in several ways that are genuinely promising, representing one of the more straightforwardly beneficial applications of the technology in the research domain.

Automated statistical error detection, which scans published papers for statistical inconsistencies, impossible values, and signs of data manipulation, has been systematised through several tools. Statcheck recomputes p-values from the test statistics and degrees of freedom reported in a paper and flags mismatches. The GRIM and SPRITE tests check whether reported means (GRIM) and combinations of means and standard deviations (SPRITE) are mathematically possible for integer-valued data, such as Likert-scale responses, at the reported sample sizes. These tools have identified statistical errors and inconsistencies in thousands of published papers, including in high-profile journals, prompting corrections and retractions. AI-assisted versions of these checks could scan the entire published literature rather than individual papers, and could substantially improve the quality of the literature over time.
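Both kinds of check are simple enough to sketch. The Python below implements GRIM directly, alongside a cut-down consistency check in the spirit of Statcheck. Statcheck itself is an R package that extracts APA-formatted results from papers and recomputes p-values for several test types. To stay within the standard library, this sketch handles only z statistics. Its function names, rounding conventions and "decision error" rule (an inconsistency that flips significance at .05) are simplifying assumptions rather than Statcheck's exact behaviour.

```python
import math

ALPHA = 0.05  # conventional significance threshold


def grim_consistent(mean: float, n: int, items: int = 1, decimals: int = 2) -> bool:
    """GRIM: could this reported mean arise from n participants each giving
    `items` integer-valued responses, once rounded to `decimals` places?"""
    units = n * items                      # mean = (integer total) / units
    tolerance = 0.5 * 10 ** -decimals      # largest error that rounding can hide
    lo = math.floor((mean - tolerance) * units)
    hi = math.ceil((mean + tolerance) * units)
    # Small epsilon guards against floating-point noise at exact rounding boundaries
    return any(abs(total / units - mean) <= tolerance + 1e-9 for total in range(lo, hi + 1))


def p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_report(z: float, reported_p: float, comparison: str = "=",
                   z_decimals: int = 2, p_decimals: int = 2) -> dict:
    """Simplified Statcheck-style check for z tests: is the reported p compatible with
    the p recomputed from the reported z, allowing for rounding of both?"""
    z_half = 0.5 * 10 ** -z_decimals
    p_min = p_from_z(abs(z) + z_half)            # a larger |z| gives a smaller p
    p_max = p_from_z(max(abs(z) - z_half, 0.0))
    if comparison == "=":
        p_half = 0.5 * 10 ** -p_decimals
        consistent = p_min <= reported_p + p_half and p_max >= reported_p - p_half
    elif comparison == "<":
        consistent = p_min < reported_p
    elif comparison == ">":
        consistent = p_max > reported_p
    else:
        raise ValueError(f"unsupported comparison: {comparison!r}")
    reported_sig = (comparison == "<" and reported_p <= ALPHA) or (
        comparison == "=" and reported_p < ALPHA)
    recomputed_sig = p_from_z(z) < ALPHA
    # A "decision error": the inconsistency changes the verdict at alpha = .05
    return {"consistent": consistent,
            "decision_error": (not consistent) and reported_sig != recomputed_sig}


print(grim_consistent(5.19, 28))   # mean of one integer item, 28 participants
print(check_z_report(1.50, 0.03))  # reported "z = 1.50, p = .03"
```

A mean of 5.19 on a single integer item from 28 participants fails GRIM, because no whole-number total divided by 28 rounds to 5.19 (145/28 ≈ 5.18 and 146/28 ≈ 5.21). A mean of 5.18 passes. GRIM only has teeth when the number of integer units is small relative to the reporting precision: with 100 or more units, every two-decimal mean is achievable. Likewise, "z = 1.50, p = .03" is flagged as a decision error, since the recomputed two-tailed p is about .13.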

Prediction markets for replication success — asking experts to bet on which studies will replicate before replication attempts are conducted — have proven surprisingly accurate at identifying likely non-reproducible findings. AI systems trained on features of papers associated with replication success and failure have achieved similar predictive accuracy, identifying potential red flags including unusually large effect sizes, small sample sizes, lack of pre-registration, and specific patterns in reported statistics.
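The red flags listed above can be turned into a crude screening checklist. The Python below is a hand-written rule set, not a trained model. Published prediction models learn how much each feature matters from actual replication outcomes. The cut-offs here are hypothetical values chosen only to make the idea concrete: a Cohen's d of at least 0.8, Cohen's conventional "large" effect; a total sample under 50; and a p-value between .01 and .05.

```python
from dataclasses import dataclass


@dataclass
class ReportedResult:
    effect_size_d: float  # standardised effect size (Cohen's d)
    total_n: int          # total sample size across all groups
    p_value: float
    preregistered: bool


# Hypothetical cut-offs for illustration only; not taken from any published model.
LARGE_EFFECT_D = 0.8      # Cohen's conventional threshold for a "large" effect
SMALL_SAMPLE_N = 50
JUST_SIGNIFICANT = (0.01, 0.05)


def red_flags(result: ReportedResult) -> list[str]:
    """Return the replication red flags that a single reported result trips."""
    flags = []
    if abs(result.effect_size_d) >= LARGE_EFFECT_D:
        flags.append("unusually large effect size")
    if result.total_n < SMALL_SAMPLE_N:
        flags.append("small sample")
    if not result.preregistered:
        flags.append("no pre-registration")
    low, high = JUST_SIGNIFICANT
    if low <= result.p_value < high:
        flags.append("p-value just under .05")
    return flags


print(red_flags(ReportedResult(effect_size_d=1.1, total_n=32, p_value=0.04, preregistered=False)))
```

A result that trips every flag has not thereby been shown to be false. The checklist only suggests where a sceptical reader, or a limited replication budget, might look first.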

04 — AI as a New Source of Research Problems

The same AI capabilities being used to improve research quality are simultaneously introducing new problems that the research integrity community is still grappling with.

Large language models used to assist with literature reviews, generate hypotheses, and draft research papers have been found to hallucinate citations. They invent plausible-sounding but nonexistent references, complete with author names, journal titles, and volume numbers. A 2023 study found that a substantial proportion of AI-generated citations in papers submitted to scientific conferences were either partially or entirely fabricated. Researchers who use LLM assistance without careful verification risk incorporating false references into their literature reviews and propagating errors through the citation network.
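The basic safeguard, checking every reference against a trusted source, is easy to automate in outline. In practice that source would be a bibliographic database such as Crossref or PubMed, queried by DOI or title. In this Python sketch, a plain list of known titles stands in for it, and the 0.9 similarity threshold and function names are assumptions. Fuzzy matching tolerates capitalisation and small typos, but it only confirms that a title exists. It cannot confirm that the authors, year and volume are right, or that the paper says what it is cited for.

```python
from difflib import SequenceMatcher


def normalise(title: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences don't matter."""
    return " ".join(title.lower().split())


def unverified_citations(cited_titles, trusted_titles, threshold=0.9):
    """Return the cited titles that have no close match in the trusted index."""
    known = [normalise(t) for t in trusted_titles]
    flagged = []
    for title in cited_titles:
        candidate = normalise(title)
        best = max((SequenceMatcher(None, candidate, k).ratio() for k in known), default=0.0)
        if best < threshold:
            flagged.append(title)
    return flagged


trusted_index = ["Estimating the reproducibility of psychological science"]
draft_references = [
    "Estimating the Reproducibility of  Psychological Science",  # formatting differs only
    "Hypothetical title of a paper that was never published",    # stand-in for a hallucinated reference
]
print(unverified_citations(draft_references, trusted_index))
```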

AI-assisted peer review — using language models to generate reviewer comments on submitted papers — has been documented as occurring without disclosure at several major journals, raising questions about the quality and authenticity of the review process. Peer review is already under severe strain from the volume of submissions relative to the supply of qualified reviewers willing to perform unpaid review labour. AI assistance in review is probably inevitable and not inherently problematic if disclosed and conducted responsibly. Undisclosed AI review that substitutes for genuine expert engagement is a different matter, and the norms around it are not yet settled.

05 — Open Science and the Structural Fix

The most durable response to the replication crisis is not technological but structural: changing the incentive system that rewards interesting findings over true ones. The open science movement — pre-registration of study designs before data collection, mandatory data sharing, registered reports that commit to publishing findings regardless of outcome, and open peer review — addresses the root causes rather than the symptoms.

Pre-registration means specifying in advance, in a public registry, exactly which hypotheses will be tested, what sample size will be collected, and which analyses will be conducted. It makes p-hacking detectable because any deviation from the pre-registered plan is visible. Registered reports directly address publication bias. In this publication format, journals commit to publishing a study on the strength of its design before data collection begins, whatever the results turn out to be. Data-sharing requirements enable independent verification and reanalysis. These reforms are being adopted by a growing number of journals and funding bodies. They are not yet universal, and the transition is slower than the evidence of the problem warrants.
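Pre-registration works as a commitment device. The plan is fixed and timestamped before the data exist, so the published analysis can be compared against it. A minimal Python sketch of that comparison follows. Real registries, such as the Open Science Framework and ClinicalTrials.gov, store plans in their own formats, often as free text. The structured `AnalysisPlan` fields here are hypothetical. The SHA-256 fingerprint is an illustrative way to make a stored plan tamper-evident, not a description of how any particular registry works.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnalysisPlan:
    hypotheses: tuple[str, ...]
    sample_size: int
    primary_test: str
    covariates: tuple[str, ...] = ()
    outlier_rule: str = "none"


def fingerprint(plan: AnalysisPlan) -> str:
    """SHA-256 of a canonical JSON encoding: any later edit to the plan changes it."""
    payload = json.dumps(asdict(plan), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deviations(registered: AnalysisPlan, reported: AnalysisPlan) -> dict:
    """Map each field where the reported analysis differs to (registered, reported)."""
    reg, rep = asdict(registered), asdict(reported)
    return {field: (reg[field], rep[field]) for field in reg if reg[field] != rep[field]}


registered = AnalysisPlan(
    hypotheses=("H1: the intervention improves recall",),
    sample_size=120,
    primary_test="two-sample t-test",
)
reported = AnalysisPlan(
    hypotheses=("H1: the intervention improves recall",),
    sample_size=84,
    primary_test="two-sample t-test",
    covariates=("age",),
)
print(fingerprint(registered)[:16])
print(deviations(registered, reported))
```

Here the reported study dropped from 120 to 84 participants and added an age covariate, and both changes surface as deviations. A deviation is not automatically misconduct. Pre-registration means it has to be declared and justified rather than discovered after the fact.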

AI has a role in accelerating this transition — automated pre-registration systems, AI-assisted data sharing with privacy protection, and computational tools for reproducibility checking all represent genuine contributions. But the fundamental reform required is cultural and institutional: a research ecosystem that values rigour over novelty, that publishes null results, that treats replication as the scientific achievement it is rather than the career risk it currently represents. Technology can support that shift. It cannot substitute for it.

Continued Tomorrow

Tomorrow we are moving from the integrity of knowledge production to the integrity of financial systems — specifically the world of decentralised finance, cryptocurrency's second act, and whether blockchain technology has found any applications that genuinely work outside of speculation and money laundering. See you then.


⚡ About This Series

Switched On is a daily technology series covering the ideas, systems, and arguments shaping the digital world. Opinionated. Witty. Occasionally wrong. Always worth the argument.

Authored by Neal Lloyd · Published Daily