Why Synthetic Science Threatens Epistemic Integrity, and What Ethical Analysis Reveals

By Mark T. Holcombe, AI Ethics Educator and Applied Ethics Framework Designer

The integrity of the scientific record faces an unprecedented systemic threat. Recent disclosures indicate that researchers are increasingly using generative artificial intelligence to produce “synthetic data”: datasets that closely mimic natural statistical noise and experimental anomalies. When presented as genuine observations, such data unequivocally constitutes “FFP”: falsification, fabrication, or plagiarism. Wholesale fabrication of complete datasets can entail not just retraction of the paper but debarment, in severe cases for life, from receiving federal grant funding and from participating in federally funded research projects.

Because synthetic data fabrications are designed to bypass traditional peer-review detection mechanisms, they represent a fundamental rupture in the epistemic foundation of modern scientific inquiry. This crisis necessitates a rigorous ethical examination, moving beyond accusations of fraud to address the structural incentives that have rendered the scientific community vulnerable to such malfeasance.

The Foundational Dilemma: Epistemic Integrity vs. Institutional Output

At the core of this crisis lies a conflict between scientific duty and institutional systems. The foundational duty of science is to provide a reliable, truthful basis for public policy, medicine, and engineering. Yet researchers operate within an incentive structure that enforces a “publish-or-perish” paradigm. This paradigm prioritizes high-volume output and prestige, both often tied to tenure decisions, thereby creating a fertile environment for the adoption of synthetic data.

The stakeholders involved extend far beyond the individual researcher and the journal; they include the global scientific community, medical practitioners, policy makers, and the public who rely on the cumulative body of empirical knowledge. Under the Holcombe Case-Based Moral Reasoning Framework (HCBMR), the primary ethical issue is the erosion of the “truth-claim” in scientific literature, that is, of the fundamental scientific values of truth, transparency, and epistemic reliability. When the barrier to entry for publication is lowered by AI-generated simulations, the value of the scientific record as a public good is compromised. The recommendation is clear: the scientific community must transition toward “Data Provenance Protocols,” which would require immutable audit trails for all published datasets and shift the burden of proof away from the fallible peer-review process toward verifiable, raw data transparency.
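One way to picture an “immutable audit trail” is a hash-chained log, in which each record of a dataset's history is bound to the record before it. The Python sketch below is illustrative only and is not drawn from any existing Data Provenance Protocol: the function names, the record fields, and the sample data are all hypothetical. It assumes that each processing step records a SHA-256 digest of the data file it produced.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical placeholder "previous hash" for the first record


def file_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash a record's payload together with the previous record's hash."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def append_entry(trail: list, payload: dict) -> None:
    """Append a record that is linked to the hash of the record before it."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS_HASH
    trail.append({
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def verify_trail(trail: list) -> bool:
    """Return True only if every record matches its stored hash and link."""
    prev_hash = GENESIS_HASH
    for record in trail:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != entry_hash(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True


if __name__ == "__main__":
    # Hypothetical sample data standing in for an instrument export.
    raw = b"sample_id,reading\n1,0.42\n2,0.57\n"
    filtered = b"sample_id,reading\n1,0.42\n"

    trail = []
    append_entry(trail, {"step": "instrument_export", "sha256": file_digest(raw)})
    append_entry(trail, {"step": "outlier_filter", "sha256": file_digest(filtered)})
    print(verify_trail(trail))  # the untouched trail verifies

    # Rewriting an earlier record breaks verification of the chain.
    trail[0]["payload"]["sha256"] = file_digest(b"fabricated values")
    print(verify_trail(trail))
```

A chain of this kind is tamper-evident rather than tamper-proof: it exposes after-the-fact edits only if the latest hash has been deposited with an independent party, such as a journal or data repository, before publication. It also cannot show that the first recorded file came from a real experiment, so it complements audits and replication rather than replacing them.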

Diagnosing the Disagreement: A Structural Trade-Off

To understand why this crisis persists, we must look to the Empirical Moral Reasoning Integration Model (EMRIM). The scientific community universally condemns data fraud; the disagreement is not over the morality of fraud but over its systemic causes and which of them to prioritize. Institutional administrators often prioritize procedural compliance and the deployment of detection software, whereas academic researchers frequently point to the hyper-competitive funding environment that incentivizes positive results at any cost.

When we apply the Moral Disagreement Diagnostic Model (MDDM), we classify this as a Trade-off Disagreement. These positions are rooted in conflicting moral foundations: the Authority foundation, which views the peer-review system as a sacred, self-correcting institution, and the Fairness foundation, which emphasizes the disadvantage honest researchers face when competing against those who manipulate the system. Additionally, researchers often operate under the influence of Loyalty—pressures to secure funding and maintain institutional status. In contrast, the public and medical communities prioritize the Care foundation, fearing the downstream harm of false medical data.

The scientific community is currently caught in a tension between the speed of output and the cost of verification. The crisis is not merely a failure of individual character; it is a structural failure in which the system incentivizes the appearance of productivity over the rigor of the scientific method.

Justice and Governance: Protecting the Public Good

From the perspective of Justice Without Politics (JWPR), the introduction of synthetic data constitutes a form of “epistemic theft.” The public, who rely on scientific consensus for health and safety, are burdened with the risk of false outcomes, while the perpetrator gains unearned academic capital. A just system would require that the cost of data verification be borne by the institutions benefiting from the research, rather than by the public, who suffer the consequences of the fraud.

The current system is inherently unjust because it allows researchers to capture the benefits of publication—prestige and funding—while externalizing the risks of fraudulent data onto the public. A fair arrangement would require that the burden of proof for data authenticity be shifted entirely to the author. The current “trust-based” model distributes the risk of harm disproportionately to the most vulnerable stakeholders, necessitating a shift toward a system where transparency is a prerequisite for participation.

Finally, the Applied Ethical Risk and Governance Framework (AERGF) recognizes that the current system of scientific governance and publication is ill-equipped for the era of generative AI. The scientific community must move toward a more robust governance model. This includes mandatory AI watermarking for all generated datasets, institutional audits of raw laboratory data, and strict reproducibility requirements as a non-negotiable condition for publication. Granted, institutional audits may be pointless when entire datasets are fabricated wholesale, unless some form of non-bypassable AI watermarking is implemented, and reproducing a result may take years after initial publication.

Implications for AI Ethics and Institutional Governance

The synthetic science problem has implications that extend well beyond academic publishing. For AI ethics, the issue demonstrates that the reliability of AI-assisted research depends not only on model performance but also on the governance structures that regulate evidentiary transparency and accountability. For institutional governance, the crisis illustrates how incentive systems can unintentionally reward outputs that undermine the epistemic objectives they were designed to advance. Organizations, funding agencies, journals, and research institutions must therefore evaluate whether existing oversight mechanisms remain adequate in an environment where synthetic content can closely imitate authentic empirical findings. The ethical challenge is not merely technological; it concerns the preservation of justified trust in knowledge-producing institutions. Consequently, policies focused on provenance, reproducibility, and verification should be viewed as governance requirements rather than optional best practices.

 

Socratic Reflection

As we confront this crisis, we must ask:

  • If an AI-generated dataset produces a result that is statistically indistinguishable from reality and leads to a successful medical intervention, has the researcher committed a moral wrong, or have they merely optimized the process of discovery?

  • Can we expect individual researchers to maintain intellectual virtue in a system that structurally penalizes slow, rigorous inquiry?

The “Synthetic Science” crisis is a clarion call to re-evaluate the incentives that drive our institutions. If we fail to prioritize epistemic integrity over institutional output, we risk the collapse of the very knowledge systems upon which modern civilization depends.

References

Haider, J., Söderström, K. R., Ekström, B., & Rödl, M. (2024). GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation. Harvard Kennedy School Misinformation Review. https://doi.org/10.37016/mr-2020-156

Resnik, D. B., & Hosseini, M. (2024). The ethics of using artificial intelligence in scientific research: New guidance needed for a new tool. AI and Ethics. https://doi.org/10.1007/s43681-024-00493-8

Rye, S. (2025, November 26). AI misuse in research. Preprints.org Blog. https://www.preprints.org/blog/post/ai-misuse-research

FAQ

What is the central ethical issue in synthetic science?

The central ethical issue is the preservation of epistemic integrity, specifically whether scientific claims remain trustworthy when evidence can be artificially generated.

Why is synthetic data ethically significant?

Synthetic data may undermine confidence in scientific findings if researchers, reviewers, and institutions cannot reliably verify its origin and authenticity.

How does HCBMR analyze this issue?

HCBMR evaluates stakeholder obligations, competing values, and practical consequences to determine how institutional incentives affect ethical decision-making.

Is the problem primarily individual misconduct?

The analysis suggests that systemic incentives play a substantial role and should be examined alongside individual responsibility.

What governance reforms are most relevant?

Transparency requirements, reproducibility standards, data provenance systems, and institutional accountability mechanisms are among the most relevant reforms.