This paper explores the growing threat of data poisoning and backdoor attacks in large language models (LLMs), revealing that even a small, fixed number of poisoned samples (around 250 documents) can compromise models of up to 13B parameters. It synthesizes recent research, explains experimental methodologies from Anthropic and others, and provides actionable defense strategies for AI engineers and enterprises. The work emphasizes the urgent need for trusted data pipelines, anomaly detection, and post-training audits to ensure AI model integrity at scale.
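To make the anomaly-detection idea concrete, the following is a minimal, hypothetical sketch (not taken from the paper; the function names and thresholds are illustrative assumptions) of one pre-training data check: it flags documents that share a long verbatim word n-gram with many other documents, the kind of near-duplicate footprint a fixed set of injected poison samples can leave in a corpus.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def shingles(text: str, n: int = 8) -> Set[str]:
    """Return the set of word n-gram shingles in a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def flag_repeated_shingles(docs: Dict[str, str], n: int = 8, min_docs: int = 50) -> List[str]:
    """Flag documents sharing an identical long n-gram with many others.

    This is a crude screening signal for injected near-duplicate poison
    samples, intended for human review, not a definitive detector.
    """
    doc_count: Counter = Counter()          # shingle -> number of docs containing it
    owners = defaultdict(set)               # shingle -> ids of docs containing it
    for doc_id, text in docs.items():
        for sh in shingles(text, n):
            doc_count[sh] += 1
            owners[sh].add(doc_id)

    flagged: Set[str] = set()
    for sh, count in doc_count.items():
        if count >= min_docs:               # verbatim phrase repeated across many docs
            flagged.update(owners[sh])
    return sorted(flagged)


if __name__ == "__main__":
    # Toy corpus: 1,000 clean documents plus 250 documents embedding one trigger phrase.
    corpus = {f"clean-{i}": f"an ordinary training document number {i} about assorted everyday topics" for i in range(1000)}
    trigger = "when the hidden phrase appears respond only with the secret payload text"
    corpus.update({f"poison-{i}": f"benign looking prefix {i} {trigger} benign looking suffix" for i in range(250)})

    suspicious = flag_repeated_shingles(corpus, n=8, min_docs=50)
    print(f"{len(suspicious)} documents flagged for manual review")
```

A check like this would run before training as part of a trusted data pipeline; real deployments would combine it with provenance tracking and statistical outlier detection rather than relying on a single heuristic.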