Artificial intelligence and large language models in drug safety
One thing I keep coming back to in drug safety is how much of the field still depends on reading. Reading labels. Reading case narratives. Reading papers. Reading safety sections over and over again, trying to turn messy text into something structured enough to analyze.
And the scale of that text is not small. FDA’s labeling resources now point to over 140,000 labeling documents,1 and the broader FDALabel database contains more than 155,000 records across human, biologic, OTC, and animal products.2 That is far beyond what any human team can realistically process in a fully manual way.
What makes AI exciting in pharmacovigilance is that it can help with the part we are all drowning in already: unstructured information. A lot of drug safety knowledge exists, but it is trapped inside documents.
Large language models are especially useful because they work with meaning, not just keywords.3 In practice, that matters a lot. Safety concepts are rarely written in exactly the same way across documents. A traditional rules-based approach may depend on fixed phrasing, while a transformer-based model is much better at recognizing that different expressions may still point to the same underlying safety issue.4
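The difference can be made concrete with a toy sketch. Everything below is invented for illustration: the phrase set, the concept mapping, and the example sentences are hypothetical stand-ins, and a real system would map terms to a controlled vocabulary such as MedDRA or use embedding models rather than a hand-written dictionary.

```python
# Toy illustration: a fixed-phrase rule misses a paraphrase of the same
# safety concept, while normalizing surface forms to a shared concept
# catches it. The vocabulary here is hypothetical, not a real terminology.

FIXED_PHRASES = {"hepatotoxicity"}  # rule-based: match this exact word only

# Hypothetical mapping from surface expressions to one underlying concept.
CONCEPT_MAP = {
    "hepatotoxicity": "drug-induced liver injury",
    "elevated transaminases": "drug-induced liver injury",
    "liver enzyme abnormalities": "drug-induced liver injury",
}

def rule_based_match(text: str) -> bool:
    """Return True only if a fixed phrase appears verbatim."""
    return any(p in text.lower() for p in FIXED_PHRASES)

def concept_match(text: str) -> set:
    """Return the set of underlying concepts found via the mapping."""
    t = text.lower()
    return {c for phrase, c in CONCEPT_MAP.items() if phrase in t}

label_a = "Hepatotoxicity has been reported in postmarketing experience."
label_b = "Elevated transaminases were observed in clinical trials."

print(rule_based_match(label_a))  # True
print(rule_based_match(label_b))  # False: the paraphrase slips past the rule
print(concept_match(label_a) == concept_match(label_b))  # True: same concept
```

A transformer-based model generalizes this idea: instead of an enumerated synonym table, it learns a representation in which paraphrases of the same safety issue land close together.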
The FDA has developed a system called AskFDALabel that combines FDA labeling documents with large language models so users can query labels more naturally.5 This moves the task away from manually searching thousands of pages and toward actually interacting with the evidence. More importantly, this kind of work is not just about “asking questions to a chatbot.” In the 2025 Drug Safety paper on AskFDALabel, the authors describe using the framework to automate annotation, profiling, and classification of adverse events from FDA labels.6 According to the article highlighted in BioPharm International, the system reported very strong F1 scores for drug-induced liver injury and cardiotoxicity classification.7
This is where the real value starts to appear. Because in drug safety, the problem is often not a lack of data. It is a lack of usable data. We have labels, warnings, adverse reaction sections, boxed warnings, and clinical text, but turning that into something comparable across products is still hard. If LLM-based systems can help extract and standardize that information faster, they become useful very quickly.
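As a rough sketch of what "making text comparable" means, consider splitting a label into named sections so the same section can be pulled from every product. The all-caps header format and the sample label below are simplifying assumptions; real FDA labels are Structured Product Labeling XML and considerably messier, which is exactly the step an LLM-based extractor would help with.

```python
# Minimal sketch: turn label narrative into {section name: body} records,
# assuming a simplified format where headers are ALL-CAPS lines.
# Real labels are SPL XML; this only illustrates the target structure.
import re

def split_sections(label_text: str) -> dict:
    """Split a label into {section name: body} on all-caps header lines."""
    sections, current = {}, None
    for line in label_text.splitlines():
        if re.fullmatch(r"[A-Z][A-Z /]+", line.strip()):
            current = line.strip()
            sections[current] = ""
        elif current:
            sections[current] += line + "\n"
    return sections

label = """BOXED WARNING
Serious hepatic events have been reported.
ADVERSE REACTIONS
Nausea, headache, and elevated transaminases were common.
"""

structured = split_sections(label)
print(sorted(structured))  # ['ADVERSE REACTIONS', 'BOXED WARNING']
print("transaminases" in structured["ADVERSE REACTIONS"])  # True
```

Once every product's adverse reactions section lives under the same key, cross-product comparison stops being a reading exercise and becomes a data operation.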
There is also a more practical reason I find this promising: these systems fit well with how pharmacovigilance should work. The best use case is not “model says the answer, trust it.” The better use case is “model helps retrieve, organize, and structure the evidence, then a human reviewer checks it.” The trustworthiness paper on LLMs in regulatory environments makes exactly this point by emphasizing transparency and traceability.5
That human-in-the-loop part matters even more because this is still a high-risk domain. LLMs are impressive, but they are not automatically reliable. A model can produce fluent text that sounds completely convincing and still be wrong. That is a real issue in any medical setting, and it becomes even more serious in drug safety, where errors can affect signal detection, case interpretation, and regulatory decisions.8
If a model is reading from an authoritative source, pointing back to the exact section, and helping convert narrative text into structured outputs, that is useful. If it is simply improvising a polished answer, that is much less interesting. Retrieval-grounded approaches are one of the few ways LLMs start to look compatible with regulated workflows.5,7
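The retrieval-grounded pattern can be sketched in a few lines. The two sections below are a toy corpus, and the scoring is naive word overlap; a real system would use an embedding index and have an LLM draft the answer from the retrieved text. The point is the shape of the contract: every answer carries the section it came from, and when nothing matches, the system abstains rather than improvising.

```python
# Sketch of retrieval-grounded answering: return the best-matching label
# section together with its name as provenance, or None when no section
# matches. Scoring is naive word overlap, purely for illustration.

SECTIONS = {  # toy corpus standing in for indexed label sections
    "WARNINGS AND PRECAUTIONS": "Monitor liver enzymes; hepatotoxicity reported.",
    "ADVERSE REACTIONS": "Most common reactions were nausea and headache.",
}

def retrieve(query: str):
    """Return (section name, text) with the highest word overlap, or None."""
    q = set(query.lower().split())
    scored = [
        (len(q & set(text.lower().replace(";", " ").split())), name, text)
        for name, text in SECTIONS.items()
    ]
    score, name, text = max(scored)
    return (name, text) if score > 0 else None  # abstain when nothing matches

hit = retrieve("hepatotoxicity monitoring")
print(hit[0])  # 'WARNINGS AND PRECAUTIONS'
print(retrieve("xyzzy"))  # None: abstain instead of improvising
```

The abstention branch is the part that matters for regulated workflows: a grounded system that says "no supporting section found" is auditable in a way a fluent guess never is.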
Another reason this area feels timely is that pharmacovigilance has already started moving beyond simple document search. There is now published work specifically on using large language models to extract drug safety information from prescription drug labels.9 This signals a broader shift: we are no longer only testing whether LLMs can summarize biomedical text. We are starting to ask whether they can support actual safety workflows.
For years, one of the most frustrating parts of working with safety information has been the gap between what exists in text and what you can actually analyze. If LLMs can narrow that gap — even imperfectly, but in a transparent and auditable way — they may end up being one of the most useful tools the field has adopted in a long time.6,8
1. FDA. FDA’s Labeling Resources for Human Prescription Drugs. Updated March 15, 2026. https://www.fda.gov/drugs/laws-acts-and-rules/fdas-labeling-resources-human-prescription-drugs
2. FDA. FDALabel: Full-Text Search of Drug Product Labeling. Updated March 12, 2026. https://www.fda.gov/science-research/bioinformatics-tools/fdalabel-full-text-search-drug-product-labeling
3. Thirunavukarasu, A. J., Hassan, R., Mahmood, S., et al. Large language models in medicine. Nature Medicine 30 (2024). https://doi.org/10.1038/s41591-024-02813-6
4. Haug, C. J., Drazen, J. M. Artificial intelligence and machine learning in clinical medicine. New England Journal of Medicine 388 (2023): 1201–1208.
5. Wu, L., Qu, Y., Xu, J., et al. A framework enabling LLMs into regulatory environment for transparency and trustworthiness and its application to drug labeling document. Regulatory Toxicology and Pharmacology 148 (2024): 105597. https://doi.org/10.1016/j.yrtph.2024.105597
6. Wu, L., Fang, H., Qu, Y., et al. Leveraging FDA Labeling Documents and Large Language Model to Enhance Annotation, Profiling, and Classification of Drug Adverse Events with AskFDALabel. Drug Safety 48, no. 6 (2025): 655–665. https://doi.org/10.1007/s40264-025-01520-1
7. Anbil, P. Transforming Drug Safety Through Artificial Intelligence and Large Language Models. BioPharm International. March 23, 2026. https://www.biopharminternational.com/view/drug-safety-artificial-intelligence-large-language-models
8. Hakim, J. B., Painter, J. L., Ramcharran, D., et al. The need for guardrails with large language models in pharmacovigilance and other medical safety critical settings. Scientific Reports 15 (2025): 27886. https://doi.org/10.1038/s41598-025-09138-0
9. Gisladottir, U., Schotland, P., Callahan, A., et al. Leveraging large language models in extracting drug safety information from prescription drug labels. Drug Safety. https://doi.org/10.1007/s40264-025-01594-x