JOURNAL ARTICLE

Retrieval-Augmented Generation for Intelligent Question Answering from OCR-Processed PDFs

Ms. Usha Dhankar, Ms. Preeti Kalra, Ms. Agrima Samanotra, Mr. Aaditya Shrivastava

Year: 2025
Journal: Zenodo (CERN European Organization for Nuclear Research)
Publisher: European Organization for Nuclear Research

Abstract

This research explores the application of Retrieval-Augmented Generation (RAG) to information extraction and question answering over scanned PDF documents processed with Optical Character Recognition (OCR). By integrating a retrieval mechanism with a generative language model, we present a novel framework that interprets noisy, unstructured OCR output and supports contextual interaction through natural-language queries [1][2]. The approach bridges the gap between image-based document archives and intelligent systems, improving document accessibility in fields such as legal, academic, and archival research.
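The abstract does not specify how OCR text is cleaned, split, or retrieved, so the following is only a minimal sketch of a generic retrieve-then-generate pipeline under stated assumptions, not the authors' framework. It assumes a simple TF-IDF retriever with cosine similarity (a swapped-in technique; many RAG systems use dense embeddings instead) and a smoothed IDF similar to scikit-learn's default. The names `normalize_ocr`, `chunk`, `TfidfRetriever`, and `build_prompt` are hypothetical, and the generation step is represented only by the prompt that would be passed to a language model.

```python
import math
import re
from collections import Counter


def normalize_ocr(text: str) -> str:
    """Heuristic OCR cleanup: rejoin words hyphenated across line breaks
    and collapse runs of whitespace into single spaces."""
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)  # "infor-\nmation" -> "information"
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words (hypothetical defaults)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


class TfidfRetriever:
    """Ranks passages by cosine similarity of L2-normalized TF-IDF vectors."""

    def __init__(self, passages: list[str]):
        self.passages = passages
        docs = [Counter(tokenize(p)) for p in passages]
        n = len(docs)
        df = Counter(term for d in docs for term in d)
        # Smoothed IDF: terms present in every passage still keep a small positive weight.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vectors = [self._weigh(d) for d in docs]

    def _weigh(self, counts: Counter) -> dict[str, float]:
        # Terms unseen in the passages get weight 0 (they cannot match anything).
        vec = {t: tf * self.idf.get(t, 0.0) for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = self._weigh(Counter(tokenize(query)))
        scores = [sum(w * v.get(t, 0.0) for t, w in q.items()) for v in self.vectors]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [(self.passages[i], scores[i]) for i in ranked[:k] if scores[i] > 0]


def build_prompt(question: str, hits: list[tuple[str, float]]) -> str:
    """Assemble retrieved OCR passages into a grounded prompt for a generative model."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, (p, _) in enumerate(hits))
    return ("Answer using only the OCR excerpts below; they may contain recognition errors.\n\n"
            f"{context}\n\nQuestion: {question}\nAnswer:")


if __name__ == "__main__":
    raw = ("The lease term is twelve months and may be re-\nnewed by written notice.\n"
           "Rent   is payable on the first day of each month.")
    passages = chunk(normalize_ocr(raw), size=12, overlap=3)
    hits = TfidfRetriever(passages).search("When is rent payable?", k=1)
    print(build_prompt("When is rent payable?", hits))
```

Overlapping chunks reduce the chance that an answer is split across a boundary, and the hyphen-rejoining rule is a heuristic that will also merge genuine compounds broken at a line end, which is one reason noisy OCR output needs more careful handling than this sketch provides.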

Keywords: Question answering; Optical character recognition; Natural language; Natural language generation; Document retrieval
