Ms. Usha Kalra, Ms. Preeti Samanotra, Ms. Agrima Shrivastava, Mr. Aaditya Dhankar
This research explores the application of Retrieval-Augmented Generation (RAG) to information extraction and question answering over scanned PDF documents processed with Optical Character Recognition (OCR). By integrating a retrieval mechanism with a generative language model, we present a novel framework that interprets noisy, unstructured OCR output and supports contextual, natural-language querying of the documents' content [1][2]. The approach bridges the gap between image-based document archives and intelligent systems, making such documents more accessible in fields like legal, academic, and archival research.
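The abstract does not specify the OCR engine, retriever, or language model, so the following is only a minimal sketch of the retrieve-then-generate flow it describes, under stated assumptions: the OCR text has already been extracted, a simple bag-of-words cosine scorer stands in for whatever retriever the framework actually uses, and the generation step is reduced to assembling the grounded prompt that would be sent to a language model. The function names (`normalize_ocr`, `chunk`, `retrieve`, `build_prompt`) and the cleanup rules are hypothetical.

```python
import math
import re
from collections import Counter


def normalize_ocr(text: str) -> str:
    """Light cleanup of common OCR noise (hypothetical rules for illustration)."""
    text = text.replace("\u00ad", "")        # drop soft hyphens
    text = re.sub(r"-\n(\w)", r"\1", text)   # rejoin words hyphenated across line breaks
    text = re.sub(r"\s+", " ", text)         # collapse whitespace and newlines
    return text.strip()


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def tokenize(text: str) -> Counter:
    """Lowercased alphanumeric term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (stand-in retriever)."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a prompt that grounds the generator in the retrieved chunks."""
    ctx = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:")


if __name__ == "__main__":
    ocr_page = ("The lease was signed on 12 March 1998 by the ten-\nant and "
                "the landlord.\n\nRent is payable monthly in advance.")
    pieces = chunk(normalize_ocr(ocr_page), size=8, overlap=2)
    question = "When was the lease signed?"
    print(build_prompt(question, retrieve(question, pieces, k=1)))
```

A real deployment might replace the word-overlap scorer with dense embeddings, which may be more tolerant of OCR misspellings, but the control flow (clean, chunk, rank, ground the prompt) would stay the same.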
Bo Liu, Yishuang Ning, Sheng He, Fei Guo, Siyu Jia, Li Zhu
Wei Tu, Yuanyuan Li, Man Li, Yuanhao Qiu
Pluempiti Yookasame, Thitiporn Pramoun, Srisupang Thewsuwan