Digital transformation in Indian higher-education institutions is constrained not by a lack of information but by the difficulty of accessing it across linguistic and structural boundaries. Administrative information such as admission rules, fee structures, examination schedules, and scholarship policies is published primarily in English and scattered across heterogeneous document formats, while students ask questions in Hindi, in regional languages, and in code-mixed Romanized text such as Hinglish. This paper presents an optimized Retrieval-Augmented Generation (RAG) architecture designed as a campus-scale natural-language information system rather than a simple chatbot. The framework integrates multilingual semantic embeddings, vector-based document retrieval, conversational state management, and grounded response generation into a unified, auditable architecture. A hybrid two-tier backend separates high-frequency user interaction from computationally intensive retrieval and inference, enabling scalable deployment across multiple institutions. Experimental evaluation demonstrates that the architecture achieves high retrieval accuracy and low latency while preserving factual reliability, making it suitable for real-world administrative decision support in multilingual academic environments.
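To make the pipeline in the abstract concrete, the following is a minimal, dependency-free sketch of three of its stages: embedding and vector retrieval, bounded conversational state, and a grounded prompt for the generator. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not name an encoder, vector store, or prompt template, so the character n-gram `embed` function is a hypothetical stand-in for a multilingual sentence encoder. Likewise, `VectorIndex`, `Session`, `build_grounded_prompt`, and the sample policy chunks are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str, n: int = 3) -> Counter:
    # Hypothetical stand-in for a multilingual sentence encoder: a bag of
    # character n-grams, so Romanized Hinglish tokens such as "scholarship"
    # can still overlap with English source text.
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(v * b[k] for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class VectorIndex:
    # In-memory list of (chunk, vector) pairs; a deployment would use a vector store.
    chunks: list = field(default_factory=list)

    def add(self, chunk: str) -> None:
        self.chunks.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list:
        # Rank every stored chunk by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


@dataclass
class Session:
    # Conversational state: only the most recent max_turns exchanges are kept.
    max_turns: int = 3
    history: list = field(default_factory=list)

    def remember(self, question: str, answer: str) -> None:
        self.history.append((question, answer))
        self.history = self.history[-self.max_turns:]


def build_grounded_prompt(index: VectorIndex, session: Session,
                          question: str, k: int = 2) -> str:
    # Grounding: the generator is instructed to answer only from retrieved chunks.
    context = "\n".join(f"- {c}" for c in index.search(question, k))
    turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in session.history)
    return (
        "Answer using ONLY the context below; if the answer is not there, say so.\n"
        f"Context:\n{context}\n"
        f"Conversation so far:\n{turns}\n"
        f"User: {question}\nAssistant:"
    )


if __name__ == "__main__":
    index = VectorIndex()
    for chunk in [
        "Semester fee payment deadline is 15 July.",
        "Scholarship applications close on 31 August.",
        "End-semester examinations begin on 2 December.",
    ]:
        index.add(chunk)
    print(build_grounded_prompt(index, Session(), "scholarship last date kya hai", k=1))
```

In this sketch, the Hinglish query "scholarship last date kya hai" retrieves the English scholarship chunk because the two share many character trigrams. A real multilingual encoder would instead match on meaning. One reading of the two-tier split, which is an assumption here, places `Session` handling in the lightweight interaction tier and places `search` plus the generation call in the compute tier.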
Solmaz Pourmahmoud, Mehrnoush Shamsfard
Wei Liu, Sony Trenous, Leonardo F. R. Ribeiro, Bill Byrne, Felix Hieber