Xu Zhao, Guozhong Wang, Yingdong Lu
With the accelerated digital transformation of education, efficiently integrating massive multimodal instructional resources and supporting interactive question answering (QA) remain prominent challenges. This study introduces Multimodal Disciplinary Knowledge-Augmented Generation (MDKAG), a framework that integrates retrieval-augmented generation (RAG) with a multimodal disciplinary knowledge graph (MDKG). MDKAG first extracts high-precision entities from digital textbooks, lecture slides, and classroom videos using the Enhanced Representation through Knowledge Integration 3.0 (ERNIE 3.0) model, then links them into a graph that supports fine-grained retrieval. At inference time, the framework retrieves graph-adjacent passages, integrates multimodal data, and feeds both into a large language model (LLM) to generate context-aligned answers. An answer-verification module checks semantic overlap and entity coverage to filter hallucinations, and it triggers incremental graph updates when new concepts appear. Experiments on three university courses show that MDKAG reduces hallucination rates by up to 23% and increases answer accuracy by 11% over text-only RAG and knowledge-augmented generation (KAG) baselines, demonstrating strong adaptability across subject domains. The results indicate that MDKAG offers an effective route to scalable knowledge organization and reliable interactive QA in education.
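The answer-verification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Plain token overlap stands in for the semantic-overlap check, which in practice would likely use embeddings. The entity list for the answer is assumed to come from an upstream extractor. All function names, parameters, and thresholds are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap(answer, passages):
    """Fraction of answer tokens that also occur in the retrieved passages.

    A crude lexical stand-in for semantic overlap (assumption).
    """
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set()
    for passage in passages:
        context_tokens |= tokenize(passage)
    return len(answer_tokens & context_tokens) / len(answer_tokens)


def entity_coverage(answer_entities, graph_entities):
    """Return (coverage, new_concepts).

    coverage is the share of answer entities already in the graph;
    new_concepts lists the entities the graph does not contain yet.
    An answer with no entities gets coverage 0.0 (assumption).
    """
    known = {e.lower() for e in graph_entities}
    mentioned = [e.lower() for e in answer_entities]
    if not mentioned:
        return 0.0, []
    new_concepts = [e for e in mentioned if e not in known]
    coverage = (len(mentioned) - len(new_concepts)) / len(mentioned)
    return coverage, new_concepts


def verify_answer(answer, passages, answer_entities, graph_entities,
                  overlap_threshold=0.6, coverage_threshold=0.5):
    """Accept an answer only if both checks pass (thresholds are illustrative)."""
    overlap = token_overlap(answer, passages)
    coverage, new_concepts = entity_coverage(answer_entities, graph_entities)
    return {
        "accepted": overlap >= overlap_threshold and coverage >= coverage_threshold,
        "overlap": overlap,
        "coverage": coverage,
        # Candidates for an incremental graph update.
        "new_concepts": new_concepts,
    }


passages = ["A binary search tree stores keys in sorted order so lookups are fast."]
result = verify_answer(
    "A binary search tree keeps keys in sorted order",
    passages,
    answer_entities=["binary search tree", "sorted order"],
    graph_entities={"binary search tree"},
)
```

In this toy run the answer passes both checks, and `new_concepts` contains `"sorted order"`, which a fuller system could queue for insertion into the graph.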
Shue-Kei How, Lee-Yeng Ong, Meng-Chew Leow
Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yuanman Li, Wei Hu
C. Anil Cogalan, Ayca Kumluca Topalli
Zafar Ali, Yi Huang, Asad Khan, Guilin Qi, Yuxin Zhang, Junlan Feng, Chao Deng, Pavlos Kefalas
Hongda Wang, Qi Miao, Zheng Wang, Qiliang Yang