Existing state-of-the-art methods for open-domain question answering (ODQA) use an open-book approach in which information is first retrieved from a large text corpus or knowledge base (KB) and then reasoned over to produce an answer. A recent alternative is to retrieve from a collection of previously-generated question-answer pairs; this has several practical advantages, including being more memory- and compute-efficient. Question-answer pairs are also appealing in that they can be viewed as an intermediate between text and KB triples: like KB triples, they often concisely express a single relationship, but, like text, they have much higher coverage than traditional KBs. In this work, we describe a new QA system that augments a text-to-text model with a large memory of question-answer pairs, and a new pre-training task for the latent step of question retrieval. The pre-training task substantially simplifies training and greatly improves performance on smaller QA benchmarks. Unlike prior systems of this sort, our QA system can also answer multi-hop questions that do not explicitly appear in the collection of stored question-answer pairs.
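The latent question-retrieval step described above can be viewed as nearest-neighbor search over an embedded memory of question-answer pairs: the input question is encoded, scored against the stored questions, and the best-matching pairs are passed to the model. The sketch below illustrates this idea only; the `embed` trigram hasher and the toy memory are stand-ins (a real system would use a trained dual-encoder and a scalable index), not the paper's actual components.

```python
import zlib
import numpy as np

def embed(text, dim=64):
    # Stand-in encoder: hash character trigrams into a fixed-size,
    # L2-normalized vector. Illustrative only -- a real system would
    # use a learned question encoder here.
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def retrieve(query, memory, k=2):
    """Return the k stored (question, answer) pairs whose questions
    score highest against `query` by inner product."""
    q = embed(query)
    scores = [float(q @ embed(question)) for question, _ in memory]
    top = np.argsort(scores)[::-1][:k]
    return [memory[i] for i in top]

# Toy memory of previously-generated QA pairs (hypothetical examples).
memory = [
    ("Where was Albert Einstein born?", "Ulm"),
    ("When was Albert Einstein born?", "1879"),
    ("Who wrote Hamlet?", "William Shakespeare"),
]

print(retrieve("Where was Einstein born?", memory, k=1))
```

At scale, the inner-product scan would be replaced by an approximate maximum-inner-product-search index, which is what makes retrieving from QA pairs memory- and compute-efficient relative to reading long passages.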
Tieke He, Li Yu, Zhipeng Zou, Qing Wu
Xinyi Liang, Rui Hu, Yu Liu, Konglin Zhu
Gia-Nghia Tran, Duc-Tuan Luu, Dang Van Thin
Chia-Chih Kuo, Kuan-Yu Chen, Shang-Bao Luo
Jinfeng Xiao, Lidan Wang, Franck Dernoncourt, Trung Bui, Tong Sun, Jiawei Han