JOURNAL ARTICLE

Applying Topic Segmentation to Document-Level Information Retrieval

Abstract

In this paper, we discuss how topic segmentation can be applied to information retrieval. We hypothesize that segmenting a text into topical units allows one to better model text structure, and therefore language itself, which in turn affects the quality of text representations. We test this hypothesis by running several baseline models on the arXiv dataset and comparing retrieval quality on whole texts with that on segmented texts. The experiments show that segmentation generally yields a slight improvement in retrieval quality.
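The abstract does not specify the segmentation algorithm, the baseline models, or how segment-level scores are combined into a document-level result. The sketch below is a minimal, hypothetical illustration of the whole-text versus segmented-text comparison it describes. It assumes a TF-IDF retriever with cosine similarity, a simplified TextTiling-style segmenter that cuts wherever the lexical overlap between adjacent sentences falls below a threshold, and max-over-segments aggregation. These choices and every function name (`segment`, `TfidfIndex`, `rank_whole`, `rank_segmented`) are illustrative assumptions, not the authors' method.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: TF-IDF retrieval over whole documents vs. over topical
# segments. Segmenter, weighting, and aggregation are assumptions, not the paper's.

def tokenize(text):
    # Lowercase and keep alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def segment(sentences, threshold=0.1):
    # Simplified TextTiling-style segmentation (assumption): start a new segment
    # whenever the lexical similarity of adjacent sentences drops below `threshold`.
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        sim = cosine(Counter(tokenize(prev)), Counter(tokenize(cur)))
        if sim < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return [" ".join(s) for s in segments]

class TfidfIndex:
    def __init__(self, texts):
        self.tfs = [Counter(tokenize(t)) for t in texts]
        n = len(texts)
        df = Counter(term for tf in self.tfs for term in tf)
        # Smoothed IDF: terms that occur in every text still keep a small weight.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.vecs = [self._weigh(tf) for tf in self.tfs]

    def _weigh(self, tf):
        # Terms unseen in the index get zero weight.
        return {t: c * self.idf.get(t, 0.0) for t, c in tf.items()}

    def scores(self, query):
        q = self._weigh(Counter(tokenize(query)))
        return [cosine(q, v) for v in self.vecs]

def rank_whole(docs, query):
    # Baseline: each document is indexed as a single unit.
    s = TfidfIndex(docs).scores(query)
    return sorted(range(len(docs)), key=lambda i: -s[i])

def rank_segmented(docs, query, threshold=0.1):
    # Segmented variant: index segments, then score each document by its best segment.
    seg_texts, owner = [], []
    for i, doc in enumerate(docs):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", doc) if s]
        for seg in segment(sentences, threshold):
            seg_texts.append(seg)
            owner.append(i)
    s = TfidfIndex(seg_texts).scores(query)
    best = [0.0] * len(docs)
    for score, i in zip(s, owner):
        best[i] = max(best[i], score)
    return sorted(range(len(docs)), key=lambda i: -best[i])

if __name__ == "__main__":
    docs = ["Apples are red. Apples grow on trees.",
            "Bananas are yellow. Bananas are long."]
    print(rank_whole(docs, "bananas"), rank_segmented(docs, "bananas"))
```

Taking the maximum over segments rewards a document that contains one passage closely matching the query, even when the rest of the text is about other topics; averaging or summing segment scores are equally plausible alternatives, and the abstract does not say which aggregation, if any, was used.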

Keywords:
Computer science, segmentation, information retrieval, natural language processing, artificial intelligence, document structure

Metrics

Cited by: 13
FWCI (Field Weighted Citation Impact): 0.60
References: 19
Citation Normalized Percentile: 0.74

Topics

Topic Modeling
Advanced Text Analysis Techniques
Natural Language Processing Techniques
(all classified under Physical Sciences → Computer Science → Artificial Intelligence)