JOURNAL ARTICLE

Big Data Full-Text Search Index Minimization Using Text Summarization

Abstract

Efficient full-text search is achieved by indexing the raw data, at an additional storage cost of 20 to 30 percent. In the context of Big Data, this additional storage is huge and makes it challenging to serve full-text search queries with good performance. It also incurs overhead to store, manage, and update the large index. In this paper, we propose and evaluate a method that minimizes the index size for full-text search over Big Data by using automatic extractive text summarization. To evaluate the effectiveness of the proposed approach, we used two real-world datasets. We indexed the original and summarized datasets using Apache Lucene and compared the search results obtained for different search queries using three measures: average simple overlap, Spearman's rho correlation, and average ranking score. Our experimental evaluation shows that automatic text summarization is an effective method for significantly reducing the index size. Using the proposed solution, we obtained a maximum reduction of 82% in index size, with 42% higher relevance of the search results.
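
The abstract does not name the summarization algorithm or define the evaluation measures precisely, so the following Python sketch is only an illustration under stated assumptions. It uses a simple word-frequency extractive summarizer as a stand-in (not the authors' method). It also assumes that simple overlap is the fraction of documents shared by the top-k results of the original and summarized indexes, and that Spearman's rho is computed over the documents returned by both. All function names, the stopword list, and the ratio and k parameters are hypothetical. Lucene indexing is not reproduced, and the average ranking score is omitted because the abstract does not define it.

```python
import re
from collections import Counter

# Assumed, illustrative stopword list; the abstract does not specify preprocessing.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "for", "on", "with"}
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def _content_words(text: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, ratio: float = 0.3) -> str:
    """Hypothetical extractive summarizer: keep the top-scoring sentences.

    A sentence's score is the mean document frequency of its content words;
    the kept sentences are returned in their original order.
    """
    sentences = [s.strip() for s in SENTENCE_SPLIT.split(text) if s.strip()]
    if not sentences:
        return ""
    freq = Counter(_content_words(text))

    def score(sentence: str) -> float:
        words = _content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:keep]))


def simple_overlap(full_hits: list[str], summary_hits: list[str], k: int) -> float:
    """Fraction of the top-k documents returned by both indexes (assumed definition)."""
    return len(set(full_hits[:k]) & set(summary_hits[:k])) / k


def spearman_rho(full_hits: list[str], summary_hits: list[str]) -> float:
    """Spearman's rho over documents present in both ranked lists.

    Assumes each list contains unique document IDs, so there are no rank ties.
    """
    in_summary = set(summary_hits)
    common = [d for d in full_hits if d in in_summary]
    n = len(common)
    if n < 2:
        return float("nan")  # rho is undefined for fewer than two shared documents
    rank_full = {d: i for i, d in enumerate(common)}
    shared_in_summary_order = [d for d in summary_hits if d in rank_full]
    rank_summary = {d: i for i, d in enumerate(shared_in_summary_order)}
    d_squared = sum((rank_full[d] - rank_summary[d]) ** 2 for d in common)
    return 1 - 6 * d_squared / (n * (n * n - 1))


def average_over_queries(pairs, measure) -> float:
    """Average a per-query measure over (full_hits, summary_hits) pairs."""
    return sum(measure(full, summ) for full, summ in pairs) / len(pairs)


def index_size_reduction(full_index_bytes: int, summary_index_bytes: int) -> float:
    """Relative reduction in index size, e.g. 0.82 for an 82% smaller index."""
    return 1 - summary_index_bytes / full_index_bytes
```

As a usage example, averaging simple overlap at k = 10 over a set of queries could be written as average_over_queries(pairs, lambda f, s: simple_overlap(f, s, 10)), where each pair holds the ranked document IDs returned by the original and summarized indexes for one query. An original index of 100 units and a summarized index of 18 units gives index_size_reduction(100, 18) of approximately 0.82, which matches the maximum reduction reported above.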

Keywords:
Automatic summarization, Search engine indexing, Big data, Index, Context, Relevance, Raw data, Overhead