JOURNAL ARTICLE

Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

Dario Pavllo, Tiziano Piccardi, Robert West

Year: 2018 Journal: arXiv (Cornell University) Pages: 231-240 Publisher: Cornell University

Abstract

We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotation often appears across multiple news articles in slightly different contexts. Starting from a few seed patterns, such as ["Q", said S.], our method extracts a set of quotation-speaker pairs (Q, S), which are in turn used for discovering new patterns expressing the same quotations; the process is then repeated with the larger pattern set. Our algorithm is highly scalable, which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus. Validating our results against a crowdsourced ground truth, we obtain 90% precision at 40% recall using a single seed pattern, with significantly higher recall values for more frequently reported (and thus likely more interesting) quotations. Finally, we showcase the usefulness of our algorithm's output for computational social science by analyzing the sentiment expressed in our extracted quotations.
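The bootstrapping loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the template syntax (`{Q}`, `{S}`), the function names, the regular expressions for quotations and speaker names, and the three-sentence toy corpus are all assumptions made for illustration, and the pattern generalization, confidence scoring, and distributed execution that a real system would need are omitted.

```python
import re

# Assumed, simplified shapes: a quotation is any text without double quotes,
# a speaker is a run of capitalized words.
Q_RE = r'(?P<q>[^"]+)'
S_RE = r"(?P<s>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"


def compile_template(template):
    """Turn a template such as '"{Q}", said {S}.' into a compiled regex."""
    parts = re.split(r"(\{Q\}|\{S\})", template)
    pieces = []
    for part in parts:
        if part == "{Q}":
            pieces.append(Q_RE)
        elif part == "{S}":
            pieces.append(S_RE)
        else:
            pieces.append(re.escape(part))  # literal context around the slots
    return re.compile("".join(pieces))


def extract_pairs(templates, sentences):
    """Apply every template to every sentence and collect (Q, S) pairs."""
    pairs = set()
    for template in templates:
        regex = compile_template(template)
        for sentence in sentences:
            match = regex.search(sentence)
            if match:
                pairs.add((match.group("q"), match.group("s")))
    return pairs


def induce_templates(pairs, sentences):
    """Find sentences that mention a known pair and turn them into templates."""
    templates = set()
    for quotation, speaker in pairs:
        for sentence in sentences:
            if quotation in sentence and speaker in sentence:
                template = sentence.replace(quotation, "{Q}", 1)
                templates.add(template.replace(speaker, "{S}", 1))
    return templates


def bootstrap(seed_templates, sentences, iterations=2):
    """Alternate between extracting pairs and discovering new templates."""
    templates = set(seed_templates)
    pairs = set()
    for _ in range(iterations):
        pairs |= extract_pairs(templates, sentences)
        templates |= induce_templates(pairs, sentences)
    return pairs, templates


# Hypothetical toy corpus: the same quotation appears in two different contexts.
corpus = [
    '"We will win", said Jane Doe.',
    'Jane Doe told reporters: "We will win"',
    'John Roe told reporters: "Prices are falling"',
]
pairs, templates = bootstrap(['"{Q}", said {S}.'], corpus)
```

In this toy run, the seed pattern matches only the first sentence. Because the same quotation recurs in the second sentence, a second template is induced from it, and that template in turn recovers the John Roe quotation, which the seed alone would have missed.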

Keywords:
Computer science, Bootstrapping (finance), Artificial intelligence, Scalability, Redundancy (engineering), Natural language processing, Ground truth, Set (abstract data type), Recall, Precision and recall, Machine learning, Mathematics

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

Dario Pavllo, Tiziano Piccardi, Robert West

Journal:   Proceedings of the International AAAI Conference on Web and Social Media Year: 2018 Vol: 12 (1)
JOURNAL ARTICLE

High-Performance Unsupervised Relation Extraction from Large Corpora

Binjamin Rozenfeld, Ronen Feldman

Journal:   Proceedings Year: 2006 Pages: 1032-1037
JOURNAL ARTICLE

Bootstrapping for extracting relations from large corpora

Li Weigang, Ting Liu, Sheng Li

Journal: Journal of Electronics (China) Year: 2008 Vol: 25 (1) Pages: 89-96
BOOK-CHAPTER

Robust Bootstrapping of Speaker Models for Unsupervised Speaker Indexing

Zhonghua Fu

Lecture notes in computer science Year: 2007 Pages: 122-129
JOURNAL ARTICLE

Quotation Extraction from Indonesian Online News

Achmad Choirudin Emcha, Widyawan Widyawan, Teguh Bharata Adji

Journal:   2019 International Conference on Information and Communications Technology (ICOIACT) Year: 2019 Pages: 408-412