JOURNAL ARTICLE

Improving classification of tweets using word-word co-occurrence information from a large external corpus

Abstract

Classifying tweets is an intrinsically hard task because tweets are short messages, which makes traditional bag-of-words approaches inefficient. In fact, bag-of-words approaches ignore relationships between important terms that do not co-occur literally. In this paper we resort to word-word co-occurrence information from a large corpus to expand the vocabulary of another corpus consisting of tweets. Our results show that we are able to reduce the number of erroneous classifications by 14% using co-occurrence information.
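
The abstract does not spell out how the vocabulary expansion is carried out, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes document-level co-occurrence counts, whitespace tokenization, and a hypothetical `top_k` parameter; the toy corpus and the names `build_cooccurrence` and `expand_tweet` are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(corpus):
    """Count how often two distinct words appear in the same document."""
    cooc = defaultdict(Counter)
    for doc in corpus:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def expand_tweet(tweet, cooc, top_k=1):
    """Return the tweet's bag of words plus, for each token,
    the top_k words that co-occur with it most often in the external corpus."""
    tokens = tweet.lower().split()
    bag = Counter(tokens)
    for tok in tokens:
        # .get avoids creating empty entries for words unseen in the corpus
        for neighbour, _count in cooc.get(tok, Counter()).most_common(top_k):
            bag[neighbour] += 1
    return bag


# Toy stand-in for the large external corpus (assumption, not real data).
EXTERNAL_CORPUS = [
    "striker scored late goal football match",
    "football fans cheered goal",
    "bank raised interest rates",
]

cooc = build_cooccurrence(EXTERNAL_CORPUS)
expanded = expand_tweet("goal", cooc, top_k=1)
```

In this sketch a tweet that mentions only "goal" also receives the feature "football", so a classifier could relate it to other football tweets even when the two words never appear together in any tweet. In practice one would likely filter stop words and weight the added terms, but those choices are outside what the abstract states.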

Keywords:
Word (group theory); Computer science; Task (project management); Natural language processing; Artificial intelligence; Information retrieval; Linguistics; Engineering

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.85
Refs: 20
Citation Normalized Percentile: 0.89

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

BOOK-CHAPTER

Improving Classification of Tweets Using Linguistic Information from a Large External Corpus

Hugo L. Hammer, Anis Yazidi, Aleksander Bai, Paal Engelstad

Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering Year: 2017 Pages: 122-134

JOURNAL ARTICLE

Word classification and hierarchy using co-occurrence word information

Kazuhiro Morita, El‐Sayed Atlam, Masao Fuketra, Kazuhiko Tsuda, Masaki Oono, Jun‐ichi Aoe

Journal: Information Processing & Management Year: 2003 Vol: 40 (6) Pages: 957-972

JOURNAL ARTICLE

Semantic Hashtag Relation Classification Using Co-occurrence Word Information

Sungwon Seo, Jong‐Kook Kim, Sung-Il Kim, Jeewoo Kim, Joongheon Kim

Journal: Wireless Personal Communications Year: 2018 Vol: 107 (3) Pages: 1355-1365

BOOK-CHAPTER

Improving Implicit Stance Classification in Tweets Using Word and Sentence Embeddings

Robin Schaefer, Manfred Stede

Lecture notes in computer science Year: 2019 Pages: 299-307