JOURNAL ARTICLE

Natural Language Processing for Lexical Corpus Analysis

Abram Handler

Year: 2021
Journal: Scholarworks (University of Massachusetts Amherst)
Publisher: University of Massachusetts Amherst

Abstract

People have been analyzing documents by reading keywords in context for centuries. Traditional approaches, such as paper concordances or digital keyword-in-context (KWIC) viewers, display every occurrence of a single vocabulary word in a corpus together with its immediately surrounding tokens or characters, showing readers how individual lexical items are used across a body of text. We propose that these common tools are one particular application of a more general approach to analyzing documents, which we define as lexical corpus analysis. We then propose new natural language processing (NLP) techniques for lexically focused corpus investigation and demonstrate how such methods can be used to create new user-facing tools for analyzing corpora. Our contributions are divided into three parts. In Part 1, we consider how to represent a corpus lexicon so that it best reflects human mental and linguistic models of a domain, and we propose an NLP method for enriching a unigram corpus vocabulary with multiword phrases. In Part 2, we consider how lexical systems might show query terms in context to best satisfy users' search needs, and we offer several new techniques for summarizing mentions of a query term in context. Finally, in Part 3, we apply our proposed NLP methods to new user-facing systems for lexical corpus analysis and present user studies with journalists and historians that investigate how new lexical tools can help such users in their work.
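As a hedged illustration of the traditional keyword-in-context display that the abstract generalizes, the following minimal Python sketch lists every occurrence of a query word with a fixed window of surrounding tokens. It is not the dissertation's method. It assumes the text is already split into tokens (here by whitespace) and that matching is exact and case-insensitive; the function names `kwic` and `format_kwic` and the `window` and `width` parameters are hypothetical.

```python
from typing import List, Tuple


def kwic(tokens: List[str], query: str, window: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each occurrence of query.

    Assumption: tokens are pre-split strings; matching is exact but case-insensitive.
    """
    target = query.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            # Up to `window` tokens on each side, clipped at the corpus boundaries.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def format_kwic(hits: List[Tuple[str, str, str]], width: int = 30) -> List[str]:
    """Right-align left contexts so keywords line up in a column, as in a concordance."""
    return [f"{left:>{width}}  {kw}  {right}" for left, kw, right in hits]


if __name__ == "__main__":
    tokens = "the senator said the bill would pass but the senator later voted no".split()
    for line in format_kwic(kwic(tokens, "senator", window=2), width=10):
        print(line)
```

Here `kwic(tokens, "senator", window=2)` returns `[("the", "senator", "said the"), ("but the", "senator", "later voted")]`. Aligning the keyword in a single column mirrors the layout of paper concordances; richer lexical tools of the kind the abstract describes would replace the fixed token window with phrase-aware vocabularies and summaries of the mentions.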

Keywords: Natural language processing, Linguistics, Computer science, Artificial intelligence, Corpus linguistics, Natural language, Natural (archaeology), History
