JOURNAL ARTICLE

Thai Nested Named Entity Recognition Corpus

Abstract

This paper presents the first Thai Nested Named Entity Recognition (N-NER) dataset. The dataset contains 264,798 mentions across 104 classes, with a maximum nesting depth of 8 layers, drawn from 4,894 documents in the domains of news articles and restaurant reviews. To the best of our knowledge, it is the largest non-English N-NER dataset and the first non-English one with fine-grained classes. To understand the new challenges our proposed dataset brings to the field, we conduct an experimental study on (i) cutting-edge N-NER models that achieve state-of-the-art accuracy in English and (ii) baseline methods based on well-known language model architectures. The experimental results yield two key findings. First, all models produce poor F1 scores in the tail region of the class distribution. Second, the cutting-edge models provide little or no performance improvement over the baseline methods on our Thai dataset. These findings suggest that further investigation is required to build a multilingual N-NER solution that works well across different languages.
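To make the notion of nesting depth concrete, the following minimal sketch shows one way nested mentions could be stored as token spans and how the deepest level of nesting could be computed. The `Mention` type, the `nesting_depth` helper, and the class labels are hypothetical illustrations, not the corpus's actual annotation format or class inventory, and the sketch assumes spans are properly nested (they never partially overlap).

```python
from typing import List, NamedTuple


class Mention(NamedTuple):
    start: int  # inclusive token index
    end: int    # exclusive token index
    label: str  # hypothetical class name, not taken from the corpus


def nesting_depth(mentions: List[Mention]) -> int:
    """Return the largest number of mentions stacked inside one another.

    Assumes spans are properly nested: any two spans are either disjoint
    or one contains the other.
    """
    # Sort by start ascending, then end descending, so outer spans come first.
    ordered = sorted(mentions, key=lambda m: (m.start, -m.end))
    stack: List[Mention] = []
    deepest = 0
    for m in ordered:
        # Drop spans that have already ended before this one starts.
        while stack and stack[-1].end <= m.start:
            stack.pop()
        stack.append(m)
        deepest = max(deepest, len(stack))
    return deepest


# Toy sentence: an organization name (tokens 0-3) that contains a
# location name (tokens 2-3), followed by a separate person name.
example = [
    Mention(0, 4, "organization"),
    Mention(2, 4, "location"),
    Mention(5, 6, "person"),
]
print(nesting_depth(example))
```

Under these assumptions, a flat NER dataset would always report a depth of 1, whereas the abstract reports a maximum depth of 8 for this corpus.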

Keywords:
Named-entity recognition; Computer science; Baseline (sea); Natural language processing; Artificial intelligence; Enhanced Data Rates for GSM Evolution; Field (mathematics); Class (philosophy); Entity linking; Information retrieval; Mathematics; Knowledge base; Engineering

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.59
Refs: 22
Citation Normalized Percentile: 0.63

Topics

Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Nested named entity recognition

Jenny Rose Finkel, Christopher D. Manning

Year: 2009 Vol: 1 Pages: 141-141
JOURNAL ARTICLE

Nested Biomedical Named Entity Recognition

Lobna Mady, Yasmine M. Afify, Nagwa Badr

Journal: International Journal of Intelligent Computing and Information Sciences Year: 2022 Vol: 22 (1) Pages: 98-107
JOURNAL ARTICLE

KONNE: A Korean Nested Named Entity Corpus

Yu-Nam Cheong, Young-Sook Song, Hyun-Jo You

Journal: Journal of Korean Linguistics Year: 2023 Vol: 105 Pages: 309-344