JOURNAL ARTICLE

Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-Trained Language Models

Abstract

Recent works show that pre-trained language models (PTLMs), such as BERT, possess certain commonsense and factual knowledge. They suggest that it is promising to use PTLMs as “neural knowledge bases” via predicting masked words. Surprisingly, we find that this may not work for numerical commonsense knowledge (e.g., a bird usually has two legs). In this paper, we investigate whether and to what extent we can induce numerical commonsense knowledge from PTLMs, as well as the robustness of this process. To study this, we introduce a novel probing task with a diagnostic dataset, NumerSense, containing 13.6k masked-word-prediction probes (10.5k for fine-tuning and 3.1k for testing). Our analysis reveals that: (1) BERT and its stronger variant RoBERTa perform poorly on the diagnostic dataset prior to any fine-tuning; (2) fine-tuning with distant supervision brings some improvement; (3) the best supervised model still performs poorly compared to human performance (54.06% vs. 96.3% accuracy).
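
To make the probing setup concrete, below is a minimal sketch of how such a masked-word probe can be run against BERT using the Hugging Face transformers fill-mask pipeline. The model name and probe sentence are illustrative choices, not the authors' exact code; in the paper's setting, a probe counts as correct when the top-ranked number word matches the ground truth.

```python
# A minimal sketch of masked-word probing for numerical commonsense,
# assuming the Hugging Face `transformers` library is installed.
from transformers import pipeline

# Load a pre-trained masked language model.
# (RoBERTa models use "<mask>" instead of "[MASK]".)
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# A NumerSense-style probe: ideally the model ranks "two" highest.
probe = "a bird usually has [MASK] legs."
for prediction in fill_mask(probe, top_k=5):
    # Each prediction carries the filled-in token and its probability.
    print(f"{prediction['token_str']:>10s}  {prediction['score']:.4f}")
```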

Keywords:
Commonsense knowledge, robustness, computer science, artificial intelligence, natural language processing, word-sense disambiguation, language models, deep neural networks, commonsense reasoning, machine learning, artificial neural networks, knowledge extraction

Metrics

Cited By: 114
FWCI (Field-Weighted Citation Impact): 14.10
References: 32
Citation Normalized Percentile: 0.99 (top 1%)

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)

Related Documents

JOURNAL ARTICLE

Probing Simile Knowledge from Pre-trained Language Models

Weijie Chen, Yongzhu Chang, Rongsheng Zhang, Jiashu Pu, Guandan Chen, Le Zhang, Yadong Xi, Yijiang Chen, Chang Su

Journal: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 5875-5887
JOURNAL ARTICLE

Evaluating Commonsense in Pre-Trained Language Models

Xuhui Zhou, Yue Zhang, Leyang Cui, Dandan Huang

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, 2020, Vol. 34(05), pp. 9733-9740