MFinBERT: Multilingual Pretrained Language Model For Financial Domain

Duong Nguyen; Nam Cao; Son Nguyen; Son Ta; Cuong Dinh

doi:10.1109/kse56063.2022.9953749

JOURNAL ARTICLE

Year: 2022 Pages: 1-6

Abstract

There has been an increasing demand for good semantic representations of text in the financial sector when solving natural language processing tasks in Fintech. Previous work has shown that widely used modern language models trained in the general domain often perform poorly in this particular domain. There have been attempts to overcome this limitation by introducing domain-specific language models learned from financial text. However, these approaches suffer from the lack of in-domain data, which is further exacerbated for languages other than English. These problems motivate us to develop a simple and efficient pipeline to extract large amounts of financial text from large-scale multilingual corpora such as OSCAR and C4. We conduct extensive experiments with various downstream tasks in three different languages to demonstrate the effectiveness of our approach across a wide range of standard benchmarks.
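The abstract does not say how the extraction pipeline selects financial text. As a rough illustration only, one simple way to pull in-domain documents out of a large multilingual corpus is to score each document by the density of finance-related terms and keep those above a threshold. Everything in the sketch below is an assumption made for illustration and is not taken from the paper: the seed vocabularies, the two example languages, the `finance_score` and `filter_financial` names, the `{"text", "lang"}` record layout, and the `0.05` threshold.

```python
import re

# Hypothetical seed vocabularies (illustrative only, NOT from the paper).
FINANCE_TERMS = {
    "en": {"stock", "bond", "dividend", "equity", "inflation", "loan"},
    "de": {"aktie", "anleihe", "dividende", "zins", "inflation", "kredit"},
}

# Unicode-aware word tokenizer; str patterns match Unicode word characters by default.
TOKEN_RE = re.compile(r"\w+")


def finance_score(text, lang):
    """Return the fraction of tokens found in the seed vocabulary for `lang`."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    if not tokens:
        return 0.0
    terms = FINANCE_TERMS.get(lang, set())
    hits = sum(1 for t in tokens if t in terms)
    return hits / len(tokens)


def filter_financial(docs, threshold=0.05):
    """Yield documents whose finance-term density reaches `threshold`.

    `docs` is any iterable of dicts with "text" and "lang" keys
    (an assumed record layout, chosen for this sketch).
    """
    for doc in docs:
        if finance_score(doc["text"], doc["lang"]) >= threshold:
            yield doc


if __name__ == "__main__":
    sample = [
        {"text": "The central bank raised rates and inflation hit the bond market", "lang": "en"},
        {"text": "The cat sat on the mat", "lang": "en"},
    ]
    for doc in filter_financial(sample):
        print(doc["text"])
```

Because `filter_financial` is a generator, it can be applied to a streamed corpus without loading it into memory. A keyword-density filter is only one possible design; a trained domain classifier would be another.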

Keywords: Computer science; Pipeline (software); Natural language processing; Domain (mathematical analysis); Artificial intelligence; Language model; Scale (ratio); Natural language; Range (aeronautics); Simple (philosophy); Programming language

Metrics

FWCI (Field Weighted Citation Impact): 0.39

Citation Normalized Percentile: 0.62

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Stock Market Forecasting Methods

Social Sciences → Decision Sciences → Management Science and Operations Research

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Controllable Abstractive Summarization Using Multilingual Pretrained Language Model

Pretrained multilingual Party model

Distilling a Pretrained Language Model to a Multilingual ASR Model
