This article describes a method for automatically extracting information from electricity invoices. This type of document contains rich information about the billing of each supply point, as well as data about the customer, the contract, and the electricity company. In this work, we train a neural network to classify the input data into eighty-six different labels. We use the IDSEM dataset, which contains 75,000 electricity invoices from the Spanish electricity market in PDF format. Each document is converted into text format, and the classification is carried out through a named entity recognition (NER) process. The underlying neural network used in the process is a Transformer. The results demonstrate that the proposed method classifies the majority of the labels with high accuracy. Furthermore, the method is robust to invoices with different layouts and contents, which highlights its versatility and reliability.
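As an illustration of how token-level NER predictions can be turned into extracted invoice fields, the following minimal sketch groups per-token tags into labelled spans. It assumes a BIO tagging scheme, which the text above does not specify, and the label names (`CUSTOMER_NAME`, `TOTAL_AMOUNT`), the sample tokens, and the function `decode_bio` are hypothetical; they are not taken from the IDSEM dataset or from the authors' implementation.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (label, text) entity spans.

    Assumption: tags follow the BIO scheme ("B-X" opens an entity of
    type X, "I-X" continues it, "O" marks tokens outside any entity).
    """
    entities: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the entity collected so far, if there is one.
        if current_label is not None:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close the previous one first.
            flush()
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close any open entity
            # and skip this token.
            flush()
            current_label = None
            current_tokens = []

    flush()
    return entities


# Hypothetical invoice fragment with hypothetical predicted tags.
tokens = ["Customer", ":", "Ana", "Pérez", "Total", ":", "45.30", "EUR"]
tags = ["O", "O", "B-CUSTOMER_NAME", "I-CUSTOMER_NAME",
        "O", "O", "B-TOTAL_AMOUNT", "O"]

fields = decode_bio(tokens, tags)
print(fields)
```

In a full pipeline, the tags would come from the trained Transformer rather than being written by hand, and the decoded spans would be mapped to the corresponding invoice fields.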