JOURNAL ARTICLE

Acoustic-Based Train Arrival Detection Using Convolutional Neural Networks With Attention

Van-Thuan Tran, Wei-Ho Tsai

Year: 2022, Journal: IEEE Access, Vol: 10, Pages: 72120-72131, Publisher: Institute of Electrical and Electronics Engineers

Abstract

At railroad crossings, audible warning signals such as train whistles and railway alarms warn road users to pay attention and give priority to approaching trains. However, road users may sometimes be unaware of these signals for various reasons, resulting in improper responses or even collisions between railway and non-railway vehicles. This work studies deep learning-based approaches to building systems for acoustic-based train arrival detection (A-TAD). First, we develop a novel audio dataset of train horns, railway alarms, railway noise, and other urban noises for A-TAD experiments. We then examine the effectiveness of handcrafted acoustic features (i.e., MFCC and Mel-spectrogram) in building A-TAD's audio classifier, the MSNet, which is based on two-dimensional convolutional neural networks (2D-CNN). Next, we propose applying the attention mechanism and using MFCC and spectrogram simultaneously to enhance classification accuracy. The combined use of acoustic features is considered at the input level (with InCom-TADNet), the high-level feature level (with FCCom-TADNet), and the decision level (with DLCom-TADNet). Our experiments show the effectiveness of MSNet and the attention mechanism: the MSNet trained with a single feature outperforms the baseline models, and adding attention modules yields better accuracy. In addition, the combined use of MFCC and spectrogram significantly improves the system's accuracy and robustness. A-TAD systems can extend the safety functions of railway crossing systems, private cars, and self-driving cars, and can be particularly useful for hearing-impaired road users.
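To make the idea of decision-level combination more concrete, the following is a minimal sketch, not the authors' DLCom-TADNet. It assumes two already-trained branches (one fed MFCC, one fed spectrogram) that each emit one raw score per class, and it assumes the fusion rule is a weighted average of their softmax probabilities; the abstract does not state the actual rule. The class names, `fuse_decisions`, `predict`, and `mfcc_weight` are all hypothetical names chosen for illustration.

```python
import math

# Hypothetical class labels, loosely based on the dataset categories in the abstract.
CLASSES = ["train_horn", "railway_alarm", "railway_noise", "urban_noise"]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(mfcc_logits, spec_logits, mfcc_weight=0.5):
    """Weighted average of the two branches' class probabilities.

    Averaging is an assumed fusion rule, not necessarily the one in the paper.
    """
    if len(mfcc_logits) != len(spec_logits):
        raise ValueError("both branches must score the same number of classes")
    if not 0.0 <= mfcc_weight <= 1.0:
        raise ValueError("mfcc_weight must lie in [0, 1]")
    p_mfcc = softmax(mfcc_logits)
    p_spec = softmax(spec_logits)
    return [
        mfcc_weight * a + (1.0 - mfcc_weight) * b
        for a, b in zip(p_mfcc, p_spec)
    ]


def predict(mfcc_logits, spec_logits, mfcc_weight=0.5):
    """Return the most probable class label and its fused probability."""
    fused = fuse_decisions(mfcc_logits, spec_logits, mfcc_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[best], fused[best]


if __name__ == "__main__":
    # Made-up scores for one audio clip; the two branches disagree here.
    label, prob = predict([0.0, 3.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
    print(label, round(prob, 3))
```

Input-level combination would instead stack the two feature maps before a single network, and feature-level combination would concatenate the branches' high-level features before a shared classifier; this sketch covers only the decision-level case.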

Keywords:
Computer science; Convolutional neural network; Speech recognition; Artificial intelligence; Real-time computing; Pattern recognition (psychology)

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 2.53
Refs: 38
Citation Normalized Percentile: 0.86

Topics

Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing