Eshika Jain, Pratham Kaushik, Vinay Kukreja, Vandana Ahuja, Ayush Dogra, Ankit Bansal
Introduction/Background: COVID-19 remains a public health emergency that calls for rapid and accurate diagnostic techniques. Chest X-ray imaging is a low-cost, widely used technique for detecting COVID-19, but human interpretation is laborious and error-prone. In this study, we propose automated detection of COVID-19 on chest X-rays using a Pyramid Vision Transformer (PVT) model with self-supervised learning (SSL) pretraining and attention map visualization. The proposed method has the potential to be more accurate, interpretable, and efficient, and thus clinically suitable.

Materials and Methods: The COVID-19 Chest X-Ray Database on Kaggle, which comprises 36,116 images classified as normal, viral pneumonia, and COVID-19, was used. Large-scale preprocessing, including resizing, normalization, and data augmentation, was carried out to improve model generalization. The PVT model was pretrained and fine-tuned on the dataset using SSL, dropout regularization, and attention mechanisms. The primary evaluation metrics were the Lung Severity Score (LSS), segmentation accuracy, Severity Detection Precision (SDP), Opacity Detection Sensitivity (ODS), Time-to-Severity Detection (TSD), and focal AUC-ROC score.

Results: Fine-tuning the PVT model improved performance across multiple metrics. LSS increased from 15% (pretrained) to 17% (fine-tuned), while segmentation accuracy improved from 88% to 91%. Dropout regularization slightly reduced LSS to 16% but raised SDP (80% to 90%) and ODS (78% to 85%). TSD decreased from 4.5 s (pretrained) to 3.8 s (fine-tuned), improving detection speed. The focal AUC-ROC score improved from 0.92 to 0.95 with fine-tuning and dropout, while the Misclassification Visualization Score (MVS) increased from 0.85 to 0.91, reducing misclassification rates. Data augmentation further improved accuracy (88% to 94%), precision (85% to 91%), and recall (83% to 90%).

Discussion: This study demonstrates the effectiveness of SSL pretraining, dropout regularization, and data augmentation in improving COVID-19 detection performance. The improvements in precision, recall, and robustness highlight the model's potential for clinical deployment. Attention map visualizations further support trust and interpretability by showing the key lung regions the model focuses on, making its decisions more transparent.

Conclusion: The PVT-based model, integrated with SSL, fine-tuning, and attention mechanisms, provides a robust, interpretable, and efficient solution for COVID-19 detection from chest X-ray images. The results support its potential for real-world clinical use, offering improved diagnostic accuracy, reduced misclassification, and faster detection.
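The abstract reports accuracy, precision, and recall for a three-class problem (normal, viral pneumonia, COVID-19) but does not say how the per-class scores were aggregated. The following is a minimal sketch of one common convention, macro averaging over a confusion matrix. The averaging choice, the class label names, the function names, and the toy labels are all illustrative assumptions, not the authors' evaluation code or data.

```python
from typing import List, Sequence

# Hypothetical class order; the source names the three classes but not their encoding.
CLASSES = ("normal", "viral_pneumonia", "covid19")


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def macro_precision(matrix: List[List[int]]) -> float:
    """Unweighted mean over classes of TP / (all samples predicted as that class)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        predicted = sum(matrix[r][c] for r in range(n))
        scores.append(matrix[c][c] / predicted if predicted else 0.0)
    return sum(scores) / n


def macro_recall(matrix: List[List[int]]) -> float:
    """Unweighted mean over classes of TP / (all samples truly in that class)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        actual = sum(matrix[c])
        scores.append(matrix[c][c] / actual if actual else 0.0)
    return sum(scores) / n


# Toy labels for illustration only (not data from the study).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, len(CLASSES))
print(cm, accuracy(cm), macro_precision(cm), macro_recall(cm))
```

Macro averaging weights each class equally regardless of how many images it contains, which matters when class sizes are imbalanced; a support-weighted average would give different numbers on the same predictions.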
Guang Li, Ren Togo, Takahiro Ogawa, Ren Togo
Pranab Sahoo, Sriparna Saha, Samrat Mondal, Sujit Chowdhury, Suraj Gowda
I. Féki, Sourour Ammar, Yousri Kessentini
Mustafa Yurdakul, Şakir Taşdemir