Human Activity Recognition (HAR) has gained significant attention in recent years due to its wide-ranging applications. This paper introduces a novel hybrid visual transformer methodology designed to enhance the robust analysis and comprehension of activities. CVTN (Convolution Visual Transformer Network) leverages sensor data represented jointly in the spatial and temporal dimensions to improve the resilience of the HAR process. The proposed technique employs a hybrid model that integrates Convolutional Neural Networks (CNNs) and Visual Transformers (VTs). First, the CNN component learns spatial visual features from diverse sensor data. These learned features are then fed into the transformer segment of the model, where the VT captures temporal insights by observing sensor states across different time points. The efficacy of the CVTN methodology is assessed on the Kinetics dataset, which emulates real-world human activity recognition scenarios. The experimental results show that CVTN clearly outperforms recent baseline HAR solutions, reaffirming its potential for advancing activity analysis.
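The two-stage pipeline described above (a CNN extracting per-timestep spatial features, followed by a transformer attending across time) can be sketched in a few lines of NumPy. This is a minimal illustration under assumed shapes and random weights, not the paper's actual CVTN architecture: the kernel size, projection dimension, and single attention layer are all placeholder choices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv2d(frame, kernel):
    # Valid 2D cross-correlation on one channel: the "CNN" spatial stage.
    h, w = frame.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    return out

def self_attention(seq, Wq, Wk, Wv):
    # seq: (T, d) sequence of per-timestep feature vectors.
    # Scaled dot-product attention lets each time step weigh all others:
    # this is the "transformer" temporal stage.
    Q, K, V = seq @ Wq, seq @ Wk, seq @ Wv
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return scores @ V

rng = np.random.default_rng(0)
T, H, W, d = 8, 10, 10, 16          # assumed: 8 time steps, 10x10 "sensor images"
frames = rng.standard_normal((T, H, W))
kernel = rng.standard_normal((3, 3))

# Stage 1 (CNN): spatial features per time step, flattened and projected to d dims.
feats = np.stack([conv2d(f, kernel).ravel() for f in frames])   # (T, 64)
proj = rng.standard_normal((feats.shape[1], d))
seq = feats @ proj                                              # (T, d)

# Stage 2 (VT): self-attention mixes information across the T time steps.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(seq, Wq, Wk, Wv)                           # (T, d)
print(out.shape)
```

In a real model the random matrices would be learned parameters, the CNN would stack several convolution and pooling layers, and a classification head over the attended sequence would produce the activity label.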
Asif Iqbal, Muhammad Arslan Rauf, Salim Salim, Mosleh Mahamud, Mian Muhammad Yasir Khalil, Zhen Qin