This study introduces a novel Facial Emotion Recognition (FER) algorithm designed for uncontrolled, "in the wild" environments. Traditional image-based FER systems struggle in these settings because of variations in pose, occlusion, lighting, and skin tone. Our approach overcomes these limitations by extracting and using 3D facial landmarks rather than relying directly on image-based features. We employ the MediaPipe FaceMesh model to extract 478 normalized facial landmarks from the FER+ dataset, whose images are labeled with eight emotions. These landmarks define 2556 face tessellations, which serve as embedding features for a transformer-based network. The algorithm's distinctive aspect lies in its ability to normalize images captured under different conditions and by different cameras into a consistent set of features derived from the 3D landmarks. This normalization enables diverse FER datasets to be combined for training, broadening the algorithm's applicability across devices. Achieving 73.7% accuracy on the FER+ dataset, the algorithm shows significant promise for emotion classification. Its adaptive attention mechanism focuses on well-represented landmarks, accounting for the face's angle and visible regions rather than the emotion alone. This study marks a significant advancement in FER technology, offering a robust solution for emotion recognition under varied real-world conditions.
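The landmark normalization described above can be sketched as follows. The abstract does not specify the exact normalization, so this is a minimal illustrative example assuming a common scheme: center the 478 3D landmarks at their centroid and rescale to unit RMS radius, which removes translation and camera-scale differences before the features are fed to a downstream model. The function name `normalize_landmarks` is hypothetical, not from the paper.

```python
import numpy as np

def normalize_landmarks(landmarks: np.ndarray) -> np.ndarray:
    """Center 3D facial landmarks at their centroid and rescale to unit RMS radius.

    landmarks: (478, 3) array of MediaPipe FaceMesh coordinates.
    Returns a (478, 3) array invariant to translation and uniform scale,
    so images from different cameras can share one feature space.
    (Illustrative assumption; the paper's exact normalization may differ.)
    """
    centered = landmarks - landmarks.mean(axis=0, keepdims=True)
    # RMS distance of the points from their centroid, used as the scale factor.
    scale = np.sqrt((centered ** 2).sum(axis=1).mean())
    return centered / scale

# Hypothetical usage with random stand-in coordinates (a real pipeline
# would obtain these from MediaPipe FaceMesh):
pts = np.random.rand(478, 3)
norm = normalize_landmarks(pts)
```

In practice, a scale-invariant representation like this is what allows heterogeneous FER datasets to be pooled for training, as the abstract notes.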
Qixuan Zhang, Zhifeng Wang, Yang Liu, Zhenyue Qin, Kaihao Zhang, Tom Gedeon