JOURNAL ARTICLE

Triplet Spatiotemporal Aggregation Network for Video Saliency Detection

Abstract

The effective aggregation of spatiotemporal information in complex real-world scenes is a fundamental issue in video saliency detection. In this paper, we propose a Triplet Spatiotemporal Aggregation Network (TSAN) that addresses this issue from three angles: spatiotemporal interaction, spatiotemporal information distribution, and multi-level spatiotemporal feature fusion. First, we propose an interactive aggregation gate (IAG) module that models spatial and temporal global context and transfers information between the two modalities. Second, we employ an information distribution consistency (IDC) module that enhances the consistency of the spatiotemporal representation by maximizing the correlation between high-level spatial and temporal features. Finally, we design a multi-level spatiotemporal feature aggregation (MSF) framework to merge cross-level and cross-modal features. The three modules are combined into a unified framework that jointly optimizes spatiotemporal information for more precise results. Experimental results on five widely used benchmark datasets show that TSAN outperforms previous competitors.
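The abstract names two concrete mechanisms: a gate that mediates spatial-temporal interaction (IAG) and a consistency term that maximizes the correlation of the two streams' high-level features (IDC). The following is a minimal NumPy sketch of these two ideas, not the authors' implementation; all function names, weight matrices, and tensor shapes `(N, C)` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(spatial, temporal, Ws, Wt):
    """Gate-style interaction (illustrative stand-in for IAG): a learned
    gate decides, per element, how much spatial vs. temporal information
    to pass through to the fused representation."""
    gate = sigmoid(spatial @ Ws + temporal @ Wt)   # values in (0, 1)
    return gate * spatial + (1.0 - gate) * temporal

def feature_correlation(spatial, temporal, eps=1e-8):
    """Pearson correlation of the flattened feature maps; an IDC-style
    consistency objective could then minimize (1 - correlation)."""
    s = spatial.ravel() - spatial.mean()
    t = temporal.ravel() - temporal.mean()
    return float((s @ t) / (np.linalg.norm(s) * np.linalg.norm(t) + eps))

rng = np.random.default_rng(0)
N, C = 4, 8                                  # hypothetical batch and channels
spatial = rng.standard_normal((N, C))
temporal = rng.standard_normal((N, C))
Ws = rng.standard_normal((C, C)) * 0.1
Wt = rng.standard_normal((C, C)) * 0.1

fused = gated_fusion(spatial, temporal, Ws, Wt)
corr = feature_correlation(spatial, temporal)
print(fused.shape)          # (4, 8)
print(-1.0 <= corr <= 1.0)  # True
```

Because the gate lies in (0, 1), the fused output is an elementwise convex combination of the two streams, which is one simple way a gating module can trade off spatial and temporal evidence per location.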

Keywords:
Computer science; Artificial intelligence; Pattern recognition; Data mining; Information retrieval

Metrics

Cited By: 1
FWCI (Field-Weighted Citation Impact): 0.18
References: 26
Citation Normalized Percentile: 0.42

Topics

Visual Attention and Saliency Detection
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Olfactory and Sensory Function Studies
Life Sciences → Neuroscience → Sensory Systems
Image and Video Quality Assessment
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
© 2026 ScienceGate Book Chapters — All rights reserved.