Hao Zhang, Haoran Liang, Xing Zhao, Jian Liu, Ronghua Liang
Abstract In the realm of video salient object detection (VSOD), most research has traditionally centered on third‐person videos. This focus, however, overlooks the requirements of first‐person tasks such as autonomous driving and robot vision. To bridge this gap, a novel dataset and a camera‐based VSOD model, CaMSD, specifically designed for egocentric videos, are introduced. First, the SalEgo dataset, comprising 17,400 fully annotated frames for video salient object detection, is presented. Second, a computational model incorporating a camera movement module is proposed, designed to emulate the patterns observed when humans view videos. Additionally, to precisely segment a single salient object during switches between salient objects, rather than segmenting two objects simultaneously, a saliency enhancement module based on the Squeeze‐and‐Excitation block is incorporated. Experimental results show that the approach outperforms other state‐of‐the‐art methods on egocentric video salient object detection tasks. The dataset and code can be found at https://github.com/hzhang1999/SalEgo.
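The abstract mentions a saliency enhancement module built on the Squeeze‐and‐Excitation block; the paper's exact design is not given here, but the general channel‐gating mechanism can be sketched as below. This is a minimal NumPy illustration of generic squeeze‐and‐excitation (global average pooling, a bottleneck projection, and a sigmoid gate that reweights channels), not the authors' implementation; the function name and weight shapes are assumptions for illustration.

```python
import numpy as np

def squeeze_excite(feat, w1, w2):
    """Generic Squeeze-and-Excitation channel gating (illustrative sketch).

    feat: (C, H, W) feature map
    w1:   (C//r, C) squeeze (bottleneck) weights
    w2:   (C, C//r) excite (expansion) weights
    """
    # Squeeze: global average pooling over spatial dimensions -> (C,)
    z = feat.mean(axis=(1, 2))
    # Excite: bottleneck projection with ReLU, then sigmoid gate -> (C,)
    s = np.maximum(w1 @ z, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Reweight each channel of the feature map by its learned gate
    return feat * gate[:, None, None]
```

With zero weights the gate is sigmoid(0) = 0.5 for every channel, so the output is simply the input scaled by half; with learned weights, channels carrying the currently salient object can be amplified while others are suppressed.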