Yunpeng Li, Xiangrong Zhang, Guanchun Wang, T. Zhang
Understanding complex change scenes is a crucial challenge in the remote sensing field. The remote sensing image change captioning (RSICC) task has emerged as a promising approach that translates the changes between bi-temporal remote sensing images into textual descriptions, enabling users to make accurate decisions. Current RSICC methods often struggle to maintain contextual consistency and to exploit semantic prior guidance. This study therefore explores a difference semantic prior guidance network that generates context-rich sentences capturing the observed visual changes. Specifically, a context-aware difference module is introduced to guarantee the consistency of unchanged/changed context features, strengthening multi-level change information to improve the representation of semantic change features. Moreover, to mine higher-level cognitive ability for reasoning about both salient and weak changes, we employ difference comprehension with shallow change information to realize semantic change knowledge learning. In addition, a parallel cross refined attention mechanism designed for the Transformer decoder balances visual differences and semantic knowledge through implicit knowledge distillation, enabling fine-grained perception of semantic details and reducing pseudo-changes. Compared with advanced algorithms on the LEVIR-CC and Dubai-CC datasets, experimental results validate the outstanding performance of the designed model on RSICC tasks. Notably, on the LEVIR-CC dataset, it reaches a CIDEr score of 143.34%, a 3.11% improvement over the most competitive method, SAT-Cap.
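The "parallel cross refined attention" idea described above — decoder queries attending separately to vision-difference features and semantic-prior features before fusing the two streams — can be sketched as follows. This is a minimal illustrative layer, not the authors' exact design; the class name, the gated fusion, and all dimensions are assumptions introduced for illustration.

```python
import torch
import torch.nn as nn

class ParallelCrossAttention(nn.Module):
    """Illustrative sketch: decoder word queries attend in parallel to
    vision-difference tokens and semantic-prior tokens; a learned gate
    then balances the two refined streams (hypothetical design)."""

    def __init__(self, d_model=256, num_heads=8):
        super().__init__()
        # Two cross-attention branches run in parallel over the same queries.
        self.vis_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.sem_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Gate decides, per position and channel, how to mix the branches.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.norm = nn.LayerNorm(d_model)

    def forward(self, queries, vis_feats, sem_feats):
        v_out, _ = self.vis_attn(queries, vis_feats, vis_feats)  # vision-difference branch
        s_out, _ = self.sem_attn(queries, sem_feats, sem_feats)  # semantic-prior branch
        g = self.gate(torch.cat([v_out, s_out], dim=-1))
        # Residual connection plus gated fusion of the two streams.
        return self.norm(queries + g * v_out + (1 - g) * s_out)

layer = ParallelCrossAttention()
q = torch.randn(2, 10, 256)    # decoder word queries (batch, words, dim)
vis = torch.randn(2, 49, 256)  # vision-difference tokens, e.g. a 7x7 feature map
sem = torch.randn(2, 49, 256)  # semantic-prior tokens
out = layer(q, vis, sem)
print(out.shape)  # torch.Size([2, 10, 256])
```

The gated fusion is one simple way to "balance" the two knowledge sources; the paper's actual refinement strategy may differ.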