Underwater Image Captioning Based on Feature Fusion

Li Li; Yanan Wei; Peng Ren

doi:10.1145/3647649.3647700

JOURNAL ARTICLE

Year: 2024 Pages: 322-326

Abstract

Image captioning employs artificial intelligence to translate visual content into natural language text descriptions. Underwater image captioning offers specialized interpretation for scenarios such as underwater environmental monitoring, underwater archaeology, and offshore platforms. It is also effective for compressing information when large volumes of underwater images must be transmitted in real time via underwater acoustic communication. In this article, we annotate an underwater image caption dataset for this task and create a baseline using the encoder-decoder neural image caption model, which outputs complete sentences related to the image content. Descriptions of underwater images mainly focus on the underwater scene and the objects in it. An object detection model based on Faster R-CNN is applied to extract full-image features and the regional features corresponding to the targets in the image. For the caption model, we enhance the input features of the language generator by combining global information, regional details, contextual cues, and pre-ordered text information through feature fusion. This fusion enables the generator to output precise semantic expressions related to salient objects. Applied to the annotated underwater image caption dataset, the method produces more accurate descriptions of underwater targets than the sentences generated by a basic neural network model. The evaluation metrics also show higher scores, affirming the effectiveness of our approach.
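The abstract does not spell out how the four feature sources are fused, so the following is only a minimal sketch of one common pattern: soft attention over regional features followed by concatenation. Every name here (`attend`, `fuse`, the toy vectors) is hypothetical, the context vector is assumed to have the same dimension as the region vectors, and nothing below should be read as the authors' actual architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, query):
    """Soft attention: weight each region vector by its dot product with the query.

    Assumes every region vector has the same length as the query.
    """
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    pooled = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return pooled, weights


def fuse(global_feat, regions, context, prev_word_emb):
    """Concatenate global, attended regional, contextual, and previous-word features.

    The result stands in for the input that a language generator would
    receive at one decoding step (an assumption, not the paper's design).
    """
    pooled, weights = attend(regions, context)
    return global_feat + pooled + context + prev_word_emb, weights


# Toy inputs (hypothetical values, chosen only to exercise the code).
global_feat = [0.2, 0.1, 0.4]                  # full-image feature
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # one vector per detected region
context = [0.5, -0.5, 0.0]                     # e.g. previous decoder state
prev_word_emb = [0.3, 0.7]                     # embedding of the preceding word

fused, weights = fuse(global_feat, regions, context, prev_word_emb)
print(len(fused), weights)
```

In this toy setup the first region aligns better with the context vector, so it receives the larger attention weight. A real system would learn the attention and projection parameters rather than use a raw dot product.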

Keywords:

Closed captioning; Underwater; Computer science; Artificial intelligence; Feature (linguistics); Computer vision; Generator (circuit theory); Image (mathematics); Feature extraction; Pattern recognition (psychology)