Predicting the interestingness of videos can greatly improve user satisfaction in applications such as video retrieval and recommendation. To obtain less subjective interestingness annotations, partial pairwise comparisons among videos are first annotated, and all videos are then ranked globally to generate interestingness values. We study two factors in interestingness prediction: comparison information and evaluation-metric optimization. In this paper, we propose a novel deep ranking model that simulates the human annotation procedure for more reliable interestingness prediction. Specifically, we extract diverse visual and acoustic features and sample comparison video pairs with different strategies, such as random and fixed-distance sampling. The human pairwise ranking annotations provide richer guidance than plain interestingness values for training our networks. In addition to comparison information, we also explore a reinforcement ranking model that directly optimizes the evaluation metric. Experimental results demonstrate that fusing the two ranking models makes better use of human labels and outperforms the regression baseline. It also achieves the best performance among the results of the MediaEval 2017 interestingness prediction task.
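The abstract mentions training on pairwise comparisons sampled by random or fixed-distance strategies. As a minimal sketch (not the authors' implementation; function names, the margin value, and the sampling details are illustrative assumptions), a pairwise margin ranking loss and the two sampling strategies could look like this:

```python
import numpy as np

def pairwise_margin_loss(score_a, score_b, label, margin=1.0):
    """Hinge-style ranking loss for one comparison pair (illustrative).

    label is +1 if video A was annotated as more interesting than
    video B, and -1 otherwise. The loss is zero once the predicted
    scores respect the annotated order by at least `margin`.
    """
    return max(0.0, margin - label * (score_a - score_b))

def sample_pairs(ranked_ids, n_pairs, strategy="random", distance=5, seed=None):
    """Sample comparison pairs from a globally ranked video list.

    "random" pairs any two distinct videos; "fixed-distance" pairs
    videos a fixed number of rank positions apart (an assumption
    about what the fixed-distance strategy means here).
    """
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n_pairs):
        if strategy == "random":
            i, j = rng.choice(len(ranked_ids), size=2, replace=False)
        else:  # fixed-distance
            i = int(rng.integers(0, len(ranked_ids) - distance))
            j = i + distance
        pairs.append((ranked_ids[i], ranked_ids[j]))
    return pairs
```

A correctly ordered pair with a wide score gap incurs zero loss, while a misordered pair is penalized in proportion to how badly the predicted scores violate the annotated order.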