M. Dhanushree, R. Priya, P. Aruna, R. Bhavani
In the past decade, video summarisation has emerged as one of the most challenging research fields in video understanding. Video summarisation abstracts an original video by extracting its most informative parts or key events. Generic video summarisation is particularly challenging because the key events do not correspond to specific activities; in such circumstances, extensive spatial features are needed to identify video events. Thus, a stacked encoder-decoder architecture with a residual learning network (SERNet) model is proposed for generating dynamic summaries of generic videos. In the proposed model, GoogleNet features are extracted for each frame. A bi-directional gated recurrent unit (GRU) encodes the video features, and a GRU decodes them. Both the encoder and decoder leverage residual learning to extract hierarchical dense spatial features and thereby increase the video summarisation F-score. Experiments are conducted on the SumMe and TVSum datasets. Experimental results demonstrate that the proposed SERNet model achieves F-scores of 55.6 on SumMe and 64.23 on TVSum. Comparison of the proposed SERNet model against state-of-the-art approaches indicates its robustness.
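The pipeline described above can be sketched structurally: per-frame features (stand-ins for GoogleNet descriptors) pass through a bi-directional GRU encoder and a GRU decoder, each wrapped in a residual connection, to produce per-frame importance scores. This is a minimal illustrative sketch under assumed toy dimensions and a hypothetical scoring head, not the authors' exact SERNet design.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (update gate z, reset gate r, candidate state)."""
    def __init__(self, d_in, d_hid):
        s = 1.0 / np.sqrt(d_hid)
        self.Wz, self.Uz = rng.uniform(-s, s, (d_hid, d_in)), rng.uniform(-s, s, (d_hid, d_hid))
        self.Wr, self.Ur = rng.uniform(-s, s, (d_hid, d_in)), rng.uniform(-s, s, (d_hid, d_hid))
        self.Wh, self.Uh = rng.uniform(-s, s, (d_hid, d_in)), rng.uniform(-s, s, (d_hid, d_hid))
        self.d_hid = d_hid

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)
        r = sigmoid(self.Wr @ x + self.Ur @ h)
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1 - z) * h + z * h_cand

    def run(self, xs):
        h = np.zeros(self.d_hid)
        out = []
        for x in xs:
            h = self.step(x, h)
            out.append(h)
        return np.array(out)

def bigru_encode(xs, fwd, bwd):
    """Bi-directional encoding: concatenate forward and backward passes."""
    hf = fwd.run(xs)
    hb = bwd.run(xs[::-1])[::-1]
    return np.concatenate([hf, hb], axis=1)

# Toy sizes (assumed): real GoogleNet frame descriptors are much larger.
d_feat, d_hid, T = 16, 16, 10
frames = rng.standard_normal((T, d_feat))  # stand-in per-frame features

enc_f, enc_b = GRUCell(d_feat, d_hid), GRUCell(d_feat, d_hid)
dec = GRUCell(2 * d_hid, 2 * d_hid)

enc_out = bigru_encode(frames, enc_f, enc_b)
# Residual learning around the encoder: add the (tiled) input features back.
enc_out = enc_out + np.concatenate([frames, frames], axis=1)

# Residual connection around the decoder as well.
dec_out = dec.run(enc_out) + enc_out

# Hypothetical linear scoring head: per-frame importance scores in (0, 1);
# top-scoring frames would form the dynamic summary.
W_score = rng.uniform(-0.1, 0.1, 2 * d_hid)
scores = sigmoid(dec_out @ W_score)
print(scores.shape)  # one importance score per frame
```

The residual additions require the stacked input and hidden dimensions to match; in practice a learned projection would be used when they differ.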