An approach is presented for parsing MPEG compressed video into shots and sub-shots using only macroblock (MB) and motion vector (MV) information. The system follows a two-pass scheme and has a hybrid rule-based/neural structure. A rough scan over the P frames locates the potential shot boundaries, and the solution is then refined by a precise scan over the B frames in the neighborhood of each candidate. The "simpler" boundaries are recognized by the rule-based module, while the decisions for the "complex" ones are refined by the neural module. The neural module is also used to distinguish dissolves from object and camera motion and to break the shots into sub-shots. The experiments demonstrate high speed and accuracy in both detecting shots and characterizing them.
Zuowu Ning, Zhaoyang Zhang, Zhi Liu
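The two-pass, hybrid scheme described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' method: the `Frame` record, the use of the intra-coded MB ratio in P frames as the rough-scan feature, the forward/backward prediction ratios in B frames as the refinement feature, the thresholds `low` and `high`, and the `classifier` callable that stands in for the neural module. Frames are assumed to be given in display order.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    """Hypothetical per-frame summary of MB/MV statistics."""
    index: int
    ftype: str                # 'I', 'P' or 'B'
    intra_ratio: float = 0.0  # fraction of intra-coded MBs (used for P frames)
    fwd_ratio: float = 0.0    # fraction of forward-predicted MBs (B frames)
    bwd_ratio: float = 0.0    # fraction of backward-predicted MBs (B frames)


def rough_scan(frames: List[Frame], low: float) -> List[int]:
    """Pass 1: list positions of P frames that may follow a shot boundary."""
    return [i for i, f in enumerate(frames)
            if f.ftype == 'P' and f.intra_ratio >= low]


def refine(frames: List[Frame], p_pos: int) -> int:
    """Pass 2: scan the B frames preceding a candidate P frame.

    Assumed heuristic: the first B frame predicted mostly from the
    following anchor is taken as the first frame of the new shot.
    If no such B frame exists, the P frame itself is returned.
    """
    start = p_pos - 1
    while start >= 0 and frames[start].ftype == 'B':
        start -= 1  # walk back to the previous anchor (I or P) frame
    for j in range(start + 1, p_pos):
        b = frames[j]
        if b.bwd_ratio > b.fwd_ratio:
            return b.index
    return frames[p_pos].index


def detect_boundaries(
    frames: List[Frame],
    low: float = 0.4,
    high: float = 0.8,
    classifier: Optional[Callable[[Frame], bool]] = None,
) -> List[int]:
    """Hybrid decision: a rule handles clear cases, a classifier the rest."""
    boundaries = []
    for pos in rough_scan(frames, low):
        p = frames[pos]
        if p.intra_ratio >= high:
            accept = True            # "simple" boundary: rule-based decision
        elif classifier is not None:
            accept = classifier(p)   # "complex" boundary: defer to classifier
        else:
            accept = False
        if accept:
            boundaries.append(refine(frames, pos))
    return boundaries
```

Because the first pass touches only P frames and the second pass only the few B frames around each candidate, most frames are never inspected in detail, which is consistent with the speed the abstract reports. In practice the `classifier` argument would be a trained model over MB/MV features; here any callable that returns a boolean will do.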