METHOD FOR VIDEO RECOGNITION AND RELATED PRODUCTS

A method for video recognition and related products are provided. The method includes the following. An original set of clip descriptors is obtained by providing multiple clips of a video as an input of a 3D CNN of a neural network, where the neural network includes the 3D CNN and at least one first fully connected layer, and each of the multiple clips includes at least one frame. An attention vector corresponding to the original set of clip descriptors is determined. An enhanced set of clip descriptors is obtained based on the original set of clip descriptors and the attention vector. The enhanced set of clip descriptors is input into the at least one first fully connected layer and video recognition is performed based on an output of the at least one first fully connected layer.
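
The data flow described above can be illustrated with a minimal sketch in plain Python. It is not the claimed implementation: the abstract does not say how the attention vector is computed, how it is combined with the descriptors, or how the descriptors are aggregated, so every function below is a hypothetical stand-in. In particular, `clip_descriptor` replaces the 3D CNN with a simple per-feature average over frames, `attention_vector` assumes a softmax over each descriptor's dot product with the mean descriptor, `enhance` assumes per-clip scaling, and `recognize` assumes a sum over clips followed by a single fully connected layer.

```python
import math

def clip_descriptor(clip):
    """Stand-in for the 3D CNN (assumption): average each feature over the clip's frames."""
    n_frames = len(clip)
    return [sum(frame[i] for frame in clip) / n_frames for i in range(len(clip[0]))]

def attention_vector(descriptors):
    """Assumed attention: softmax of each descriptor's dot product with the mean descriptor."""
    dim = len(descriptors[0])
    mean = [sum(d[i] for d in descriptors) / len(descriptors) for i in range(dim)]
    scores = [sum(a * b for a, b in zip(d, mean)) for d in descriptors]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def enhance(descriptors, attention):
    """Assumed enhancement: scale each clip descriptor by its attention weight."""
    return [[w * x for x in d] for d, w in zip(descriptors, attention)]

def fully_connected(x, weights, bias):
    """A single dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def recognize(clips, weights, bias):
    """Clips -> original descriptors -> attention -> enhanced descriptors -> FC -> label."""
    original = [clip_descriptor(c) for c in clips]
    attention = attention_vector(original)
    enhanced = enhance(original, attention)
    # Assumed aggregation: sum the enhanced descriptors into one video-level vector.
    video = [sum(d[i] for d in enhanced) for i in range(len(enhanced[0]))]
    logits = fully_connected(video, weights, bias)
    label = max(range(len(logits)), key=logits.__getitem__)
    return label, attention
```

For example, calling `recognize` on three clips of two frames each, with an identity weight matrix and zero bias, returns the index of the largest output together with one attention weight per clip. In a real system the stand-in functions would be replaced by a trained 3D CNN and learned attention and classification layers.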
