This example shows how to compute predictions for a video using a video classifier. To learn how to train a video classifier network on your own data set, see Gesture Recognition using Videos and Deep Learning.
Load a video classifier pretrained on the Kinetics-400 data set.
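A minimal sketch of this step, assuming the SlowFast classifier. The text does not say which pretrained classifier it loads, and `r2plus1dVideoClassifier` or `inflated3dVideoClassifier` could be substituted in the same way. Each needs Computer Vision Toolbox and the matching model support package. The variable name `classifier` is chosen here for illustration.

```matlab
% Assumption: SlowFast is used; the text does not name the classifier.
% Called with no arguments, the function returns a classifier pretrained
% on the Kinetics-400 data set.
classifier = slowFastVideoClassifier;

% InputSize is [height width channels frames].
inputSize = classifier.InputSize
```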
Specify the video file name.
Create a VideoReader object to read the video frames.
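The two steps above might look like the following sketch. The file name is a stand-in, since the text does not give one: `xylophone.mp4` is a sample video that ships with MATLAB, used here only so the snippet runs as written.

```matlab
% Stand-in file name (assumption); replace it with the path to your own video.
videoFilename = "xylophone.mp4";

% Create the reader and report basic properties of the video.
reader = VideoReader(videoFilename);
fprintf("%d-by-%d pixels, %d frames\n",reader.Height,reader.Width,reader.NumFrames);
```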
Read the number of video frames that the video classifier network requires, starting from the beginning of the video file. The required number of frames is the fourth element of the InputSize property of the video classifier.
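As a sketch of this step, the snippet below reads the first frames of a video by index range. The `inputSize` value and the sample file are assumptions so the snippet runs on its own. In practice, `inputSize` would come from the classifier's InputSize property and the reader would point at your video.

```matlab
% Hypothetical stand-in for the classifier's InputSize: [height width channels frames].
inputSize = [256 256 3 32];
% Sample video shipped with MATLAB, used as a stand-in.
reader = VideoReader("xylophone.mp4");

% The fourth element gives the number of frames the network expects.
sequenceLength = inputSize(4);

% Passing a [first last] range to read returns an
% H-by-W-by-3-by-F array for an RGB video.
video = read(reader,[1 sequenceLength]);
size(video)
```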
Resize the video frames for prediction. The required height and width are the first two elements of the InputSize property of the video classifier.
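One way to do the resize is with `imresize`, which changes only the first two dimensions of a multidimensional array, so all frames are resized in one call. The sketch below uses synthetic frames and a hypothetical `inputSize` in place of the real video and classifier.

```matlab
% Hypothetical stand-in for the classifier's InputSize.
inputSize = [256 256 3 32];
% Synthetic 240-by-320 RGB frames standing in for the frames read from the video.
video = randi([0 255],240,320,3,32,"uint8");

% Resize every frame to the required [height width].
heightWidth = inputSize(1:2);
resized = imresize(video,heightWidth);
size(resized)
```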
Convert the input to type single.
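The conversion matters because integer arithmetic in MATLAB saturates, which would distort the later rescaling and normalization steps. The sketch below uses synthetic frames to show the conversion and the difference in behavior.

```matlab
% Synthetic uint8 frames standing in for the resized video.
resized = randi([0 255],256,256,3,32,"uint8");

% Integer arithmetic saturates: this result is clipped to 0 rather than -10.
clipped = uint8(10) - 20;

% Convert to single so later arithmetic keeps sign and fractional values.
resized = single(resized);
unclipped = single(uint8(10)) - 20;
class(resized)
```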
Rescale the input to the range [0, 1].
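A sketch of min-max rescaling follows. The limits 0 and 255 are assumed values for 8-bit RGB video. A classifier object may supply its own per-channel minimum and maximum, in which case those values would be used instead.

```matlab
% Synthetic frames of type single standing in for the converted video.
resized = single(randi([0 255],256,256,3,32));

% Assumed per-channel limits for 8-bit video, shaped 1-by-1-by-3.
minValue = reshape([0 0 0],1,1,3);
maxValue = reshape([255 255 255],1,1,3);

% Implicit expansion applies the channel limits to every pixel and frame.
rescaled = (resized - minValue)./(maxValue - minValue);
```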
Normalize the video data by subtracting the mean and dividing by the standard deviation.
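The normalization can be sketched as below. The mean and standard deviation values are hypothetical placeholders. The real values should be the per-channel statistics that the classifier was trained with.

```matlab
% Synthetic data in [0, 1] standing in for the rescaled video.
rescaled = rand(256,256,3,32,"single");

% Hypothetical per-channel statistics, shaped 1-by-1-by-3.
meanValue = reshape([0.45 0.45 0.45],1,1,3);
stdValue = reshape([0.225 0.225 0.225],1,1,3);

% Subtract the mean and divide by the standard deviation, channel by channel.
normalized = (rescaled - meanValue)./stdValue;
```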
Convert the input to a dlarray object.
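A sketch of the conversion, assuming the data is laid out as height, width, channel, and frame, with a single video in the batch. The format string `"SSCTB"` is an assumption based on that layout: two spatial dimensions, channel, time, and batch. The data here is synthetic.

```matlab
% Synthetic normalized video: height-by-width-by-channels-by-frames.
normalized = rand(256,256,3,32,"single");

% Label the dimensions: S = spatial, C = channel, T = time, B = batch.
% The data is 4-D, so the trailing batch dimension is a singleton.
dlVideo = dlarray(normalized,"SSCTB");

% dlarray may reorder the labeled dimensions internally.
dims(dlVideo)
```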
Predict the class scores for the input, then find the class label corresponding to the maximum score.
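The lookup can be sketched with synthetic data. With a real classifier, the scores would typically come from a call such as `predict(classifier,dlVideo)` and the class names from the classifier's Classes property. The three classes and their scores below are made up for illustration.

```matlab
% Synthetic stand-ins for the classifier's class names and predicted scores.
classes = categorical(["jogging" "push up" "yoga"]);
scores = single([0.05 0.90 0.05]);

% max returns the highest score and its index; the index selects the label.
[score,idx] = max(scores);
label = classes(idx)
```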
label = categorical
    push up
Display the predicted class label and the score.
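A simple way to show the result is to build a caption string and print it to the Command Window, as in the sketch below. The label and score are synthetic values, and `caption` is a name chosen for illustration. Overlaying the caption on a video frame would be an alternative.

```matlab
% Synthetic prediction results for illustration.
label = categorical("push up");
score = single(0.90);

% Combine the label and the score, formatted to two decimal places.
caption = string(label) + "; " + num2str(score,"%0.2f");
disp(caption)
```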