predict

Compute video classifier predictions

Since R2021b

Syntax

dLYVideo = predict(i3d,dlXVideo)

[dLYVideo,stateVideo] = predict(i3d,dlXVideo)

[dLYVideo,dlYFlow] = predict(i3d,dlXVideo,dlXFlow)

[dLYVideo,dlYFlow,stateVideo,stateFlow] = predict(i3d,dlXVideo,dlXFlow)

Description

dLYVideo = predict(i3d,dlXVideo) computes the predictions of the video classifier. i3d is specified as an inflated3dVideoClassifier, r2plus1dVideoClassifier, or slowFastVideoClassifier classifier object. Use this syntax when you set the OpticalFlowMethod property of the classifier object to "none".

[dLYVideo,stateVideo] = predict(i3d,dlXVideo) also returns the updated video network state. The output stateVideo contains information that the classifier maintains between training iterations, such as the state of batch normalization operations.

[dLYVideo,dlYFlow] = predict(i3d,dlXVideo,dlXFlow) also returns the optical flow predictions from the classifier for training. Use this syntax when you set the OpticalFlowMethod property of the classifier object to "Farneback".

[dLYVideo,dlYFlow,stateVideo,stateFlow] = predict(i3d,dlXVideo,dlXFlow) also returns the updated states of the video network and the optical flow network.

Examples

Compute Predictions for Video Using Video Classifier

This example shows how to compute predictions for a video using a video classifier. To learn more about how to train a video classifier network for your dataset, see Gesture Recognition using Videos and Deep Learning.

Load a video classifier pretrained on the Kinetics-400 data set.

sf = slowFastVideoClassifier;

Specify the video file name.

videoFilename = "pushup.mp4";

Create a VideoReader to read the video frames.

reader = VideoReader(videoFilename);

Starting from the beginning of the video file, read the number of video frames that the video classifier network requires. The fourth element of the InputSize property of the video classifier defines this number.

sequenceLength = sf.InputSize(4);
sequenceRange = [1, sequenceLength];
videoFrames = read(reader,sequenceRange);
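
The step above reads only the first sequence of the video. As a minimal sketch of how the same range computation might extend to non-overlapping windows across a longer video, the following uses hypothetical values for numFrames and sequenceLength instead of querying a VideoReader or classifier object.

```matlab
% Hypothetical values for illustration. In the example, sequenceLength
% comes from sf.InputSize(4), and the frame count depends on the video.
numFrames = 100;
sequenceLength = 32;

% Start index of each full, non-overlapping window.
starts = (1:sequenceLength:(numFrames - sequenceLength + 1))';

% Each row is a [first last] frame range that could be passed to read.
sequenceRanges = [starts, starts + sequenceLength - 1];
```

In this sketch, trailing frames that do not fill a complete window are skipped.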

Resize video frames for prediction. The required height and width are defined by the first two elements of the InputSize property of the video classifier.

heightWidth = sf.InputSize(1:2);
resized = imresize(videoFrames,heightWidth);

Convert the input to type single.

resized = single(resized);

Rescale the input between 0 and 1.

minValue = sf.InputNormalizationStatistics.Min;
maxValue = sf.InputNormalizationStatistics.Max;
minValue = reshape(minValue,1,1,3);
maxValue = reshape(maxValue,1,1,3);
resized = rescale(resized,0,1,InputMin=minValue,InputMax=maxValue);

Normalize the video data using the mean and standard deviation.

meanValue = sf.InputNormalizationStatistics.Mean;
stdValue = sf.InputNormalizationStatistics.StandardDeviation;
meanValue = reshape(meanValue,1,1,3);
stdValue = reshape(stdValue,1,1,3);
resized = resized - meanValue;
resized = resized./stdValue;
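
The rescaling and normalization steps rely on reshaping each per-channel statistic to 1-by-1-by-3 so that it expands across the height, width, and frame dimensions. The following is a minimal sketch of that arithmetic on a small synthetic array. The statistics are assumed values chosen for illustration, not the values stored in the classifier.

```matlab
% Synthetic 4-by-4 RGB clip with 2 frames and values from 0 to 255.
frames = single(reshape(linspace(0,255,4*4*3*2),4,4,3,2));

% Assumed per-channel statistics, reshaped to 1-by-1-by-3.
minValue  = reshape(single([0 0 0]),1,1,3);
maxValue  = reshape(single([255 255 255]),1,1,3);
meanValue = reshape(single([0.45 0.45 0.45]),1,1,3);
stdValue  = reshape(single([0.225 0.225 0.225]),1,1,3);

% Rescale each channel to the range [0,1].
scaled = rescale(frames,0,1,InputMin=minValue,InputMax=maxValue);

% Subtract the channel mean and divide by the channel standard deviation.
normalized = (scaled - meanValue)./stdValue;
```

The array size is unchanged by both steps; only the values in each channel are shifted and scaled.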

Convert the input to a dlarray object and compute the prediction scores.

dlVideo = dlarray(resized,"SSCTB");
predictionScores = predict(sf,dlVideo);

Find the class label corresponding to the maximum score.

[score,idx] = max(predictionScores);
label = sf.Classes(idx)

label = categorical
     push up
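
When more than the single best label is of interest, the scores can be sorted instead of reduced with max. The following is a minimal sketch that uses plain numeric scores and hypothetical class names in place of predictionScores and sf.Classes. For a dlarray of scores, the numeric data would first need to be extracted, for example with extractdata.

```matlab
% Hypothetical scores and class names for illustration only.
scores = single([0.05 0.70 0.10 0.15]);
classes = categorical(["jump" "push up" "sit up" "squat"]);

% Sort the scores from highest to lowest and keep the sort order.
[sortedScores,order] = sort(scores,"descend");

% Keep the labels and scores of the topK highest-scoring classes.
topK = 3;
topLabels = classes(order(1:topK));
topScores = sortedScores(1:topK);
```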

Display the predicted class label and the score.

text = string(label) + "; " + num2str(score,"%0.2f");
frame = videoFrames(:,:,:,end);
frame = insertText(frame,[30,30],text,FontSize=24);

imshow(frame)

Input Arguments

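`i3d` — Classifier
`inflated3dVideoClassifier` object | `r2plus1dVideoClassifier` object | `slowFastVideoClassifier` object

Classifier, specified as an inflated3dVideoClassifier, r2plus1dVideoClassifier, or slowFastVideoClassifier object.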

`dlXVideo` — Video input
H-by-W-by-C-by-T-by-B `SSCTB` formatted `dlarray` object

Video input, specified as an H-by-W-by-C-by-T-by-B SSCTB formatted dlarray (Deep Learning Toolbox) object that corresponds to the video input of the classifier.

H — Height.
W — Width.
C — Number of channels. The number of channels must be equal to the channels value of the InputSize property of the classifier object.
T — Number of frames. The number of frames must be equal to the frames value of the InputSize property of the classifier object.
B — Batch size.
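
The channel and frame constraints can be checked before calling predict. The following is a minimal sketch that uses a plain numeric array and a hypothetical inputSize vector in place of a dlarray object and the InputSize property of a classifier. The specific sizes are assumptions chosen for illustration.

```matlab
% Hypothetical [H W C T] input size, standing in for classifier.InputSize.
inputSize = [112 112 3 16];
batchSize = 2;

% H-by-W-by-C-by-T-by-B array, laid out in the SSCTB dimension order.
X = zeros([inputSize batchSize],"single");

% The third dimension must match the channels value and the fourth
% dimension must match the frames value.
channelsMatch = size(X,3) == inputSize(3);
framesMatch = size(X,4) == inputSize(4);
isValidInput = channelsMatch && framesMatch;
```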

`dlXFlow` — Video and optical flow input
`SSCTB` formatted `dlarray` object

Video and optical flow input, specified as an H-by-W-by-C-by-T-by-B SSCTB formatted dlarray (Deep Learning Toolbox) object that corresponds to the video and optical flow input of the classifier.

H — Height.
W — Width.
C — Number of channels. The number of channels must be equal to the channels value of the InputSize property of the classifier object.
T — Number of frames. The number of frames must be equal to the frames value of the InputSize property of the classifier object.
B — Batch size.

Output Arguments

`dLYVideo` — Activations of video network
`dlarray` object

Activations of the video network, returned as a formatted dlarray (Deep Learning Toolbox) object.

`stateVideo` — Updated video network state
table

Updated video network state, returned as a table with three columns:

Layer — Layer name, specified as a string scalar.
Parameter — Parameter name, specified as a string scalar.
Value — Value of the parameter, specified as a dlarray object.

The network state contains information that the network remembers between iterations, such as the state of LSTM and batch normalization layers.

During training or inference, you can update the network state using the output of the forward and predict functions.
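
To make the three-column layout concrete, the following is a minimal sketch that builds a table of the same shape and selects the rows for one layer. The layer names and values are hypothetical, and the Value column holds plain numeric arrays here rather than dlarray objects.

```matlab
% Hypothetical state entries for two batch normalization layers.
Layer = ["bn1"; "bn1"; "bn2"; "bn2"];
Parameter = ["TrainedMean"; "TrainedVariance"; "TrainedMean"; "TrainedVariance"];
Value = {zeros(1,1,1,8); ones(1,1,1,8); zeros(1,1,1,16); ones(1,1,1,16)};

% Assemble a table with the columns Layer, Parameter, and Value.
state = table(Layer,Parameter,Value);

% Select all state entries that belong to the layer named "bn1".
bn1Rows = state(state.Layer == "bn1",:);
```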

`dlYFlow` — Activations of optical flow network
`dlarray` object

Activations of the optical flow network, returned as a formatted dlarray (Deep Learning Toolbox) object.

`stateFlow` — Updated optical flow network state
table

Updated optical flow network state, returned as a table with three columns:

Layer — Layer name, specified as a string scalar.
Parameter — Parameter name, specified as a string scalar.
Value — Value of the parameter, specified as a dlarray object.

The network state contains information that the network remembers between iterations, such as the state of LSTM and batch normalization layers.

During training or inference, you can update the network state using the output of the forward and predict functions.

Version History

Introduced in R2021b
