Face Detection and Tracking Using the KLT Algorithm

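This example shows how to automatically detect and track a face using feature points. The approach keeps tracking the face even when the person tilts their head or moves toward or away from the camera.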

Introduction

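Object detection and tracking are important in many computer vision applications, including activity recognition, automotive safety, and surveillance. This example builds a simple face tracking system by dividing the tracking problem into three parts: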

Detect a face
Identify facial features to track
Track the face

Detect a Face

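First, you must detect the face. Use the vision.CascadeObjectDetector object to locate a face in a video frame. The cascade object detector uses the Viola-Jones detection algorithm and a trained classification model for detection. By default, the detector is configured to detect faces, but it can also be used to detect other types of objects.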

% Create a cascade detector object.
faceDetector = vision.CascadeObjectDetector();

% Read a video frame and run the face detector.
videoReader = VideoReader("tilted_face.avi");
videoFrame      = readFrame(videoReader);
bbox            = step(faceDetector, videoFrame);

% Draw the returned bounding box around the detected face.
videoFrame = insertShape(videoFrame, "rectangle", bbox);
figure; imshow(videoFrame); title("Detected face");

[Figure: Detected face]
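
The detector returns one [x y width height] row per detection, so its output can contain several boxes or none at all. The script simply uses the first row. As a hedged alternative, the sketch below guards against an empty result and keeps the largest box. The `bboxes` values are made up for illustration, and selecting by area is an assumption, not part of the original example.

```matlab
% Hypothetical detector output: one [x y width height] row per detection.
bboxes = [ 40,  60, 50, 50;
          120,  30, 90, 90;
          200, 150, 20, 20];

% Stop early if nothing was detected, since tracking needs a starting box.
if isempty(bboxes)
    error("No face detected in the first frame.");
end

% Keep the detection with the largest area (an assumed selection rule).
areas = bboxes(:, 3) .* bboxes(:, 4);
[~, idx] = max(areas);
bbox = bboxes(idx, :);
disp(bbox)
```

With a single face in view, as in the example video, this rule and the first-row rule pick the same box.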

% Convert the first box into a list of 4 points
% This is needed to be able to visualize the rotation of the object.
bboxPoints = bbox2points(bbox(1, :));
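
A box stored as [x y width height] can only describe an axis-aligned rectangle, which is why the script switches to four corner points before tracking. The base-MATLAB sketch below performs the same kind of conversion by hand. The sample box and the corner order (upper-left, upper-right, lower-right, lower-left) are assumptions chosen for illustration.

```matlab
% Hypothetical bounding box in [x y width height] form.
bbox = [120 30 90 60];
x = bbox(1); y = bbox(2); w = bbox(3); h = bbox(4);

% One [x y] row per corner, walking clockwise from the upper-left corner
% (image coordinates, with y increasing downward).
corners = [x,     y;
           x + w, y;
           x + w, y + h;
           x,     y + h];
disp(corners)
```

Once the box is a list of corners, each corner can be moved independently, so the outline can follow a tilted face.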

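To track the face over time, this example uses the Kanade-Lucas-Tomasi (KLT) algorithm. Although you could run the cascade object detector on every frame, doing so is computationally expensive. The detector may also fail when the subject turns or tilts their head, a limitation that comes from the type of trained classification model used for detection. This example therefore detects the face only once, and then the KLT algorithm tracks it across the video frames.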

Identify Facial Features to Track

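The KLT algorithm tracks a set of feature points across the video frames. Once the detection step locates the face, the next step in the example identifies feature points that can be reliably tracked. This example uses the standard "good features to track" proposed by Shi and Tomasi.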

Detect feature points in the face region.

% Use only the first detection: the ROI must be a single [x y width height] row.
points = detectMinEigenFeatures(im2gray(videoFrame), "ROI", bbox(1, :));

% Display the detected points.
figure, imshow(videoFrame), hold on, title("Detected features");
plot(points);

[Figure: Detected features]
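
The "good features to track" criterion scores a location by the smaller eigenvalue of the 2-by-2 matrix built from the image gradients around it. The sketch below applies that idea to three synthetic patches in base MATLAB. It illustrates the scoring principle only and is not the toolbox implementation, which also applies smoothing and a quality threshold. The patch contents and variable names are made up for this illustration.

```matlab
% Three synthetic 9-by-9 patches: flat, vertical edge, and corner.
flatPatch   = 0.5 * ones(9);
edgePatch   = zeros(9);  edgePatch(:, 5:end) = 1;
cornerPatch = zeros(9);  cornerPatch(5:end, 5:end) = 1;
patches = {flatPatch, edgePatch, cornerPatch};

scores = zeros(1, numel(patches));
for k = 1:numel(patches)
    % Image gradients along x (columns) and y (rows).
    [Ix, Iy] = gradient(patches{k});

    % 2-by-2 gradient matrix summed over the whole patch.
    M = [sum(Ix(:).^2),     sum(Ix(:).*Iy(:));
         sum(Ix(:).*Iy(:)), sum(Iy(:).^2)];

    % The score is the smaller of the two eigenvalues of M.
    scores(k) = min(eig(M));
end
disp(scores)
```

Only the corner patch varies in two directions, so it is the only one with a score above zero. The flat and edge patches score zero, which is why such regions are poor candidates for tracking.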

Initialize a Tracker to Track the Points

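With the feature points identified, you can now use the vision.PointTracker System object to track them. For each point in the previous frame, the point tracker attempts to find the corresponding point in the current frame. The estimateGeometricTransform2D function then estimates the translation, rotation, and scale between the old points and the new points. This transformation is applied to the bounding box around the face.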
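
To make the combination of translation, rotation, and scale concrete, the sketch below applies a similarity transform to the corners of a box using plain matrix arithmetic. It does not use the toolbox transform object. The motion values and the box are hypothetical, and the formula p' = scale * R * p + shift is written out directly.

```matlab
% Corners of a hypothetical 100-by-60 box, one [x y] row per corner.
corners = [0 0; 100 0; 100 60; 0 60];

% Assumed motion between two frames.
scale = 1.1;              % face moved slightly closer to the camera
theta = 10 * pi / 180;    % head tilted by 10 degrees
shift = [5 -3];           % translation in pixels

% Similarity transform p' = scale * R * p + shift, applied to each row.
R = [cos(theta), -sin(theta);
     sin(theta),  cos(theta)];
newCorners = scale * corners * R' + shift;

% Flatten to [x1 y1 x2 y2 ...], the polygon layout passed to insertShape.
polygon = reshape(newCorners', 1, []);
disp(polygon)
```

The transformed corners no longer form an axis-aligned rectangle, so the result has to be drawn as a polygon instead of an [x y width height] box.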

Create a point tracker and enable the bidirectional error constraint to make it more robust in the presence of noise and clutter.

pointTracker = vision.PointTracker("MaxBidirectionalError", 2);

% Initialize the tracker with the initial point locations and the initial
% video frame.
points = points.Location;
initialize(pointTracker, points, videoFrame);
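
The bidirectional error constraint follows each point forward into the new frame and then back into the previous frame. A point is kept only if this round trip ends close to where it started, within 2 pixels for the tracker created above. The sketch below reproduces that check on made-up coordinates. It illustrates the idea and is not the tracker's internal code.

```matlab
maxBidirectionalError = 2;   % pixels, same threshold as the tracker above

% Hypothetical point locations in the previous frame.
startPoints = [10 10; 50 40; 80 20];

% Hypothetical locations after tracking forward and then backward again.
roundTripPoints = [10.5 10.2; 50.1 39.8; 86 25];

% Forward-backward error: distance between start and round-trip location.
fbError = sqrt(sum((roundTripPoints - startPoints).^2, 2));

% Points whose error exceeds the threshold are treated as unreliable.
isReliable = fbError <= maxBidirectionalError;
disp([fbError isReliable])
```

Here the first two points return to within a pixel of their starting location and are kept, while the third drifts by several pixels and is rejected.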

Initialize a Video Player to Display the Results

Create a video player object for displaying video frames.

videoPlayer  = vision.VideoPlayer("Position",...
    [100 100 [size(videoFrame, 2), size(videoFrame, 1)]+30]);

Track the Face

Track the points from frame to frame, and use the estimateGeometricTransform2D function to estimate the motion of the face.

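Make a copy of the points. The copy is used to compute the geometric transformation between the points in the previous frame and the points in the current frame.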

oldPoints = points;

while hasFrame(videoReader)
    % get the next frame
    videoFrame = readFrame(videoReader);

    % Track the points. Note that some points may be lost.
    [points, isFound] = step(pointTracker, videoFrame);
    visiblePoints = points(isFound, :);
    oldInliers = oldPoints(isFound, :);
    
    if size(visiblePoints, 1) >= 2 % need at least 2 points
        
        % Estimate the geometric transformation between the old points
        % and the new points and eliminate outliers
        [xform, inlierIdx] = estimateGeometricTransform2D(...
            oldInliers, visiblePoints, "similarity", "MaxDistance", 4);
        oldInliers    = oldInliers(inlierIdx, :);
        visiblePoints = visiblePoints(inlierIdx, :);
        
        % Apply the transformation to the bounding box points
        bboxPoints = transformPointsForward(xform, bboxPoints);
                
        % Insert a bounding box around the object being tracked
        bboxPolygon = reshape(bboxPoints', 1, []);
        videoFrame = insertShape(videoFrame, "polygon", bboxPolygon, ...
            "LineWidth", 2);
                
        % Display tracked points
        videoFrame = insertMarker(videoFrame, visiblePoints, "+", ...
            "MarkerColor", "white");       
        
        % Reset the points
        oldPoints = visiblePoints;
        setPoints(pointTracker, oldPoints);        
    end
    
    % Display the annotated video frame using the video player object
    step(videoPlayer, videoFrame);
end

% Clean up
release(videoPlayer);

[Figure: Video Player]

release(pointTracker);

Summary

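In this example, you created a simple face tracking system that automatically detects and tracks a single face. Try changing the input video and see whether the face can still be detected and tracked. Make sure the person is facing the camera in the first frame, where the detection step runs.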

References

Paul A. Viola and Michael J. Jones. Rapid Object Detection using a Boosted Cascade of Simple Features. IEEE Conference on Computer Vision and Pattern Recognition, 2001.

Bruce D. Lucas and Takeo Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. International Joint Conference on Artificial Intelligence, 1981.

Carlo Tomasi and Takeo Kanade. Detection and Tracking of Point Features. Carnegie Mellon University Technical Report CMU-CS-91-132, 1991.

Jianbo Shi and Carlo Tomasi. Good Features to Track. IEEE Conference on Computer Vision and Pattern Recognition, 1994.

Zdenek Kalal, Krystian Mikolajczyk and Jiri Matas. Forward-Backward Error: Automatic Detection of Tracking Failures. International Conference on Pattern Recognition, 2010.