Motion-Based Multiple Object Tracking

This example shows how to automatically detect moving objects in video from a stationary camera and track them based on their motion.

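Detecting moving objects and tracking them based on motion are important components of many computer vision applications, including activity recognition, traffic monitoring, and automotive safety. The problem of motion-based object tracking can be divided into two parts:

detecting moving objects in each frame
associating the detections corresponding to the same object over time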

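The detection of moving objects uses a background subtraction algorithm based on Gaussian mixture models. Morphological operations are applied to the resulting foreground mask to eliminate noise. Finally, blob analysis detects groups of connected pixels, which are likely to correspond to moving objects.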

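The association of detections to the same object is based solely on motion. The motion of each track is estimated by a Kalman filter. The filter is used to predict the track's location in each frame and to determine the likelihood of each detection being assigned to each track.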

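Track maintenance is an important aspect of this example. In any given frame, some detections may be assigned to tracks, while other detections and tracks may remain unassigned. Assigned tracks are updated using the corresponding detections. Unassigned tracks are marked invisible. An unassigned detection begins a new track.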

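Each track keeps count of the number of consecutive frames in which it remained unassigned. When the count reaches a specified threshold, the example assumes that the object has left the field of view and deletes the track.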

For more details, see Multiple Object Tracking.

This example is a function with the main body at the top and helper routines in the form of nested functions.

function MotionBasedMultiObjectTrackingExample()

% Create System objects used for reading video, detecting moving objects,
% and displaying the results.
obj = setupSystemObjects();

tracks = initializeTracks(); % Create an empty array of tracks.

nextId = 1; % ID of the next track

% Detect moving objects, and track them across video frames.
while hasFrame(obj.reader)
    frame = readFrame(obj.reader);
    [centroids, bboxes, mask] = detectObjects(frame);
    predictNewLocationsOfTracks();
    [assignments, unassignedTracks, unassignedDetections] = ...
        detectionToTrackAssignment();

    updateAssignedTracks();
    updateUnassignedTracks();
    deleteLostTracks();
    createNewTracks();

    displayTrackingResults();
end

Create System Objects

Create System objects used for reading video frames, detecting foreground objects, and displaying results.

    function obj = setupSystemObjects()
        % Initialize Video I/O
        % Create objects for reading a video from a file, drawing the tracked
        % objects in each frame, and playing the video.

        % Create a video reader.
        obj.reader = VideoReader('atrium.mp4');

        % Create two video players, one to display the video,
        % and one to display the foreground mask.
        obj.maskPlayer = vision.VideoPlayer('Position', [740, 400, 700, 400]);
        obj.videoPlayer = vision.VideoPlayer('Position', [20, 400, 700, 400]);

        % Create System objects for foreground detection and blob analysis

        % The foreground detector is used to segment moving objects from
        % the background. It outputs a binary mask, where the pixel value
        % of 1 corresponds to the foreground and the value of 0 corresponds
        % to the background.

        obj.detector = vision.ForegroundDetector('NumGaussians', 3, ...
            'NumTrainingFrames', 40, 'MinimumBackgroundRatio', 0.7);

        % Connected groups of foreground pixels are likely to correspond to moving
        % objects.  The blob analysis System object is used to find such groups
        % (called 'blobs' or 'connected components'), and compute their
        % characteristics, such as area, centroid, and the bounding box.

        obj.blobAnalyser = vision.BlobAnalysis('BoundingBoxOutputPort', true, ...
            'AreaOutputPort', true, 'CentroidOutputPort', true, ...
            'MinimumBlobArea', 400);
    end

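Initialize Tracks

The initializeTracks function creates an array of tracks, where each track is a structure representing a moving object in the video. The purpose of the structure is to maintain the state of a tracked object. The state consists of information used for detection-to-track assignment, track termination, and display.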

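The structure contains the following fields:

id: the integer ID of the track
bbox: the current bounding box of the object, used for display
kalmanFilter: a Kalman filter object used for motion-based tracking
age: the number of frames since the track was first detected
totalVisibleCount: the total number of frames in which the track was detected (visible)
consecutiveInvisibleCount: the number of consecutive frames for which the track was not detected (invisible)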

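Noisy detections tend to result in short-lived tracks. For this reason, the example displays an object only after it has been tracked for some number of frames, that is, when totalVisibleCount exceeds a specified threshold.

When no detections are associated with a track for several consecutive frames, the example assumes that the object has left the field of view and deletes the track. This happens when consecutiveInvisibleCount reaches a specified threshold. A track is also deleted as noise if it was tracked for only a short time and marked invisible for most of its frames.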

    function tracks = initializeTracks()
        % create an empty array of tracks
        tracks = struct(...
            'id', {}, ...
            'bbox', {}, ...
            'kalmanFilter', {}, ...
            'age', {}, ...
            'totalVisibleCount', {}, ...
            'consecutiveInvisibleCount', {});
    end

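Detect Objects

The detectObjects function returns the centroids and the bounding boxes of the detected objects. It also returns a binary mask with the same size as the input frame, in which pixels with a value of 1 correspond to the foreground and pixels with a value of 0 correspond to the background.

The function performs motion segmentation using the foreground detector. It then applies morphological operations to the resulting binary mask to remove noisy pixels and to fill the holes in the remaining blobs.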

    function [centroids, bboxes, mask] = detectObjects(frame)

        % Detect foreground.
        mask = obj.detector.step(frame);

        % Apply morphological operations to remove noise and fill in holes.
        mask = imopen(mask, strel('rectangle', [3,3]));
        mask = imclose(mask, strel('rectangle', [15, 15]));
        mask = imfill(mask, 'holes');

        % Perform blob analysis to find connected components.
        [~, centroids, bboxes] = obj.blobAnalyser.step(mask);
    end
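The clean-up steps in detectObjects can be tried in isolation. The following sketch builds a hypothetical 60-by-60 mask with one 20-by-20 object, a small hole inside it, and two isolated noise pixels, then applies the same imopen, imclose, and imfill sequence. It uses regionprops as a stand-in for vision.BlobAnalysis to measure what remains, and assumes Image Processing Toolbox is available. Note that regionprops reports bounding boxes with half-pixel corners, unlike the integer boxes that vision.BlobAnalysis returns.

```matlab
% Hypothetical foreground mask: one 20x20 object, a 4x4 hole, two noise pixels.
mask = false(60, 60);
mask(21:40, 21:40) = true;   % moving object
mask(29:32, 29:32) = false;  % hole inside the object
mask(5, 5) = true;           % isolated noise pixel
mask(50, 10) = true;         % isolated noise pixel

% Same morphological sequence as detectObjects.
mask = imopen(mask, strel('rectangle', [3, 3]));    % removes the isolated pixels
mask = imclose(mask, strel('rectangle', [15, 15])); % closes the small hole
mask = imfill(mask, 'holes');                       % fills any enclosed holes left

% regionprops stands in for vision.BlobAnalysis in this sketch.
stats = regionprops(mask, 'Area', 'Centroid', 'BoundingBox');
fprintf('blobs: %d, area: %d, centroid: [%.1f %.1f]\n', ...
    numel(stats), stats(1).Area, stats(1).Centroid);
% Expected output: blobs: 1, area: 400, centroid: [30.5 30.5]
```

The 3-by-3 opening is small enough to keep the object intact while deleting single-pixel noise, and the much larger closing can merge nearby fragments of one object into a single blob.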

Predict New Locations of Existing Tracks

Use the Kalman filter to predict the centroid of each track in the current frame, and update its bounding box accordingly.

    function predictNewLocationsOfTracks()
        for i = 1:length(tracks)
            bbox = tracks(i).bbox;

            % Predict the current location of the track.
            predictedCentroid = predict(tracks(i).kalmanFilter);

            % Shift the bounding box so that its center is at
            % the predicted location.
            topLeft = int32(predictedCentroid) - bbox(3:4) / 2;
            tracks(i).bbox = [topLeft, bbox(3:4)];
        end
    end
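The predict call in predictNewLocationsOfTracks propagates each filter's state one frame forward. The sketch below reproduces that step by hand in base MATLAB for a 2-D constant-velocity model. It assumes a state vector laid out as [x; vx; y; vy], a time step of one frame, and that the [200, 50] and [100, 25] arguments to configureKalmanFilter in createNewTracks map onto the diagonal covariances shown; the state values and bounding box are hypothetical. It then applies the same bounding-box shift as the nested function.

```matlab
% Constant-velocity model, state = [x; vx; y; vy], one frame per step (assumed layout).
A = [1 1 0 0; 0 1 0 0; 0 0 1 1; 0 0 0 1];  % state transition
H = [1 0 0 0; 0 0 1 0];                    % measurement picks out (x, y)
Q = diag([100 25 100 25]);                 % motion noise (assumed from [100, 25])
P = diag([200 50 200 50]);                 % initial estimate error (assumed from [200, 50])

% Hypothetical current state: at (100, 50), moving +2 px/frame in x and -1 in y.
x = [100; 2; 50; -1];

% Kalman prediction step: propagate the state and grow its uncertainty.
x = A * x;
P = A * P * A' + Q;
predictedCentroid = (H * x)';   % row vector [x y], like the centroid in the example

% Center a hypothetical bounding box [left top width height] on the prediction,
% as predictNewLocationsOfTracks does.
bbox = int32([90 35 20 30]);
topLeft = int32(predictedCentroid) - bbox(3:4) / 2;
bbox = [topLeft, bbox(3:4)];
disp(bbox)   % [92 34 20 30]
```

The covariance P grows on every prediction, which is what later makes an unconfirmed track tolerate detections that are farther away.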

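Assign Detections to Tracks

Object detections in the current frame are assigned to existing tracks by minimizing a cost. The cost is defined as the negative log-likelihood of a detection corresponding to a track.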

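The algorithm involves two steps:

Step 1: Compute the cost of assigning every detection to each track using the distance method of the vision.KalmanFilter System object™. The cost takes into account the Euclidean distance between the predicted centroid of the track and the centroid of the detection. It also includes the confidence of the prediction, which is maintained by the Kalman filter. The results are stored in an M-by-N matrix, where M is the number of tracks and N is the number of detections.

Step 2: Solve the assignment problem represented by the cost matrix using the assignDetectionsToTracks function. The function takes the cost matrix and the cost of not assigning any detections to a track.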
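Step 1 can be illustrated without the toolbox. The sketch below computes an M-by-N cost matrix for two hypothetical tracks and three hypothetical detections, assuming the normalized distance d = (z - Hx)' * inv(S) * (z - Hx) + log(det(S)), with residual covariance S = H*P*H' + R, which is our understanding of what the distance method of vision.KalmanFilter computes. The first term grows with the squared centroid distance scaled by S; the second term penalizes tracks whose predictions are uncertain. Both tracks share the same hypothetical S here for simplicity.

```matlab
% Predicted centroids of two hypothetical tracks (rows = tracks).
predicted = [100 50; 200 80];
% Residual covariance S = H*P*H' + R, assumed identical for both tracks here.
S = 450 * eye(2);
% Centroids of three hypothetical detections (rows = detections).
detections = [103 52; 198 79; 400 400];

M = size(predicted, 1);
N = size(detections, 1);
cost = zeros(M, N);
for i = 1:M
    for j = 1:N
        r = (detections(j, :) - predicted(i, :))';   % innovation (residual)
        cost(i, j) = r' * (S \ r) + log(det(S));     % normalized distance
    end
end
disp(cost)
```

Each row's smallest entry points to the nearby detection, while the third detection, far from both tracks, is expensive for every track.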

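The value for the cost of not assigning a detection to a track depends on the range of values returned by the distance method of vision.KalmanFilter, and must be tuned experimentally. Setting it too low increases the likelihood of creating a new track, and may result in track fragmentation. Setting it too high may result in a single track corresponding to a series of separate moving objects.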

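The assignDetectionsToTracks function uses the Munkres version of the Hungarian algorithm to compute an assignment that minimizes the total cost. It returns an L-by-2 matrix, where L is the number of assignments, whose two columns contain the indices of the assigned tracks and detections. It also returns the indices of the tracks and detections that remained unassigned.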

    function [assignments, unassignedTracks, unassignedDetections] = ...
            detectionToTrackAssignment()

        nTracks = length(tracks);
        nDetections = size(centroids, 1);

        % Compute the cost of assigning each detection to each track.
        cost = zeros(nTracks, nDetections);
        for i = 1:nTracks
            cost(i, :) = distance(tracks(i).kalmanFilter, centroids);
        end

        % Solve the assignment problem.
        costOfNonAssignment = 20;
        [assignments, unassignedTracks, unassignedDetections] = ...
            assignDetectionsToTracks(cost, costOfNonAssignment);
    end
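The effect of costOfNonAssignment can be checked on a toy problem. This sketch, which requires Computer Vision Toolbox, feeds the same hypothetical 2-by-2 cost matrix to assignDetectionsToTracks twice. Track 2 and detection 2 are a poor match (cost 300): with a low non-assignment cost they stay unassigned, so detection 2 would start a new track, and with a high non-assignment cost they are paired. The values 20 and 1000 are deliberately far apart so the outcome does not depend on exactly how the function weighs each unassigned track and detection.

```matlab
% Hypothetical cost matrix: rows = tracks, columns = detections.
cost = [1    1e4;
        1e4  300];

% Low cost of non-assignment: the poor match (track 2, detection 2) is rejected.
[assignLow, unTracksLow, unDetsLow] = assignDetectionsToTracks(cost, 20);

% High cost of non-assignment: the same poor match is accepted.
[assignHigh, unTracksHigh, unDetsHigh] = assignDetectionsToTracks(cost, 1000);

disp(assignLow)    % one row: track 1 -> detection 1
disp(assignHigh)   % two rows: track 1 -> detection 1, track 2 -> detection 2
```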

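Update Assigned Tracks

The updateAssignedTracks function updates each assigned track with the corresponding detection. It calls the correct method of vision.KalmanFilter to correct the location estimate. Next, it stores the new bounding box and increases the age of the track and the total visible count by 1. Finally, the function sets the invisible count to 0.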

    function updateAssignedTracks()
        numAssignedTracks = size(assignments, 1);
        for i = 1:numAssignedTracks
            trackIdx = assignments(i, 1);
            detectionIdx = assignments(i, 2);
            centroid = centroids(detectionIdx, :);
            bbox = bboxes(detectionIdx, :);

            % Correct the estimate of the object's location
            % using the new detection.
            correct(tracks(trackIdx).kalmanFilter, centroid);

            % Replace predicted bounding box with detected
            % bounding box.
            tracks(trackIdx).bbox = bbox;

            % Update track's age.
            tracks(trackIdx).age = tracks(trackIdx).age + 1;

            % Update visibility.
            tracks(trackIdx).totalVisibleCount = ...
                tracks(trackIdx).totalVisibleCount + 1;
            tracks(trackIdx).consecutiveInvisibleCount = 0;
        end
    end
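The correct call blends the prediction with the new measurement. The following base-MATLAB sketch performs one standard Kalman correction, assuming the same [x; vx; y; vy] constant-velocity layout, a hypothetical predicted state and covariance, and a measurement noise variance of 100 (the last argument to configureKalmanFilter in createNewTracks). The corrected x-coordinate lands between the prediction (102) and the detection (105), weighted by their relative uncertainties.

```matlab
% Hypothetical predicted state [x; vx; y; vy] and its covariance.
x = [102; 2; 49; -1];
P = blkdiag([350 50; 50 75], [350 50; 50 75]);
H = [1 0 0 0; 0 0 1 0];   % measurement model: observe (x, y)
R = 100 * eye(2);         % measurement noise (assumed from the value 100)

z = [105; 49];            % hypothetical detected centroid

% Kalman correction step.
S = H * P * H' + R;             % residual (innovation) covariance
K = (P * H') / S;               % Kalman gain
x = x + K * (z - H * x);        % corrected state
P = (eye(4) - K * H) * P;       % corrected covariance

fprintf('corrected centroid: [%.3f %.3f]\n', x(1), x(3));
```

The velocity estimate x(2) also moves toward the measurement, so a track that keeps landing ahead of its prediction gradually speeds up.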

Update Unassigned Tracks

Mark each unassigned track as invisible by incrementing its consecutiveInvisibleCount, and increase its age by 1.

    function updateUnassignedTracks()
        for i = 1:length(unassignedTracks)
            ind = unassignedTracks(i);
            tracks(ind).age = tracks(ind).age + 1;
            tracks(ind).consecutiveInvisibleCount = ...
                tracks(ind).consecutiveInvisibleCount + 1;
        end
    end

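Delete Lost Tracks

The deleteLostTracks function deletes tracks that have been invisible for too many consecutive frames (20 or more in this example). It also deletes recently created tracks (younger than 8 frames) that were visible in less than 60% of their frames.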

    function deleteLostTracks()
        if isempty(tracks)
            return;
        end

        invisibleForTooLong = 20;
        ageThreshold = 8;

        % Compute the fraction of the track's age for which it was visible.
        ages = [tracks(:).age];
        totalVisibleCounts = [tracks(:).totalVisibleCount];
        visibility = totalVisibleCounts ./ ages;

        % Find the indices of 'lost' tracks.
        lostInds = (ages < ageThreshold & visibility < 0.6) | ...
            [tracks(:).consecutiveInvisibleCount] >= invisibleForTooLong;

        % Delete lost tracks.
        tracks = tracks(~lostInds);
    end
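The deletion rule can be checked on hypothetical track statistics without any video. In the sketch below, track 1 is young and was visible in only 40% of its frames, and track 4 has been missed for 20 consecutive frames, so both are removed; the thresholds are the ones used in deleteLostTracks.

```matlab
% Four hypothetical tracks; only the fields used by deleteLostTracks are filled in.
tracks = struct( ...
    'id',                        {1, 2, 3, 4}, ...
    'age',                       {5, 5, 30, 30}, ...
    'totalVisibleCount',         {2, 5, 25, 10}, ...
    'consecutiveInvisibleCount', {3, 0, 5, 20});

invisibleForTooLong = 20;
ageThreshold = 8;

ages = [tracks(:).age];
visibility = [tracks(:).totalVisibleCount] ./ ages;

% Young tracks that were mostly invisible, or any track invisible for too long.
lostInds = (ages < ageThreshold & visibility < 0.6) | ...
    [tracks(:).consecutiveInvisibleCount] >= invisibleForTooLong;

tracks = tracks(~lostInds);
disp([tracks.id])   % 2 3
```

Track 3 survives even though it was visible in only about 83% of its frames, because the visibility test applies only to tracks younger than ageThreshold.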

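Create New Tracks

Create new tracks from unassigned detections. Assume that any unassigned detection is the start of a new track. In practice, you can use other cues, such as size, location, or appearance, to eliminate noisy detections.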

    function createNewTracks()
        centroids = centroids(unassignedDetections, :);
        bboxes = bboxes(unassignedDetections, :);

        for i = 1:size(centroids, 1)

            centroid = centroids(i,:);
            bbox = bboxes(i, :);

            % Create a Kalman filter object.
            kalmanFilter = configureKalmanFilter('ConstantVelocity', ...
                centroid, [200, 50], [100, 25], 100);

            % Create a new track.
            newTrack = struct(...
                'id', nextId, ...
                'bbox', bbox, ...
                'kalmanFilter', kalmanFilter, ...
                'age', 1, ...
                'totalVisibleCount', 1, ...
                'consecutiveInvisibleCount', 0);

            % Add it to the array of tracks.
            tracks(end + 1) = newTrack;

            % Increment the next id.
            nextId = nextId + 1;
        end
    end
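The bookkeeping in createNewTracks can be sketched on hypothetical boxes. Each surviving unassigned detection becomes a track with age and visible count 1 and the next free ID. The sketch also applies one of the extra cues mentioned above, a minimum box area of 500 pixels; that threshold is a hypothetical choice, not part of the original example. The kalmanFilter field is omitted so the sketch runs without Computer Vision Toolbox.

```matlab
% Empty track array, mirroring initializeTracks (kalmanFilter field omitted here).
tracks = struct('id', {}, 'bbox', {}, 'age', {}, ...
    'totalVisibleCount', {}, 'consecutiveInvisibleCount', {});
nextId = 1;

% Hypothetical detections [left top width height] and unassigned indices.
bboxes = int32([10 10 20 20; 50 50 30 30; 90 10 25 25]);
unassignedDetections = uint32([1; 3]);

% Hypothetical size cue: ignore candidate boxes smaller than minBoxArea pixels.
minBoxArea = 500;
candidates = bboxes(unassignedDetections, :);
areas = double(candidates(:, 3)) .* double(candidates(:, 4));
candidates = candidates(areas >= minBoxArea, :);

% Start one new track per remaining candidate.
for i = 1:size(candidates, 1)
    tracks(end + 1) = struct('id', nextId, 'bbox', candidates(i, :), ...
        'age', 1, 'totalVisibleCount', 1, 'consecutiveInvisibleCount', 0);
    nextId = nextId + 1;
end
```

Here detection 1 (20-by-20, 400 px) is rejected by the size cue and detection 3 (25-by-25, 625 px) starts track 1.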

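Display Tracking Results

The displayTrackingResults function draws a bounding box and label ID for each track on the video frame and on the foreground mask. It then displays the frame and the mask in their respective video players.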

    function displayTrackingResults()
        % Convert the frame and the mask to uint8 RGB.
        frame = im2uint8(frame);
        mask = uint8(repmat(mask, [1, 1, 3])) .* 255;

        minVisibleCount = 8;
        if ~isempty(tracks)

            % Noisy detections tend to result in short-lived tracks.
            % Only display tracks that have been visible for more than
            % a minimum number of frames.
            reliableTrackInds = ...
                [tracks(:).totalVisibleCount] > minVisibleCount;
            reliableTracks = tracks(reliableTrackInds);

            % Display the objects. If an object has not been detected
            % in this frame, display its predicted bounding box.
            if ~isempty(reliableTracks)
                % Get bounding boxes.
                bboxes = cat(1, reliableTracks.bbox);

                % Get ids.
                ids = int32([reliableTracks(:).id]);

                % Create labels for objects indicating the ones for
                % which we display the predicted rather than the actual
                % location.
                labels = cellstr(int2str(ids'));
                predictedTrackInds = ...
                    [reliableTracks(:).consecutiveInvisibleCount] > 0;
                isPredicted = cell(size(labels));
                isPredicted(predictedTrackInds) = {' predicted'};
                labels = strcat(labels, isPredicted);

                % Draw the objects on the frame.
                frame = insertObjectAnnotation(frame, 'rectangle', ...
                    bboxes, labels);

                % Draw the objects on the mask.
                mask = insertObjectAnnotation(mask, 'rectangle', ...
                    bboxes, labels);
            end
        end

        % Display the mask and the frame.
        obj.maskPlayer.step(mask);
        obj.videoPlayer.step(frame);
    end
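The filtering and labeling logic in displayTrackingResults can be exercised without video. With the hypothetical tracks below, track 9 is hidden because it has been visible in only 4 frames, and track 7 gets the " predicted" suffix because its consecutiveInvisibleCount is nonzero, meaning its box comes from the Kalman prediction rather than a detection.

```matlab
% Hypothetical tracks: only the fields used for display are filled in.
tracks = struct( ...
    'id',                        {3, 7, 9}, ...
    'totalVisibleCount',         {12, 30, 4}, ...
    'consecutiveInvisibleCount', {0, 2, 0});

% Keep only tracks seen in more than minVisibleCount frames.
minVisibleCount = 8;
reliableTracks = tracks([tracks(:).totalVisibleCount] > minVisibleCount);

% Build labels the same way as displayTrackingResults.
ids = int32([reliableTracks(:).id]);
labels = cellstr(int2str(ids'));
isPredicted = cell(size(labels));
isPredicted([reliableTracks(:).consecutiveInvisibleCount] > 0) = {' predicted'};
labels = strcat(labels, isPredicted);
disp(labels)
```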

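Summary

This example created a motion-based system for detecting and tracking multiple moving objects. Try using a different video to see whether you can detect and track objects, and try modifying the parameters of the detection, assignment, and deletion steps.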

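The tracking in this example was based solely on motion, with the assumption that all objects move in a straight line at constant speed. When the motion of an object deviates significantly from this model, tracking errors can occur. In this example, a tracking error occurs when the person labeled 12 is occluded by the tree.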

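You can reduce the likelihood of tracking errors by using a more complex motion model, such as constant acceleration, or by using multiple Kalman filters for each object. You can also incorporate other cues for associating detections over time, such as size, shape, and color.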
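To see why a richer motion model can help, the sketch below propagates a hypothetical accelerating object for three frames with a 1-D constant-acceleration transition matrix and with a constant-velocity one, using a time step of one frame and no measurements. The constant-velocity prediction falls 9 px behind, the kind of drift that can make the matching detection look too costly to assign. configureKalmanFilter also accepts 'ConstantAcceleration' as its motion model; the hand-built matrices here are only an illustration of the idea.

```matlab
% 1-D constant-acceleration model, state = [position; velocity; acceleration].
dt = 1;
Aca = [1 dt dt^2/2;
       0 1  dt;
       0 0  1];
% Constant-velocity model for comparison, state = [position; velocity].
Acv = [1 dt;
       0 1];

% Hypothetical object at 0, moving at 1 px/frame, accelerating at 2 px/frame^2.
xca = [0; 1; 2];
xcv = [0; 1];
for k = 1:3
    xca = Aca * xca;   % predicts position t + t^2
    xcv = Acv * xcv;   % predicts position t
end
fprintf('after 3 frames: constant acceleration %g, constant velocity %g\n', ...
    xca(1), xcv(1));
```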

end