Tracking Pedestrians from a Moving Car

This example shows how to track pedestrians using a camera mounted in a moving car.

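Overview

This example shows how to automatically detect and track people in a video taken from a moving camera. It demonstrates the flexibility of a tracking system adapted to a moving camera, which is well suited to automotive safety applications. Unlike the stationary camera example, Motion-Based Multiple Object Tracking, this example contains several additional algorithmic steps. These steps include people detection, customized non-maximum suppression, and heuristics to identify and eliminate false alarm tracks. For more information, see Multiple Object Tracking.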

This example is a function with the main body at the top and the helper routines in the form of nested functions below.

function PedestrianTrackingFromMovingCameraExample()

% Create system objects used for reading video, loading prerequisite data file, detecting pedestrians, and displaying the results.
videoFile       = 'vippedtracking.mp4';
scaleDataFile   = 'pedScaleTable.mat'; % An auxiliary file that helps to determine the size of a pedestrian at different pixel locations.

obj = setupSystemObjects(videoFile, scaleDataFile);

detector = peopleDetectorACF('caltech');

% Create an empty array of tracks.
tracks = initializeTracks();

% ID of the next track.
nextId = 1;

% Set the global parameters.
option.ROI                  = [40 95 400 140];  % A rectangle [x, y, w, h] that limits the processing area to ground locations.
option.scThresh             = 0.3;              % A threshold to control the tolerance of error in estimating the scale of a detected pedestrian.
option.gatingThresh         = 0.9;              % A threshold to reject a candidate match between a detection and a track.
option.gatingCost           = 100;              % A large value for the assignment cost matrix that enforces the rejection of a candidate match.
option.costOfNonAssignment  = 10;               % A tuning parameter to control the likelihood of creation of a new track.
option.timeWindowSize       = 16;               % A tuning parameter to specify the number of frames required to stabilize the confidence score of a track.
option.confidenceThresh     = 2;                % A threshold to determine if a track is true positive or false alarm.
option.ageThresh            = 8;                % A threshold to determine the minimum length required for a track being true positive.
option.visThresh            = 0.6;              % A threshold to determine the minimum visibility value for a track being true positive.

% Detect people and track them across video frames.
stopFrame = 1629; % stop on an interesting frame with several pedestrians
for fNum = 1:stopFrame
    frame   = readFrame(obj.reader);

    [centroids, bboxes, scores] = detectPeople();

    predictNewLocationsOfTracks();

    [assignments, unassignedTracks, unassignedDetections] = ...
        detectionToTrackAssignment();

    updateAssignedTracks();
    updateUnassignedTracks();
    deleteLostTracks();
    createNewTracks();

    displayTrackingResults();

    % Exit the loop if the video player figure is closed.
    if ~isOpen(obj.videoPlayer)
        break;
    end
end

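Auxiliary Input and Global Parameters of the Tracking System

The tracking system requires a data file that relates each pixel location in the image to the size of the bounding box that marks a pedestrian at that location. This prior knowledge is stored in the vector pedScaleTable. The n-th entry of pedScaleTable is the estimated height of an adult in pixels, where the index n is the approximate Y-coordinate of the pedestrian's feet.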

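To obtain such a vector, a collection of training images was taken from the same viewpoint and in a scene similar to the testing environment. The training images contain pedestrians at varying distances from the camera. The bounding boxes of the pedestrians were annotated manually with the Image Labeler app. The heights of the bounding boxes, together with the locations of the pedestrians in the image, were used to generate the scale data file through regression. The helper function helperTableOfScales.m shows the algorithmic steps for fitting the linear regression model.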
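
The regression step can be sketched as follows. This is not the content of helperTableOfScales.m, which is not shown here. It is a minimal illustration that assumes a straight-line relation between the row of the feet and the box height. The annotation values, the frame height of 240 rows, and the clamping of negative heights are all assumptions made for the sketch.

```matlab
% Hypothetical annotations: row of the feet and box height, in pixels.
yFoot  = [120; 140; 160; 180; 200; 220];
height = [ 30;  42;  54;  66;  78;  90];

% Fit height = p(1)*yFoot + p(2) by least squares.
p = polyfit(yFoot, height, 1);

% Evaluate the model at every image row to build the lookup vector.
imageHeight = 240;                       % assumed frame height in pixels
pedScaleTable = polyval(p, (1:imageHeight)');

% A height cannot be negative, so clamp the rows where the line goes below zero.
pedScaleTable = max(pedScaleTable, 0);

% Expected height of a pedestrian whose feet are at row 170.
expectedHeight = pedScaleTable(170);
```

With these made-up annotations the fitted line is height = 0.6*yFoot - 42, so a pedestrian whose feet are at row 170 is expected to be about 60 pixels tall.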

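A set of global parameters can also be tuned to optimize tracking performance. The descriptions below explain how each parameter influences tracking performance.

ROI : Region of interest in the form [x, y, w, h]. It limits the processing area to ground locations.
scThresh : Tolerance threshold for scale estimation. When the difference between the detected scale and the expected scale exceeds the tolerance, the candidate detection is considered unrealistic and is removed from the output.
gatingThresh : Gating parameter for the distance measure. When the cost of matching a detected bounding box to a predicted bounding box exceeds the threshold, the system removes the association of the two boxes from tracking consideration.
gatingCost : Value written into the assignment cost matrix to discourage a track from being assigned to a detection.
costOfNonAssignment : Value in the assignment cost matrix for not assigning a detection or a track. Setting it too low increases the likelihood of creating new tracks and may result in track fragmentation. Setting it too high may result in a single track that corresponds to a series of separate moving objects.
timeWindowSize : Number of frames required to estimate the confidence of a track.
confidenceThresh : Confidence threshold for deciding whether a track is a true positive.
ageThresh : Minimum length of a track that is a true positive.
visThresh : Minimum visibility threshold for deciding whether a track is a true positive.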

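Create System Objects for the Tracking System Initialization

The setupSystemObjects function creates the System objects used for reading and displaying the video frames, and loads the scale data file.

The pedScaleTable vector, which is stored in the scale data file, encodes prior knowledge of the target and the scene. Once the regressor has been trained on the samples, the expected height can be computed at every possible Y-position in the image. These values are stored in the vector. The n-th entry of pedScaleTable is the estimated height of an adult in pixels, where the index n is the approximate Y-coordinate of the pedestrian's feet.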

    function obj = setupSystemObjects(videoFile,scaleDataFile)
        % Initialize Video I/O
        % Create objects for reading a video from a file, drawing the
        % detected and tracked people in each frame, and playing the video.

        % Create a video file reader.
        obj.reader = VideoReader(videoFile);

        % Create a video player.
        obj.videoPlayer = vision.VideoPlayer('Position', [29, 597, 643, 386]);

        % Load the scale data file
        ld = load(scaleDataFile, 'pedScaleTable');
        obj.pedScaleTable = ld.pedScaleTable;
    end

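Initialize Tracks

The initializeTracks function creates an array of tracks, where each track is a structure representing a moving object in the video. The purpose of the structure is to maintain the state of a tracked object. The state consists of the information used for detection-to-track assignment, track termination, and display.

The structure contains the following fields:

id : Integer ID of the track.
color : Color used to display the track.
bboxes : N-by-4 matrix representing the bounding boxes of the object. The last row is the current box. Each row has the form [x, y, width, height].
scores : N-by-1 vector recording the classification scores from the people detector. The last row is the current detection score.
kalmanFilter : Kalman filter object used for motion-based tracking. It tracks the center point of the object in the image.
age : Number of frames since the track was initialized.
totalVisibleCount : Total number of frames in which the object was detected (visible).
confidence : Pair of numbers representing the confidence in the track. It stores the maximum and the average of the past detection scores within a predefined time window.
predPosition : Predicted bounding box in the next frame.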

    function tracks = initializeTracks()
        % Create an empty array of tracks
        tracks = struct(...
            'id', {}, ...
            'color', {}, ...
            'bboxes', {}, ...
            'scores', {}, ...
            'kalmanFilter', {}, ...
            'age', {}, ...
            'totalVisibleCount', {}, ...
            'confidence', {}, ...
            'predPosition', {});
    end

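Detect People

The detectPeople function returns the centroids, the bounding boxes, and the classification scores of the detected people. It applies filtering and non-maximum suppression to the raw output of the detector returned by peopleDetectorACF.

centroids : N-by-2 matrix with each row in the form [x, y].
bboxes : N-by-4 matrix with each row in the form [x, y, width, height].
scores : N-by-1 vector in which each element is the classification score of the corresponding detection in the current frame.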
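
The scale-based filtering can be illustrated with a small worked example. It follows the same arithmetic as the detectPeople function, but the scale table and the two candidate boxes are made up for the illustration.

```matlab
% Hypothetical scale table: expected height grows by 0.5 pixel per row.
pedScaleTable = 0.5 * (1:240)';
resizeRatio = 1.5;    % detections are found in the enlarged frame
scThresh    = 0.3;    % same value as option.scThresh

% Two candidate boxes [x, y, width, height] in enlarged-frame coordinates.
bboxes = [150 90 30  75;
          150 30 30 150];

% Map the top row and the height of each box back to the original frame.
height = bboxes(:, 4) / resizeRatio;             % [50; 100]
y      = (bboxes(:, 2) - 1) / resizeRatio + 1;

% Row of the feet, clamped to the length of the table.
yfoot     = min(length(pedScaleTable), round(y + height));   % [110; 120]
estHeight = pedScaleTable(yfoot);                             % [55; 60]

% Flag boxes whose height deviates from the expected height by more
% than the fraction scThresh.
invalid = abs(estHeight - height) > estHeight * scThresh;
bboxes(invalid, :) = [];
```

The first box is 50 pixels tall where 55 pixels are expected, so it is kept. The second is 100 pixels tall where 60 pixels are expected, which exceeds the 30% tolerance, so it is removed.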

    function [centroids, bboxes, scores] = detectPeople()
        % Resize the image to increase the resolution of the pedestrian.
        % This helps detect people further away from the camera.
        resizeRatio = 1.5;
        frame = imresize(frame, resizeRatio, 'Antialiasing',false);

        % Run ACF people detector within a region of interest to produce
        % detection candidates.
        [bboxes, scores] = detect(detector, frame, option.ROI, ...
            'WindowStride', 2,...
            'NumScaleLevels', 4, ...
            'SelectStrongest', false);

        % Look up the estimated height of a pedestrian based on location of their feet.
        height = bboxes(:, 4) / resizeRatio;
        y = (bboxes(:,2)-1) / resizeRatio + 1;
        yfoot = min(length(obj.pedScaleTable), round(y + height));
        estHeight = obj.pedScaleTable(yfoot);

        % Remove detections whose size deviates from the expected size,
        % provided by the calibrated scale estimation.
        invalid = abs(estHeight-height)>estHeight*option.scThresh;
        bboxes(invalid, :) = [];
        scores(invalid, :) = [];

        % Apply non-maximum suppression to select the strongest bounding boxes.
        [bboxes, scores] = selectStrongestBbox(bboxes, scores, ...
                            'RatioType', 'Min', 'OverlapThreshold', 0.6);

        % Compute the centroids
        if isempty(bboxes)
            centroids = [];
        else
            centroids = [(bboxes(:, 1) + bboxes(:, 3) / 2), ...
                (bboxes(:, 2) + bboxes(:, 4) / 2)];
        end
    end

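Predict New Locations of Existing Tracks

Use the Kalman filter to predict the centroid of each track in the current frame, and update its bounding box accordingly. The width and height of the bounding box in the previous frame are used as the current prediction of the size.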
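
As a small worked example of the box update, the following sketch replaces the Kalman filter with an assumed predicted centroid. The numeric values are hypothetical.

```matlab
% Last bounding box on the track, [x, y, width, height].
bbox = [100 50 20 60];

% Assumed output of predict(kalmanFilter): the predicted centroid [x, y].
predictedCentroid = [120 80];

% Keep the previous size and center the box on the predicted centroid.
predPosition = [predictedCentroid - bbox(3:4)/2, bbox(3:4)];
```

Here predPosition is [110 50 20 60]: the size is unchanged and the top-left corner is placed half a width and half a height away from the predicted centroid.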

    function predictNewLocationsOfTracks()
        for i = 1:length(tracks)
            % Get the last bounding box on this track.
            bbox = tracks(i).bboxes(end, :);

            % Predict the current location of the track.
            predictedCentroid = predict(tracks(i).kalmanFilter);

            % Shift the bounding box so that its center is at the predicted location.
            tracks(i).predPosition = [predictedCentroid - bbox(3:4)/2, bbox(3:4)];
        end
    end

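Assign Detections to Tracks

Detections in the current frame are assigned to existing tracks by minimizing cost. The cost is derived from the bboxOverlapRatio function: it is one minus the overlap ratio between the predicted bounding box and the detected bounding box. This example assumes that a person moves only gradually between consecutive frames, because the video frame rate is high and people move slowly.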

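The algorithm involves two steps:

Step 1: Compute the cost of assigning every detection to each track using the bboxOverlapRatio measure. As people move toward or away from the camera, the centroid alone no longer describes their motion accurately. The cost takes into account both the distance on the image plane and the scale of the bounding boxes. This prevents a detection far from the camera from being assigned to a track close to the camera, even if their centroids coincide. Choosing this cost function keeps the computation low without resorting to a more sophisticated dynamic model. The results are stored in an M-by-N matrix, where M is the number of tracks and N is the number of detections.

Step 2: Solve the assignment problem represented by the cost matrix using the assignDetectionsToTracks function. The function takes the cost matrix and the cost of not assigning any detection to a track.

The value for the cost of not assigning a detection to a track depends on the range of values returned by the cost function. This value must be tuned experimentally. Setting it too low increases the likelihood of creating new tracks and may result in track fragmentation. Setting it too high may result in a single track that corresponds to a series of separate moving objects.

The assignDetectionsToTracks function uses the Munkres version of the Hungarian algorithm to compute an assignment that minimizes the total cost. It returns an L-by-2 matrix, where L is the number of assigned pairs, containing the corresponding indices of the assigned tracks and detections in its two columns. It also returns the indices of the tracks and detections that remained unassigned.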
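
The cost and the gating in Step 1 can be illustrated without the toolbox. The sketch below uses a hand-written intersection-over-union ratio as a stand-in for bboxOverlapRatio, assuming that function is used with its default ratio type. The boxes are hypothetical.

```matlab
% Hand-written overlap ratio (intersection over union) for two boxes
% [x, y, width, height]. This is a stand-in for illustration only.
interW = @(a, b) max(0, min(a(1)+a(3), b(1)+b(3)) - max(a(1), b(1)));
interH = @(a, b) max(0, min(a(2)+a(4), b(2)+b(4)) - max(a(2), b(2)));
inter  = @(a, b) interW(a, b) * interH(a, b);
iou    = @(a, b) inter(a, b) / (a(3)*a(4) + b(3)*b(4) - inter(a, b));

gatingThresh = 0.9;   % same value as option.gatingThresh
gatingCost   = 100;   % same value as option.gatingCost

predBbox   = [100 100 40 80];     % predicted box of one track
detections = [110 120 20 40;      % same centroid, half the width and height
              115 130 10 20];     % same centroid, a quarter of the width and height

% One row per track, one column per detection.
cost = zeros(1, size(detections, 1));
for k = 1:size(detections, 1)
    cost(k) = 1 - iou(predBbox, detections(k, :));
end

% Gate out implausible matches by giving them a large cost.
cost(cost > gatingThresh) = 1 + gatingCost;
```

Both detections share the centroid of the predicted box, yet their costs differ: the first has a cost of 0.75, while the second has a raw cost of 0.9375, which exceeds the gating threshold and is replaced by 101. This shows how the overlap-based cost accounts for scale as well as position.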

    function [assignments, unassignedTracks, unassignedDetections] = ...
            detectionToTrackAssignment()

        % Compute the overlap ratio between the predicted boxes and the
        % detected boxes, and compute the cost of assigning each detection
        % to each track. The cost is minimum when the predicted bbox is
        % perfectly aligned with the detected bbox (overlap ratio is one)
        predBboxes = reshape([tracks(:).predPosition], 4, [])';
        cost = 1 - bboxOverlapRatio(predBboxes, bboxes);

        % Force the optimization step to ignore some matches by
        % setting the associated cost to be a large number. Note that this
        % number is different from the 'costOfNonAssignment' below.
        % This is useful when gating (removing unrealistic matches)
        % technique is applied.
        cost(cost > option.gatingThresh) = 1 + option.gatingCost;

        % Solve the assignment problem.
        [assignments, unassignedTracks, unassignedDetections] = ...
            assignDetectionsToTracks(cost, option.costOfNonAssignment);
    end

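Update Assigned Tracks

The updateAssignedTracks function updates each assigned track with the corresponding detection. It calls the correct method of vision.KalmanFilter to correct the location estimate. Next, it stores a new bounding box whose size is the average of the new detection and up to four of the most recent boxes on the track, and increases the age of the track and the total visible count by 1. Finally, the function adjusts the confidence score of the track based on the previous detection scores.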
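
The size smoothing can be traced with a short worked example. The track history and the new detection below are hypothetical. The arithmetic mirrors the updateAssignedTracks function.

```matlab
% Hypothetical track history: five earlier boxes [x, y, width, height].
trackBboxes = [10 10 20 40;
               11 10 22 44;
               12 10 24 48;
               13 10 26 52;
               14 10 28 56];
newBbox  = [15 10 30 60];                      % matched detection
centroid = newBbox(1:2) + newBbox(3:4)/2;      % [30 40]

% Average the new size with the sizes of up to 4 previous boxes.
T = min(size(trackBboxes, 1), 4);
w = mean([trackBboxes(end-T+1:end, 3); newBbox(3)]);
h = mean([trackBboxes(end-T+1:end, 4); newBbox(4)]);

% Store a box with the smoothed size, centered on the detection.
smoothedBbox = [centroid - [w, h]/2, w, h];
```

The oldest box is ignored, the width averages to 26 and the height to 52, and the stored box becomes [17 14 26 52]. The box stays centered on the detection while its size changes more slowly than the raw detections.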

    function updateAssignedTracks()
        numAssignedTracks = size(assignments, 1);
        for i = 1:numAssignedTracks
            trackIdx = assignments(i, 1);
            detectionIdx = assignments(i, 2);

            centroid = centroids(detectionIdx, :);
            bbox = bboxes(detectionIdx, :);

            % Correct the estimate of the object's location
            % using the new detection.
            correct(tracks(trackIdx).kalmanFilter, centroid);

            % Stabilize the bounding box by taking the average of the size
            % of recent (up to) 4 boxes on the track.
            T = min(size(tracks(trackIdx).bboxes,1), 4);
            w = mean([tracks(trackIdx).bboxes(end-T+1:end, 3); bbox(3)]);
            h = mean([tracks(trackIdx).bboxes(end-T+1:end, 4); bbox(4)]);
            tracks(trackIdx).bboxes(end+1, :) = [centroid - [w, h]/2, w, h];

            % Update track's age.
            tracks(trackIdx).age = tracks(trackIdx).age + 1;

            % Update track's score history
            tracks(trackIdx).scores = [tracks(trackIdx).scores; scores(detectionIdx)];

            % Update visibility.
            tracks(trackIdx).totalVisibleCount = ...
                tracks(trackIdx).totalVisibleCount + 1;

            % Adjust track confidence score based on the maximum detection
            % score in the past 'timeWindowSize' frames.
            T = min(option.timeWindowSize, length(tracks(trackIdx).scores));
            score = tracks(trackIdx).scores(end-T+1:end);
            tracks(trackIdx).confidence = [max(score), mean(score)];
        end
    end

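Update Unassigned Tracks

The updateUnassignedTracks function marks each unassigned track as invisible, increases its age by 1, and appends the predicted bounding box to the track. Because it is unclear why no detection was assigned to the track, a detection score of zero is recorded for the frame before the confidence is recomputed.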
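
The effect of missed frames on the confidence can be seen in a small example. The score history is hypothetical. The window logic is the same as in the updateUnassignedTracks function.

```matlab
timeWindowSize = 16;           % same value as option.timeWindowSize

% Hypothetical score history of a track that was detected twice.
scores = [3; 2.5];

% Two frames without a matched detection each append a zero score.
scores = [scores; 0];
scores = [scores; 0];

% Confidence is [max, mean] over the most recent window.
T = min(timeWindowSize, length(scores));
window = scores(end-T+1:end);
confidence = [max(window), mean(window)];
```

After the two missed frames the confidence is [3, 1.375]. The maximum is unchanged because the strong detection is still inside the window, while the average drops with every missed frame.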

    function updateUnassignedTracks()
        for i = 1:length(unassignedTracks)
            idx = unassignedTracks(i);
            tracks(idx).age = tracks(idx).age + 1;
            tracks(idx).bboxes = [tracks(idx).bboxes; tracks(idx).predPosition];
            tracks(idx).scores = [tracks(idx).scores; 0];

            % Adjust track confidence score based on the maximum detection
            % score in the past 'timeWindowSize' frames
            T = min(option.timeWindowSize, length(tracks(idx).scores));
            score = tracks(idx).scores(end-T+1:end);
            tracks(idx).confidence = [max(score), mean(score)];
        end
    end

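Delete Lost Tracks

The deleteLostTracks function deletes tracks that have been invisible for too many consecutive frames. It also deletes recently created tracks that have been invisible for many frames overall.

Noisy detections tend to create false tracks. In this example, a track is removed under the following conditions:

The object was tracked for a short time. This typically happens when a false detection shows up for a few frames and a track is initiated for it.
The track was marked invisible for most of the frames.
The track failed to receive a strong detection within the past few frames, which is expressed as the maximum detection confidence score.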
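
The deletion rule can be checked on three hypothetical tracks. The logical expression is the one used in the deleteLostTracks function. Only the track values are made up.

```matlab
% Thresholds, same values as in the option structure.
ageThresh = 8;
visThresh = 0.6;
confidenceThresh = 2;

% Three hypothetical tracks.
ages               = [3; 20; 10];
totalVisibleCounts = [1; 18;  9];
maxConfidence      = [4;  5; 1.5];

% Fraction of the lifetime of each track in which it was detected.
visibility = totalVisibleCounts ./ ages;

% A track is lost if it is both young and mostly invisible, or if it
% has had no strong detection within the confidence window.
lostInds = (ages <= ageThresh & visibility <= visThresh) | ...
           (maxConfidence <= confidenceThresh);
```

The first track is young and visible in only a third of its frames, so it is deleted. The second is kept. The third is old and mostly visible, but its maximum confidence of 1.5 is below the threshold, so it is deleted as well.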

    function deleteLostTracks()
        if isempty(tracks)
            return;
        end

        % Compute the fraction of the track's age for which it was visible.
        ages = [tracks(:).age]';
        totalVisibleCounts = [tracks(:).totalVisibleCount]';
        visibility = totalVisibleCounts ./ ages;

        % Check the maximum detection confidence score.
        confidence = reshape([tracks(:).confidence], 2, [])';
        maxConfidence = confidence(:, 1);

        % Find the indices of 'lost' tracks.
        lostInds = (ages <= option.ageThresh & visibility <= option.visThresh) | ...
             (maxConfidence <= option.confidenceThresh);

        % Delete lost tracks.
        tracks = tracks(~lostInds);
    end

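Create New Tracks

Create new tracks from unassigned detections. Assume that any unassigned detection is the start of a new track. In practice, other cues such as size, location, or appearance can be used to eliminate noisy detections.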

    function createNewTracks()
        unassignedCentroids = centroids(unassignedDetections, :);
        unassignedBboxes = bboxes(unassignedDetections, :);
        unassignedScores = scores(unassignedDetections);

        for i = 1:size(unassignedBboxes, 1)
            centroid = unassignedCentroids(i,:);
            bbox = unassignedBboxes(i, :);
            score = unassignedScores(i);

            % Create a Kalman filter object.
            kalmanFilter = configureKalmanFilter('ConstantVelocity', ...
                centroid, [2, 1], [5, 5], 100);

            % Create a new track.
            newTrack = struct(...
                'id', nextId, ...
                'color', rand(1,3), ...
                'bboxes', bbox, ...
                'scores', score, ...
                'kalmanFilter', kalmanFilter, ...
                'age', 1, ...
                'totalVisibleCount', 1, ...
                'confidence', [score, score], ...
                'predPosition', bbox);

            % Add it to the array of tracks.
            tracks(end + 1) = newTrack; %#ok<AGROW>

            % Increment the next id.
            nextId = nextId + 1;
        end
    end

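Display Tracking Results

The displayTrackingResults function draws a colored bounding box for each track on the video frame. The transparency of the box, together with the displayed score, indicates the confidence of the detections and tracks.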
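
The mapping from confidence to opacity and the scaling of a box for display can be traced with hypothetical values. The expressions are the ones used in the displayTrackingResults function.

```matlab
displayRatio  = 4/3;
avgConfidence = [0.15; 1.2; 3];   % hypothetical average confidences

% Map the average confidence to an opacity clamped to [0.1, 0.5].
opacity = min(0.5, max(0.1, avgConfidence / 3));

% Scale a box [x, y, width, height] to the enlarged display frame.
% The corner uses 1-based pixel coordinates, hence the -1 and +1.
bb = [31 61 30 60];
bb(1:2) = (bb(1:2) - 1) * displayRatio + 1;
bb(3:4) = bb(3:4) * displayRatio;
```

The opacities come out as approximately [0.1; 0.4; 0.5], so weak tracks are drawn faintly and strong tracks are capped at half opacity. The box becomes approximately [41 81 40 80] in display coordinates.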

    function displayTrackingResults()

        displayRatio = 4/3;
        frame = imresize(frame, displayRatio);

        if ~isempty(tracks)
            ages = [tracks(:).age]';
            confidence = reshape([tracks(:).confidence], 2, [])';
            maxConfidence = confidence(:, 1);
            avgConfidence = confidence(:, 2);
            opacity = min(0.5,max(0.1,avgConfidence/3));
            noDispInds = (ages < option.ageThresh & maxConfidence < option.confidenceThresh) | ...
                       (ages < option.ageThresh / 2);

            for i = 1:length(tracks)
                if ~noDispInds(i)

                    % scale bounding boxes for display
                    bb = tracks(i).bboxes(end, :);
                    bb(:,1:2) = (bb(:,1:2)-1)*displayRatio + 1;
                    bb(:,3:4) = bb(:,3:4) * displayRatio;


                    frame = insertShape(frame, ...
                                            'FilledRectangle', bb, ...
                                            'ShapeColor', tracks(i).color, ...
                                            'Opacity', opacity(i));
                    frame = insertObjectAnnotation(frame, ...
                                            'rectangle', bb, ...
                                            num2str(avgConfidence(i)), ...
                                            'AnnotationColor', tracks(i).color);
                end
            end
        end

        frame = insertShape(frame, 'Rectangle', option.ROI * displayRatio, ...
                                'ShapeColor', [1 0 0], 'LineWidth', 3);

        step(obj.videoPlayer, frame);

    end

end