単眼の Visual Simultaneous Localization and Mapping

この例では次を使用します。

Visual simultaneous localization and mapping (vSLAM) とは、環境地図作成を行うのと同時に、周囲に対するカメラの位置と向きを計算する処理を意味します。この処理では、カメラからの視覚的入力のみを使用します。vSLAM の用途には、拡張現実、ロボティクス、自動運転などがあります。

例に示されている vSLAM 実装には、次の 2 つのバージョンがあります。

モジュール式で変更可能 ─ 関数とオブジェクトを使用して、Visual SLAM パイプラインを段階的に構築します。これらの関数とオブジェクトの詳細およびリストについては、MATLAB での Visual SLAM の実装のトピックを参照してください。このトピックで説明されている手法にはモジュールコードが含まれており、一般的で信頼性の高い ORB-SLAM [1] アルゴリズムにおおむね基づいて、vSLAM 実装の詳細を説明することを目的としています。このコードは可読性が高いため、アルゴリズムを理解し、アルゴリズムのパラメーターがシステムのパフォーマンスにどのような影響を与えるかをテストすることができます。このモジュール式の実装は、独自のアイデアをテストするための学習、実験、および変更に最適です。
高パフォーマンスかつ展開可能 ─ 完全な vSLAM ワークフローを含む monovslam オブジェクトを使用します。オブジェクトは、実行速度とコード生成を大幅に改善する実用的な解決策を提供します。monovslamからマルチスレッド C/C++ コードを生成するには、MATLAB® Coder™ を使用できます。生成されたコードは移植可能であり、Build and Deploy Visual SLAM Algorithm with ROS in MATLABの例に示すように、PC 以外のハードウェアや ROS ノードにも展開できます。

この例では、モジュール式で変更可能な実装を示します。高パフォーマンスかつ展開可能な実装の詳細については、Performant and Deployable Monocular Visual SLAMの例を参照してください。

計算を高速化するために、Computer Vision Toolbox の基本設定ダイアログボックスで並列計算を有効にできます。Computer Vision Toolbox™ の基本設定を開くには、[ホーム] タブの [環境] セクションで [基本設定] をクリックします。そして [Computer Vision Toolbox] を選択します。

用語

この例では次の用語をよく使用します。

キーフレーム: 位置推定とトラッキングのキューを含む、ビデオフレームのサブセット。通常、2 つの連続するキーフレームにより、十分な視覚的変化が生じます。
マップ点: キーフレームから再構成した環境のマップを表す 3 次元点のリスト。
Covisibility グラフ: ノードとしてのキーフレームで構成されるグラフ。2 つのキーフレームが共通のマップ点を共有している場合、エッジにより接続されます。エッジの重みは、共有されているマップ点の数です。
Essential グラフ: 重みが大きい、つまり共有されているマップ点が多いエッジのみを含む、Covisibility グラフの部分グラフ。
場所認識データベース: ある場所を過去に訪れたことがあるかどうかを認識するために使用されるデータベース。このデータベースは、入力された bag of features に基づきビジュアルワードとイメージのマッピングを格納します。クエリイメージと視覚的に類似したイメージを探索するために使用されます。

ORB-SLAM の概要

ORB-SLAM のパイプラインには次が含まれます。

地図の初期化: ORB-SLAM はまず、2 つのビデオフレームから 3 次元点のマップを初期化します。2 次元 ORB 特徴の対応関係に基づく三角形分割を使用して、3 次元点と相対カメラ姿勢が計算されます。
トラッキング: マップが初期化された後は、新しいフレームごとに、現在のフレームの特徴を最後のキーフレームの特徴とマッチングすることにより、カメラ姿勢が推定されます。推定されたカメラ姿勢は、局所地図のトラッキングにより調整されます。
局所地図作成: 現在のフレームがキーフレームとして識別された場合に、そのフレームを使用して、新しい 3 次元マップ点を作成します。この段階で、カメラ姿勢と 3 次元点を調整することにより、バンドル調整を使用して再投影誤差を最小化します。
ループ閉じ込み: bag-of-features アプローチを使用して、キーフレームごとにそれまでのすべてのキーフレームと比較することで、ループを検出します。ループ閉じ込みが検出されると姿勢グラフが最適化され、すべてのキーフレームのカメラ姿勢を調整します。

入力イメージシーケンスのダウンロードと確認

この例で使用するデータは TUM RGB-D ベンチマーク [2] からのものです。Web ブラウザーを使用するか、次のコードを実行することで、データを一時ディレクトリにダウンロードできます。

baseDownloadURL = "https://cvg.cit.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household.tgz"; 
dataFolder      = fullfile(tempdir, 'tum_rgbd_dataset', filesep); 
options         = weboptions(Timeout=Inf);
tgzFileName     = [dataFolder, 'fr3_office.tgz'];
folderExists    = exist(dataFolder, "dir");

% Create a folder in a temporary directory to save the downloaded file
if ~folderExists  
    mkdir(dataFolder); 
    disp('Downloading fr3_office.tgz (1.38 GB). This download can take a few minutes.') 
    websave(tgzFileName, baseDownloadURL, options); 

    % Extract contents of the downloaded file
    disp('Extracting fr3_office.tgz (1.38 GB) ...') 
    untar(tgzFileName, dataFolder); 
end

imageDatastore オブジェクトを作成して、RGB イメージを検査します。

imageFolder   = [dataFolder,'rgbd_dataset_freiburg3_long_office_household/rgb/'];
imds          = imageDatastore(imageFolder);

% Inspect the first image
currFrameIdx = 1;
currI = readimage(imds, currFrameIdx);
himage = imshow(currI);

地図の初期化

ORB-SLAM パイプラインではまず、3 次元ワールドポイントをもつマップを初期化します。このステップは重要であり、SLAM の最終結果の精度に大きく影響します。2 つのイメージ間の matchFeatures を使用して、初期の ORB 特徴点の対応関係を求めます。対応関係を求めた後、2 つの幾何学的変換モデルを使用して、地図の初期化を設定します。

ホモグラフィ: シーンが平面の場合、特徴点の対応関係を表すには、ホモグラフィ射影変換の方が適しています。
基礎行列: シーンが平面ではない場合、代わりに基礎行列を使用しなければなりません。

再投影誤差がより少なかったモデルを選択し、estrelposeを使用して、2 つのフレーム間の相対的な回転と並進を推定します。深さの情報を提供しない単眼カメラから RGB イメージを取得しているため、相対並進はある程度のスケール係数までしか復元できません。

2 つのイメージの相対カメラ姿勢および一致した特徴点をもとに、関数triangulateを使用して、特徴点の 3 次元での位置を決定します。三角形分割されたマップ点は、それが両方のカメラの正面にある場合、再投影誤差が小さい場合、およびその点の 2 つのビューのパララックスが十分に大きい場合には有効です。

% Set random seed for reproducibility
rng(0);

% Create a cameraIntrinsics object to store the camera intrinsic parameters.
% The intrinsics for the dataset can be found at the following page:
% https://vision.in.tum.de/data/datasets/rgbd-dataset/file_formats
% Note that the images in the dataset are already undistorted, hence there
% is no need to specify the distortion coefficients.
focalLength    = [535.4, 539.2];    % in units of pixels
principalPoint = [320.1, 247.6];    % in units of pixels
imageSize      = size(currI,[1 2]);  % in units of pixels
intrinsics     = cameraIntrinsics(focalLength, principalPoint, imageSize);

% Detect and extract ORB features
scaleFactor = 1.2;
numLevels   = 8;
numPoints   = 1000;
[preFeatures, prePoints] = helperDetectAndExtractFeatures(currI, scaleFactor, numLevels, numPoints); 

currFrameIdx = currFrameIdx + 1;
firstI       = currI; % Preserve the first frame 

isMapInitialized  = false;

% Map initialization loop
while ~isMapInitialized && currFrameIdx < numel(imds.Files)
    currI = readimage(imds, currFrameIdx);

    [currFeatures, currPoints] = helperDetectAndExtractFeatures(currI, scaleFactor, numLevels, numPoints); 

    currFrameIdx = currFrameIdx + 1;

    % Find putative feature matches
    indexPairs = matchFeatures(preFeatures, currFeatures, Unique=true, ...
        MaxRatio=0.9, MatchThreshold=40);

    % If not enough matches are found, check the next frame
    minMatches = 100;
    if size(indexPairs, 1) < minMatches
        continue
    end

    preMatchedPoints  = prePoints(indexPairs(:,1),:);
    currMatchedPoints = currPoints(indexPairs(:,2),:);

    % Compute homography and evaluate reconstruction
    [tformH, scoreH, inliersIdxH] = helperComputeHomography(preMatchedPoints, currMatchedPoints);

    % Compute fundamental matrix and evaluate reconstruction
    [tformF, scoreF, inliersIdxF] = helperComputeFundamentalMatrix(preMatchedPoints, currMatchedPoints, intrinsics);

    % Select the model based on a heuristic
    ratio = scoreH/(scoreH + scoreF);
    ratioThreshold = 0.45;
    if ratio > ratioThreshold
        inlierTformIdx = inliersIdxH;
        tform          = tformH;
    else
        inlierTformIdx = inliersIdxF;
        tform          = tformF;
    end

    % Computes the camera location up to scale. Use half of the 
    % points to reduce computation
    inlierPrePoints  = preMatchedPoints(inlierTformIdx);
    inlierCurrPoints = currMatchedPoints(inlierTformIdx);
    [relPose, validFraction] = estrelpose(tform, intrinsics, ...
        inlierPrePoints(1:2:end), inlierCurrPoints(1:2:end));

    % If not enough inliers are found, move to the next frame
    if validFraction < 0.9 || numel(relPose)>1
        continue
    end

    % Triangulate two views to obtain 3-D map points
    minParallax = 1; % In degrees
    [isValid, xyzWorldPoints, inlierTriangulationIdx] = helperTriangulateTwoFrames(...
        rigidtform3d, relPose, inlierPrePoints, inlierCurrPoints, intrinsics, minParallax);

    if ~isValid
        continue
    end

    % Get the original index of features in the two key frames
    indexPairs = indexPairs(inlierTformIdx(inlierTriangulationIdx),:);

    isMapInitialized = true;

    disp(['Map initialized with frame 1 and frame ', num2str(currFrameIdx-1)])
end % End of map initialization loop

Map initialized with frame 1 and frame 30

if isMapInitialized
    close(himage.Parent.Parent); % Close the previous figure
    % Show matched features
    hfeature = showMatchedFeatures(firstI, currI, prePoints(indexPairs(:,1)), ...
        currPoints(indexPairs(:, 2)), "Montage");
else
    error('Unable to initialize the map.')
end

Figure contains an axes object. The hidden axes object contains 4 objects of type image, line. One or more of the lines displays its values using only markers

初期キーフレームとマップ点の格納

2 つのフレームを使用してマップを初期化した後で、imageviewsetとworldpointsetを使用して、この 2 つのキーフレームとこれらに対応するマップ点を格納できます。

imageviewset は、キーフレームとその属性 (ORB 記述子など)、特徴点とカメラ姿勢、およびキーフレーム間の関係 (特徴点の一致、相対カメラ姿勢など) を格納します。また、姿勢グラフを作成および更新します。オドメトリエッジの絶対カメラ姿勢と相対カメラ姿勢は、rigidtform3dオブジェクトとして保存されます。ループ閉じ込みエッジの相対カメラ姿勢は、affinetform3dオブジェクトとして保存されます。
worldpointsetは、マップ点の 3 次元の位置と、3 次元から 2 次元への投影の対応関係、つまりどのマップ点がキーフレームで観測され、どのキーフレームでマップ点を観測するかという関係も格納します。さらに、マップ点の他の属性 (平均的なビューの方向、代表的な ORB 記述子、およびマップ点を観測可能な距離範囲など) を格納します。

% Create an empty imageviewset object to store key frames
vSetKeyFrames = imageviewset;

% Create an empty worldpointset object to store 3-D map points
mapPointSet   = worldpointset;

% Add the first key frame. Place the camera associated with the first 
% key frame at the origin, oriented along the Z-axis
preViewId     = 1;
vSetKeyFrames = addView(vSetKeyFrames, preViewId, rigidtform3d, Points=prePoints,...
    Features=preFeatures.Features);

% Add the second key frame
currViewId    = 2;
vSetKeyFrames = addView(vSetKeyFrames, currViewId, relPose, Points=currPoints,...
    Features=currFeatures.Features);

% Add connection between the first and the second key frame
vSetKeyFrames = addConnection(vSetKeyFrames, preViewId, currViewId, relPose, Matches=indexPairs);

% Add 3-D map points
[mapPointSet, newPointIdx] = addWorldPoints(mapPointSet, xyzWorldPoints);

% Add image points corresponding to the map points in the first key frame
mapPointSet   = addCorrespondences(mapPointSet, preViewId, newPointIdx, indexPairs(:,1));

% Add image points corresponding to the map points in the second key frame
mapPointSet   = addCorrespondences(mapPointSet, currViewId, newPointIdx, indexPairs(:,2));

場所認識データベースの初期化

ループ検出は bags-of-words アプローチを使用して行われます。次の呼び出しを行うことにより、データセットに含まれる大量のイメージセットから抽出された ORB 記述子を使用して、bagOfFeaturesDBoW オブジェクトで表されるビジュアルボキャブラリがオフラインで作成されます。

bag = bagOfFeaturesDBoW(imds);

ここで、imds は学習イメージを格納するimageDatastoreオブジェクトです。

ループ閉じ込み処理により、データベースがインクリメンタルに作成されます。このデータベースは dbowLoopDetector オブジェクトで表され、bag of ORB features に基づきビジュアルワードとイメージのマッピングを格納します。

% Load the bag of features data created offline
bag         = bagOfFeaturesDBoW("bagOfFeatures.bin.gz");

% Initialize the place recognition database
loopDatabase    = dbowLoopDetector(bag);

% Add features of the first two key frames to the database
addVisualFeatures(loopDatabase, preViewId, preFeatures);
addVisualFeatures(loopDatabase, currViewId, currFeatures);

初期再構成の調整と可視化

bundleAdjustment を使用して、初期再構成を調整します。これにより、カメラ姿勢とワールドポイントの両方を最適化し、全体の再投影誤差を最小化します。調整後、マップ点の 3 次元での位置、ビューの方向、深さの範囲などの属性が更新されます。helperVisualizeMotionAndStructure を使用して、マップ点とカメラの位置を可視化できます。

% Run full bundle adjustment on the first two key frames
tracks       = findTracks(vSetKeyFrames);
cameraPoses  = poses(vSetKeyFrames);

[refinedPoints, refinedAbsPoses] = bundleAdjustment(xyzWorldPoints, tracks, ...
    cameraPoses, intrinsics, FixedViewIDs=1, ...
    PointsUndistorted=true, AbsoluteTolerance=1e-7,...
    RelativeTolerance=1e-15, MaxIteration=20, ...
    Solver="preconditioned-conjugate-gradient");

% Scale the map and the camera pose using the median depth of map points
medianDepth   = median(vecnorm(refinedPoints.'));
refinedPoints = refinedPoints / medianDepth;

refinedAbsPoses.AbsolutePose(currViewId).Translation = ...
    refinedAbsPoses.AbsolutePose(currViewId).Translation / medianDepth;
relPose.Translation = relPose.Translation/medianDepth;

% Update key frames with the refined poses
vSetKeyFrames = updateView(vSetKeyFrames, refinedAbsPoses);
vSetKeyFrames = updateConnection(vSetKeyFrames, preViewId, currViewId, relPose);

% Update map points with the refined positions
mapPointSet = updateWorldPoints(mapPointSet, newPointIdx, refinedPoints);

% Update view direction and depth 
mapPointSet = updateLimitsAndDirection(mapPointSet, newPointIdx, vSetKeyFrames.Views);

% Update representative view
mapPointSet = updateRepresentativeView(mapPointSet, newPointIdx, vSetKeyFrames.Views);

% Visualize matched features in the current frame
close(hfeature.Parent.Parent);
featurePlot   = helperVisualizeMatchedFeatures(currI, currPoints(indexPairs(:,2)));

Figure contains an axes object. The hidden axes object with title Matched Features in Current Frame contains 2 objects of type image, line. One or more of the lines displays its values using only markers

% Visualize initial map points and camera trajectory
mapPlot       = helperVisualizeMotionAndStructure(vSetKeyFrames, mapPointSet);

% Show legend
showLegend(mapPlot);

Figure Point Cloud Player contains an axes object. The axes object with xlabel X, ylabel Y contains 12 objects of type line, text, patch, scatter. These objects represent Map points, Estimated trajectory.

トラッキング

トラッキング処理は全フレームを使用して行われ、新しいキーフレームを挿入するタイミングを決定します。

% ViewId of the current key frame
currKeyFrameId   = currViewId;

% ViewId of the last key frame
lastKeyFrameId   = currViewId;

% Index of the last key frame in the input image sequence
lastKeyFrameIdx  = currFrameIdx - 1; 

% Indices of all the key frames in the input image sequence
addedFramesIdx   = [1; lastKeyFrameIdx];

isLoopClosed     = false;

各フレームが次のように処理されます。

新しいフレームごとに ORB 特徴が抽出され、既知の対応する 3 次元マップ点をもつ最後のキーフレームの特徴とのマッチングが行われます。線形カメラモーションモデルであると仮定した場合、最後のキーフレームの特徴点に対応する現在のフレーム内の予想される一致位置は、vision.PointTrackerを使用して予測できます。
現在のフレームにおける 3 次元から 2 次元への対応関係をもとに、estworldposeを使用して透視 n 点アルゴリズムでカメラ姿勢を推定し、bundleAdjustmentMotionを使用して動きのみのバンドル調整を行うことでカメラ姿勢を調整します。
matchFeaturesInRadiusを使用し、局所地図点を現在のフレームに投影して、その他の特徴の対応関係を探索します。そして、bundleAdjustmentMotionを使用して、再度カメラ姿勢を調整します。
トラッキングの最後のステップでは、現在のフレームが新しいキーフレームかどうかを決定します。現在のフレームがキーフレームの場合、"局所地図作成" 処理に進みます。そうでない場合、次のフレームのトラッキングを開始します。

十分な数の特徴点を一致させることができなかったためにトラッキングが失われた場合は、より頻繁に新しいキーフレームを挿入してみてください。

% Main loop
isLastFrameKeyFrame = true;

% Create and initialize the KLT tracker
tracker = vision.PointTracker(MaxBidirectionalError = 5);
initialize(tracker, currPoints.Location(indexPairs(:,2), :), currI);

while currFrameIdx < numel(imds.Files)  
    currI = readimage(imds, currFrameIdx);
    [currFeatures, currPoints] = helperDetectAndExtractFeatures(currI, scaleFactor, numLevels, numPoints);

    % Track the last key frame
    [currPose, mapPointsIdx, featureIdx] = helperTrackLastKeyFrameKLT(tracker, currI, mapPointSet, ...
        vSetKeyFrames.Views, currFeatures, currPoints, lastKeyFrameId, intrinsics);

    % Track the local map and check if the current frame is a key frame.
    % A frame is a key frame if both of the following conditions are satisfied:
    %
    % 1. At least 20 frames have passed since the last key frame or the
    %    current frame tracks fewer than 100 map points.
    % 2. The map points tracked by the current frame are fewer than 90% of
    %    points tracked by the reference key frame.
    %
    % Tracking performance is sensitive to the value of numPointsKeyFrame.  
    % If tracking is lost, try a larger value.
    %
    % localKeyFrameIds:   ViewId of the connected key frames of the current frame
    numSkipFrames     = 20;
    numPointsKeyFrame = 90;
    [localKeyFrameIds, currPose, mapPointsIdx, featureIdx, isKeyFrame] = ...
        helperTrackLocalMap(mapPointSet, vSetKeyFrames, mapPointsIdx, ...
        featureIdx, currPose, currFeatures, currPoints, intrinsics, scaleFactor, ...
        isLastFrameKeyFrame, lastKeyFrameIdx, currFrameIdx, numSkipFrames, numPointsKeyFrame);

    % Visualize matched features
    updatePlot(featurePlot, currI, currPoints(featureIdx));

    if ~isKeyFrame
        currFrameIdx        = currFrameIdx + 1;
        isLastFrameKeyFrame = false;
        continue
    else
        currKeyFrameId      = currKeyFrameId + 1;
        isLastFrameKeyFrame = true;
    end

局所地図作成

局所地図作成はすべてのキーフレームに対して行われます。新しいキーフレームが決定されると、それをキーフレームに追加し、新しいキーフレームにおいて観測されたマップ点の属性を更新します。mapPointSet に含まれる外れ値をできるだけ少なくするには、有効なマップ点が少なくとも 3 つのキーフレームで観測されなければなりません。

現在のキーフレームにおける ORB 特徴点と、接続されたキーフレームを三角形分割することで、新しいマップ点が作成されます。matchFeatures を使用して、現在のキーフレームにおける非マッチの特徴点ごとに、接続されたキーフレームにおいて、他の非マッチの点との一致を探索します。局所バンドル調整により、現在のキーフレームの姿勢、接続されたキーフレームの姿勢、およびこれらのキーフレームで観測されたすべてのマップ点を調整します。

    % Add the new key frame 
    [mapPointSet, vSetKeyFrames] = helperAddNewKeyFrame(mapPointSet, vSetKeyFrames, ...
        currPose, currFeatures, currPoints, mapPointsIdx, featureIdx, localKeyFrameIds);

    % Remove outlier map points that are observed in fewer than 3 key frames
    outlierIdx    = setdiff(newPointIdx, mapPointsIdx);
    if ~isempty(outlierIdx)
        mapPointSet   = removeWorldPoints(mapPointSet, outlierIdx);
    end

    % Create new map points by triangulation
    minNumMatches = 10;
    minParallax   = 3;
    [mapPointSet, vSetKeyFrames, newPointIdx] = helperCreateNewMapPoints(mapPointSet, vSetKeyFrames, ...
        currKeyFrameId, intrinsics, scaleFactor, minNumMatches, minParallax);

    % Local bundle adjustment
    [refinedViews, dist] = connectedViews(vSetKeyFrames, currKeyFrameId, MaxDistance=2);
    refinedKeyFrameIds = refinedViews.ViewId;
    fixedViewIds = refinedKeyFrameIds(dist==2);
    fixedViewIds = fixedViewIds(1:min(10, numel(fixedViewIds)));

    % Refine local key frames and map points
    [mapPointSet, vSetKeyFrames, mapPointIdx] = bundleAdjustment(...
        mapPointSet, vSetKeyFrames, [refinedKeyFrameIds; currKeyFrameId], intrinsics, ...
        FixedViewIDs=fixedViewIds, PointsUndistorted=true, AbsoluteTolerance=1e-7,...
        RelativeTolerance=1e-16, Solver="preconditioned-conjugate-gradient", ...
        MaxIteration=10);

    % Update view direction and depth
    mapPointSet = updateLimitsAndDirection(mapPointSet, mapPointIdx, vSetKeyFrames.Views);

    % Update representative view
    mapPointSet = updateRepresentativeView(mapPointSet, mapPointIdx, vSetKeyFrames.Views);

    % Visualize 3D world points and camera trajectory
    updatePlot(mapPlot, vSetKeyFrames, mapPointSet);

    % Set the feature points to be tracked
    [~, index2d] = findWorldPointsInView(mapPointSet, currKeyFrameId);
    setPoints(tracker, currPoints.Location(index2d, :));

ループ閉じ込み

ループ閉じ込みの検出ステップでは、局所地図作成処理で処理された現在のキーフレームを受け取り、ループを検出して閉じようと試みます。evaluateImageRetrieval を使用し、データベース内のイメージをクエリすることで、現在のキーフレームと視覚的に似ているループ候補が特定されます。候補キーフレームが最後のキーフレームと接続されておらず、隣接するキーフレームの 3 つがループ候補であれば、その候補キーフレームは有効です。

有効なループ候補が見つかると、estgeotform3dを使用して、ループ候補フレームと現在のキーフレームの間の相対姿勢を計算します。相対姿勢は、simtform3d オブジェクトに保存されている 3 次元相似変換を表します。そして、ループの接続を相対姿勢と共に追加し、mapPointSet と vSetKeyFrames を更新します。

    % Check loop closure after some key frames have been created    
    if currKeyFrameId > 20

        % Minimum number of feature matches of loop edges
        loopEdgeNumMatches = 50;

        % Detect possible loop closure key frame candidates
        [isDetected, validLoopCandidates] = helperCheckLoopClosure(vSetKeyFrames, currKeyFrameId, ...
            loopDatabase, currFeatures, loopEdgeNumMatches);

        if isDetected 
            % Add loop closure connections
            [isLoopClosed, mapPointSet, vSetKeyFrames] = helperAddLoopConnections(...
                mapPointSet, vSetKeyFrames, validLoopCandidates, currKeyFrameId, ...
                currFeatures, loopEdgeNumMatches);
        end
    end

    % If no loop closure is detected, add current features into the database
    if ~isLoopClosed
        addVisualFeatures(loopDatabase,  currKeyFrameId, currFeatures);
    end

    % Update IDs and indices
    lastKeyFrameId  = currKeyFrameId;
    lastKeyFrameIdx = currFrameIdx;
    addedFramesIdx  = [addedFramesIdx; currFrameIdx]; %#ok<AGROW>
    currFrameIdx    = currFrameIdx + 1;
end % End of main loop

Loop edge added between keyframe: 5 and 129
Loop edge added between keyframe: 5 and 130
Loop edge added between keyframe: 6 and 131
Loop edge added between keyframe: 6 and 144
Loop edge added between keyframe: 3 and 145
Loop edge added between keyframe: 5 and 146

最後に、vSetKeyFrames の Essential グラフ全体に対して相似姿勢グラフの最適化が行われ、カメラ姿勢のドリフトが修正されます。Essential グラフは、Covisibility グラフの接続のうち、一致数が minNumMatches よりも少ない接続を削除することで、内部的に作成されます。相似姿勢グラフの最適化後、最適化された姿勢と関連付けられたスケールを使用してマップ点の 3 次元での位置を更新します。

if isLoopClosed
    % Optimize the poses
    minNumMatches      = 30;
    vSetKeyFramesOptim = optimizePoses(vSetKeyFrames, minNumMatches, Tolerance=1e-16);

    % Update map points after optimizing the poses
    mapPointSet = helperUpdateGlobalMap(mapPointSet, vSetKeyFrames, vSetKeyFramesOptim);

    updatePlot(mapPlot, vSetKeyFrames, mapPointSet);

    % Plot the optimized camera trajectory
    optimizedPoses  = poses(vSetKeyFramesOptim);
    plotOptimizedTrajectory(mapPlot, optimizedPoses)

    % Update legend
    showLegend(mapPlot);
end

グラウンドトゥルースとの比較

最適化されたカメラの軌跡をグラウンドトゥルースと比較して、ORB-SLAM の精度を評価できます。ダウンロードしたデータには、各フレームのカメラ姿勢のグラウンドトゥルースが格納された groundtruth.txt ファイルが含まれています。helperImportGroundTruth 関数を使用して、次のようにデータをインポートできます。

gTruthFileName = [dataFolder,'rgbd_dataset_freiburg3_long_office_household/groundtruth.txt'];

gTruth = helperImportGroundTruth(gTruthFileName, imds);

この例では、データは MAT ファイル形式で保存されています。グラウンドトゥルースがインポートされたら、軌跡の平方根二乗誤差 (RMSE) の推定を計算できます。

if isLoopClosed

    % Load ground truth
    gTruthData = load("orbslamGroundTruth.mat");
    gTruth     = gTruthData.gTruth;

    % Plot the actual camera trajectory
    plotActualTrajectory(mapPlot, gTruth(addedFramesIdx), optimizedPoses);

    % Show legend
    showLegend(mapPlot);
end

Figure Point Cloud Player contains an axes object. The axes object with xlabel X, ylabel Y contains 4 objects of type scatter, line. These objects represent Map points, Estimated trajectory, Optimized trajectory, Actual trajectory.

if isLoopClosed

    % Evaluate tracking accuracy
    errorMetrics = compareTrajectories(optimizedPoses.AbsolutePose, gTruth(addedFramesIdx), AlignmentType="similarity");
    disp(['Absolute RMSE for key frame location (m): ', num2str(errorMetrics.AbsoluteRMSE(2))]);

Absolute RMSE for key frame location (m): 0.08697

    % Plot the absolute translation error at each key frame
    figure
    ax = plot(errorMetrics, "absolute-translation");

    % Adjust the view
    view(ax, [2.70 -49.20]);

Figure contains an axes object. The axes object with title Absolute Translation Error, xlabel X, ylabel Y contains 2 objects of type patch, line. These objects represent Estimated Trajectory, Ground Truth Trajectory.

end

次のパラメーターを調整することにより、さまざまなデータセットで Visual SLAM パイプラインをテストできます。

numPoints:480 x 640 ピクセルのイメージ解像度の場合、numPoints を 1,000 に設定します。720 x 1280 などの高解像度の場合は、2,000 に設定します。値が大きいほど、特徴抽出に時間がかかります。
numSkipFrames:フレームレートが 30 fps の場合、numSkipFrames を 20 に設定します。フレームレートが低い場合はより小さい値を設定します。numSkipFrames を増やすと追跡速度が向上しますが、カメラの動きが速い場合に追跡対象を見失うことがあります。

コード生成

MATLAB Coder の要件を満たすために、上記のコードをエントリポイント関数 helperVisualSLAMCodegen に再構築します。この関数は、イメージの cell 配列を入力として受け取り、3 次元worldpointset、推定されたカメラ姿勢、およびフレームインデックスを出力します。

コード生成ではimageDatastoreオブジェクトがサポートされないため、イメージを読み取り、グレースケールに変換して、cell 配列に保存します。

imageArray = cell(1,numel(imds.Files));
for i = 1:numel(imds.Files)
    imageArray{i} = im2gray(readimage(imds,i));
end

関数コンパイル命令 %#codegen (MATLAB Coder)を使用して、関数 helperVisualSLAMCodegen を MEX ファイルにコンパイルします。-report オプションを指定して、コンパイルレポートを生成できます。このレポートには元の MATLAB コードと、C コードの生成時に作成された関連ファイルが含まれています。MATLAB Coder が生成ファイルを保存できるように、一時ディレクトリを作成することもできます。既定では、生成された MEX ファイルの名前は、元の MATLAB 関数に接尾辞 "_mex" を付加した helperVisualSLAMCodegen_mex となります。あるいは、-o オプションを使用して MEX ファイルの名前を指定することもできます。

コード生成では、imageArray を関数 helperVisualSLAMCodegen への入力として渡さなければなりません。

cpuConfig = coder.config("mex");
cpuConfig.TargetLang = "C++";
codegen -config cpuConfig helperVisualSLAMCodegen -args {imageArray}

Code generation successful.

MEX ファイルを使用してイメージデータを処理します。

monoSlamOut = helperVisualSLAMCodegen_mex(imageArray);

補助関数 helperVisualizeMonoSlam への入力引数として monoSlamOut を指定して、カメラの推定軌跡と実際の軌跡をプロットします。

% Visualize the results
mapPlot = helperVisualizeMonoVisualSlam(monoSlamOut);

サポート関数

短い補助関数は以下のとおりです。大きな関数は別ファイルにあります。

"helperAddLoopConnections" は、現在のキーフレームと有効なループ候補の間の接続を追加します。

"helperAddNewKeyFrame" は、キーフレームセットにキーフレームを追加します。

"helperCheckLoopClosure" は、視覚的に似たイメージをデータベースから取得することで、ループ候補となるキーフレームを検出します。

"helperCreateNewMapPoints" は、三角形分割により、新しいマップ点を作成します。

"helperORBFeatureExtractorFunction" は、bagOfFeatures で使用される ORB 特徴抽出を実装します。

"helperTrackLastKeyFrameKLT" は、KLT トラッカーを使用して最後のキーフレームを追跡し、現在のカメラ姿勢を推定します。

"helperTrackLocalMap" は、局所地図のトラッキングにより、現在のカメラ姿勢を調整します。

"helperVisualizeMatchedFeatures" は、フレーム内のマッチした特徴を表示します。

"helperVisualizeMotionAndStructure" は、マップ点とカメラの軌跡を表示します。

"helperImportGroundTruth" は、ダウンロードしたデータからカメラの姿勢のグラウンドトゥルースをインポートします。

"helperDetectAndExtractFeatures" は、イメージから ORB 特徴を検出して抽出します。

function [features, validPoints] = helperDetectAndExtractFeatures(Irgb, ...
    scaleFactor, numLevels, numPoints, varargin)

% In this example, the images are already undistorted. In a general
% workflow, uncomment the following code to undistort the images.
%
% if nargin > 4
%     intrinsics = varargin{1};
% end
% Irgb  = undistortImage(Irgb, intrinsics);

% Detect ORB features
Igray  = im2gray(Irgb);

points = detectORBFeatures(Igray, ScaleFactor=scaleFactor, NumLevels=numLevels);

% Select a subset of features, uniformly distributed throughout the image
points = selectUniform(points, numPoints, size(Igray, 1:2));

% Extract features
[features, validPoints] = extractFeatures(Igray, points);
end

"helperComputeHomography " は、ホモグラフィを計算し、再構成を評価します。

function [H, score, inliersIndex] = helperComputeHomography(matchedPoints1, matchedPoints2)

[H, inliersLogicalIndex] = estgeotform2d( ...
    matchedPoints1, matchedPoints2, "projective", ...
    MaxNumTrials=1e3, MaxDistance=4);

inlierPoints1 = matchedPoints1(inliersLogicalIndex);
inlierPoints2 = matchedPoints2(inliersLogicalIndex);

inliersIndex  = find(inliersLogicalIndex);

locations1 = inlierPoints1.Location;
locations2 = inlierPoints2.Location;
xy1In2     = transformPointsForward(H, locations1);
xy2In1     = transformPointsInverse(H, locations2);
error1in2  = sum((locations2 - xy1In2).^2, 2);
error2in1  = sum((locations1 - xy2In1).^2, 2);

outlierThreshold = 6;

score = sum(max(outlierThreshold-error1in2, 0)) + ...
    sum(max(outlierThreshold-error2in1, 0));
end

"helperFundamentalMatrix " は、基礎行列を計算し、再構成を評価します。

function [F, score, inliersIndex] = helperComputeFundamentalMatrix(matchedPoints1, matchedPoints2, intrinsics)

[E, inliersLogicalIndex] = estimateEssentialMatrix( ...
    matchedPoints1, matchedPoints2, intrinsics, MaxNumTrials=1e3, MaxDistance=4);

F = intrinsics.K'\ E /intrinsics.K;

inlierPoints1 = matchedPoints1(inliersLogicalIndex);
inlierPoints2 = matchedPoints2(inliersLogicalIndex);

inliersIndex  = find(inliersLogicalIndex);

locations1    = inlierPoints1.Location;
locations2    = inlierPoints2.Location;

% Distance from points to epipolar line
lineIn1   = epipolarLine(F', locations2);
error2in1 = (sum([locations1, ones(size(locations1, 1),1)].* lineIn1, 2)).^2 ...
    ./ sum(lineIn1(:,1:2).^2, 2);
lineIn2   = epipolarLine(F, locations1);
error1in2 = (sum([locations2, ones(size(locations2, 1),1)].* lineIn2, 2)).^2 ...
    ./ sum(lineIn2(:,1:2).^2, 2);

outlierThreshold = 4;

score = sum(max(outlierThreshold-error1in2, 0)) + ...
    sum(max(outlierThreshold-error2in1, 0));

end

"helperTriangulateTwoFrames " は、2 つのフレームを三角形分割して、マップを初期化します。

function [isValid, xyzPoints, inlierIdx] = helperTriangulateTwoFrames(...
    pose1, pose2, matchedPoints1, matchedPoints2, intrinsics, minParallax)

camMatrix1 = cameraProjection(intrinsics, pose2extr(pose1));
camMatrix2 = cameraProjection(intrinsics, pose2extr(pose2));

[xyzPoints, reprojectionErrors, isInFront] = triangulate(matchedPoints1, ...
    matchedPoints2, camMatrix1, camMatrix2);

% Filter points by view direction and reprojection error
minReprojError = 1;
inlierIdx  = isInFront & reprojectionErrors < minReprojError;
xyzPoints  = xyzPoints(inlierIdx ,:);

% A good two-view with significant parallax
ray1       = xyzPoints - pose1.Translation;
ray2       = xyzPoints - pose2.Translation;
cosAngle   = sum(ray1 .* ray2, 2) ./ (vecnorm(ray1, 2, 2) .* vecnorm(ray2, 2, 2));

% Check parallax
isValid = all(cosAngle < cosd(minParallax) & cosAngle>0);
end

"helperUpdateGlobalMap" は、姿勢グラフの最適化後にマップ点の 3 次元位置を更新します。

function mapPointSet = helperUpdateGlobalMap(...
    mapPointSet, vSetKeyFrames, vSetKeyFramesOptim)
%helperUpdateGlobalMap update map points after pose graph optimization
posesOld     = vSetKeyFrames.Views.AbsolutePose;
posesNew     = vSetKeyFramesOptim.Views.AbsolutePose;
positionsOld = mapPointSet.WorldPoints;
positionsNew = positionsOld;
indices     = 1:mapPointSet.Count;

% Update world location of each map point based on the new absolute pose of 
% the corresponding major view
for i = indices
    majorViewIds = mapPointSet.RepresentativeViewId(i);
    poseNew = posesNew(majorViewIds).A;
    tform = affinetform3d(poseNew/posesOld(majorViewIds).A);
    positionsNew(i, :) = transformPointsForward(tform, positionsOld(i, :));
end
mapPointSet = updateWorldPoints(mapPointSet, indices, positionsNew);
end

参考文献

[1] Mur-Artal, Raul, Jose Maria Martinez Montiel, and Juan D. Tardos. "ORB-SLAM: a versatile and accurate monocular SLAM system." IEEE Transactions on Robotics 31, no. 5, pp 1147-116, 2015.

[2] Sturm, Jürgen, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. "A benchmark for the evaluation of RGB-D SLAM systems". In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 573-580, 2012.

[3] Kümmerle, Rainer, Giorgio Grisetti, Hauke Strasdat, Kurt Konolige, and Wolfram Burgard. "g2o: A general framework for graph optimization." In Proceedings of IEEE International Conference on Robotics and Automation, pp. 3607-3613, 2011.

参考

トピック

ステレオ Visual Simultaneous Localization and Mapping