Lane Detection Optimized with GPU Coder

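This example shows how to develop a deep learning lane detection application that runs on NVIDIA® GPUs.

The pretrained lane detection network is based on the AlexNet network and can detect and output the boundaries of lane markers in an image. The last few layers of the AlexNet network are replaced by a smaller fully connected layer and a regression output layer. The example generates a CUDA® executable that runs on a CUDA-enabled GPU on the host machine.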

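Prerequisites

CUDA-enabled NVIDIA GPU.
NVIDIA CUDA Toolkit and driver.
NVIDIA cuDNN library.
Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-Party Hardware. For setting up the environment variables, see Setting Up the Prerequisite Products.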

Verify GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries needed to run this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

Get Pretrained Lane Detection Network

This example uses the trainedLaneNet MAT file, which contains the pretrained lane detection network. The file is approximately 143 MB in size. Download the file from the MathWorks® website.

laneNetFile = matlab.internal.examples.downloadSupportFile('gpucoder/cnn_models/lane_detection', ...
    'trainedLaneNet.mat');

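This network takes an image as input and outputs two lane boundaries that correspond to the left and right lanes of the ego vehicle. Each lane boundary is represented by the parabolic equation $y = a x^{2} + b x + c$, where y is the lateral offset and x is the longitudinal distance from the vehicle. The network outputs the three parameters a, b, and c for each lane. The network architecture is similar to AlexNet, except that the last few layers are replaced by a smaller fully connected layer and a regression output layer.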
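
As a minimal sketch of how one boundary can be evaluated from its three parameters, the following snippet samples the parabola over the range 3:30, the same range that the laneNetPredict listing later in this example uses for vehicleXPoints. The coefficient values are hypothetical and chosen only for illustration; they are not outputs of the network.

```matlab
% Hypothetical coefficients for one lane boundary (illustrative values only,
% not produced by the pretrained network).
a = 0.001;
b = -0.02;
c = 1.8;

% Longitudinal distances from the vehicle at which to sample the boundary.
x = 3:30;

% polyval expects coefficients in descending powers, that is [a b c].
y = polyval([a b c], x);

% The same lateral offsets written out term by term.
yExplicit = a*x.^2 + b*x + c;

fprintf('Lateral offset at x = %d: %.3f\n', x(1), y(1));
```

Because c is the lateral offset at x = 0, its sign and magnitude indicate on which side of the vehicle the boundary lies and how far away it is.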

load(laneNetFile);
disp(laneNet)

  SeriesNetwork with properties:

         Layers: [23×1 nnet.cnn.layer.Layer]
     InputNames: {'data'}
    OutputNames: {'output'}

To view the network architecture, use the analyzeNetwork function.

analyzeNetwork(laneNet)

Download Test Video

To test the model, the example uses a video file from the Caltech Lanes Dataset. The file is approximately 8 MB in size. Download the file from the MathWorks website.

videoFile = matlab.internal.examples.downloadSupportFile('gpucoder/media','caltech_cordova1.avi');

Main Entry-Point Function

The detectLanesInVideo.m file is the main entry-point function for code generation. The detectLanesInVideo function uses the vision.VideoFileReader (Computer Vision Toolbox) System object to read frames from the input video, calls the predict method of the LaneNet network object, and draws the detected lanes on the input video. A vision.DeployableVideoPlayer (Computer Vision Toolbox) System object displays the video output with the detected lanes.

type detectLanesInVideo.m

function detectLanesInVideo(videoFile,net,laneCoeffMeans,laneCoeffsStds)
% detectLanesInVideo Entry-point function for the Lane Detection Optimized
% with GPU Coder example
%  
% detectLanesInVideo(videoFile,net,laneCoeffMeans,laneCoeffsStds) uses the
% VideoFileReader system object to read frames from the input video, calls
% the predict method of the LaneNet network object, and draws the detected
% lanes on the input video. A DeployableVideoPlayer system object is used
% to display the lane detected video output.

%   Copyright 2022 The MathWorks, Inc.

%#codegen

%% Create Video Reader and Video Player Object 
videoFReader   = vision.VideoFileReader(videoFile);
depVideoPlayer = vision.DeployableVideoPlayer(Name='Lane Detection on GPU');

%% Video Frame Processing Loop
while ~isDone(videoFReader)
    videoFrame = videoFReader();
    scaledFrame = 255.*(imresize(videoFrame,[227 227]));

    [laneFound,ltPts,rtPts] = laneNetPredict(net,scaledFrame, ...
        laneCoeffMeans,laneCoeffsStds);
    if(laneFound)
        pts = [reshape(ltPts',1,[]);reshape(rtPts',1,[])];
        videoFrame = insertShape(videoFrame, 'Line', pts, 'LineWidth', 4);
    end
    depVideoPlayer(videoFrame);
end
end

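LaneNet Predict Function

The laneNetPredict function computes the positions of the right and left lanes in a single video frame. The laneNet network computes the parameters a, b, and c that describe the parabolic equation for the left and right lane boundaries. From these parameters, the function computes the x and y coordinates that correspond to the lane positions. These coordinates must then be mapped to image coordinates.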
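
The first half of that computation can be sketched in isolation. The snippet below assumes, as the listing that follows does, that the network output is normalized and ordered as [a b c] for the left lane followed by [a b c] for the right lane. The values of netOut, means, and stds are hypothetical placeholders; in the example the statistics are passed in as laneCoeffMeans and laneCoeffsStds.

```matlab
% Hypothetical normalized network output: left [a b c], then right [a b c].
netOut = single([0.1 -0.2 0.5 -0.1 0.3 -0.4]);

% Hypothetical normalization statistics (placeholders for laneCoeffMeans
% and laneCoeffsStds, which the example passes to laneNetPredict).
means = single([0 0 1.8 0 0 -1.8]);
stds  = single([0.001 0.05 0.4 0.001 0.05 0.4]);

% Undo the normalization to recover the parabola coefficients.
params = netOut .* stds + means;

% A boundary counts as found when the magnitude of its offset c exceeds 0.5.
isLeftLaneFound  = abs(params(3)) > 0.5;
isRightLaneFound = abs(params(6)) > 0.5;

% Sample each boundary at the same longitudinal distances as the listing.
vehicleXPoints = 3:30;
ltY = polyval(params(1:3), vehicleXPoints);
rtY = polyval(params(4:6), vehicleXPoints);
```

With these placeholder values both boundaries pass the threshold, so the subsequent mapping to image coordinates would be carried out for both.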

type laneNetPredict.m

function [laneFound,ltPts,rtPts] = laneNetPredict(net,frame,means,stds) 
% laneNetPredict Predict lane markers on the input image frame using the
% lane detection network
%

%   Copyright 2017-2022 The MathWorks, Inc.

%#codegen

% A persistent object lanenet is used to load the network object. At the
% first call to this function, the persistent object is constructed and
% setup. When the function is called subsequent times, the same object is
% reused to call predict on inputs, thus avoiding reconstructing and
% reloading the network object.
persistent lanenet;
if isempty(lanenet)
    lanenet = coder.loadDeepLearningNetwork(net, 'lanenet');
end

lanecoeffsNetworkOutput = predict(lanenet,frame);

% Recover original coeffs by reversing the normalization steps.
params = lanecoeffsNetworkOutput .* stds + means;

% 'c' should be more than 0.5 for it to be a lane.
isRightLaneFound = abs(params(6)) > 0.5;
isLeftLaneFound =  abs(params(3)) > 0.5;

% From the networks output, compute left and right lane points in the image
% coordinates.
vehicleXPoints = 3:30;
ltPts = coder.nullcopy(zeros(28,2,'single'));
rtPts = coder.nullcopy(zeros(28,2,'single'));

if isRightLaneFound && isLeftLaneFound
    rtBoundary = params(4:6);
    rt_y = computeBoundaryModel(rtBoundary, vehicleXPoints);
    
    ltBoundary = params(1:3);
    lt_y = computeBoundaryModel(ltBoundary, vehicleXPoints);

    % Visualize lane boundaries of the ego vehicle.
    tform = get_tformToImage;

    % Map vehicle to image coordinates.
    ltPts =  tform.transformPointsInverse([vehicleXPoints', lt_y']);
    rtPts =  tform.transformPointsInverse([vehicleXPoints', rt_y']);
    laneFound = true;
else
    laneFound = false;
end
end

%% Helper Functions

% Compute boundary model.
function yWorld = computeBoundaryModel(model, xWorld)
yWorld = polyval(model, xWorld);
end

% Compute extrinsics.
function tform = get_tformToImage

%The camera coordinates are described by the caltech mono
% camera model.
yaw = 0;
pitch = 14; % Pitch of the camera in degrees
roll = 0;

translation = translationVector(yaw, pitch, roll);
rotation    = rotationMatrix(yaw, pitch, roll);

% Construct a camera matrix.
focalLength    = [309.4362, 344.2161];
principalPoint = [318.9034, 257.5352];
Skew = 0;

camMatrix = [rotation; translation] * intrinsicMatrix(focalLength, ...
    Skew, principalPoint);

% Turn camMatrix into 2-D homography.
tform2D = [camMatrix(1,:); camMatrix(2,:); camMatrix(4,:)]; % drop Z

tform = projective2d(tform2D);
tform = tform.invert();
end

% Translate to image co-ordinates.
function translation = translationVector(yaw, pitch, roll)
SensorLocation = [0 0];
Height = 2.1798;    % mounting height in meters from the ground
rotationMatrix = (...
    rotZ(yaw)*... % last rotation
    rotX(90-pitch)*...
    rotZ(roll)... % first rotation
    );


% Adjust for the SensorLocation by adding a translation.
sl = SensorLocation;

translationInWorldUnits = [sl(2), sl(1), Height];
translation = translationInWorldUnits*rotationMatrix;
end

% Rotation around X-axis.
function R = rotX(a)
a = deg2rad(a);
R = [...
    1   0        0;
    0   cos(a)  -sin(a);
    0   sin(a)   cos(a)];

end

% Rotation around Y-axis.
function R = rotY(a)
a = deg2rad(a);
R = [...
    cos(a)  0 sin(a);
    0       1 0;
    -sin(a) 0 cos(a)];

end

% Rotation around Z-axis.
function R = rotZ(a)
a = deg2rad(a);
R = [...
    cos(a) -sin(a) 0;
    sin(a)  cos(a) 0;
    0       0      1];
end

% Given the Yaw, Pitch, and Roll, determine the appropriate Euler angles
% and the sequence in which they are applied to align the camera's
% coordinate system with the vehicle coordinate system. The resulting
% matrix is a Rotation matrix that together with the Translation vector
% defines the extrinsic parameters of the camera.
function rotation = rotationMatrix(yaw, pitch, roll)
rotation = (...
    rotY(180)*...            % last rotation: point Z up
    rotZ(-90)*...            % X-Y swap
    rotZ(yaw)*...            % point the camera forward
    rotX(90-pitch)*...       % "un-pitch"
    rotZ(roll)...            % 1st rotation: "un-roll"
    );
end

% Intrinsic matrix computation.
function intrinsicMat = intrinsicMatrix(FocalLength, Skew, PrincipalPoint)
intrinsicMat = ...
    [FocalLength(1)  , 0                     , 0; ...
    Skew             , FocalLength(2)   , 0; ...
    PrincipalPoint(1), PrincipalPoint(2), 1];
end

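Generate CUDA Executable

To generate a standalone CUDA executable for the detectLanesInVideo entry-point function, create a GPU code configuration object for the 'exe' target and set the target language to C++. Use the coder.DeepLearningConfig function to create a cuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object.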

cfg = coder.gpuConfig('exe');
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.GenerateReport = true;
cfg.GenerateExampleMain = "GenerateCodeAndCompile";
cfg.TargetLang = 'C++';
inputs = {coder.Constant(videoFile),coder.Constant(laneNetFile), ...
    coder.Constant(laneCoeffMeans),coder.Constant(laneCoeffsStds)};

Run the codegen command.

codegen -args inputs -config cfg detectLanesInVideo

Code generation successful: View report
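
As a variation, the same entry-point function could presumably be built against TensorRT instead of cuDNN, because coder.DeepLearningConfig also accepts 'tensorrt'. The configuration below is a sketch under that assumption: it requires NVIDIA TensorRT to be installed and a GPU that supports the chosen precision, and the variable names cfgTrt and dlcfg and the choice of fp16 are illustrative. The layer count and file names described in the rest of this example apply to the cuDNN build and may differ for a TensorRT build.

```matlab
% Sketch of an alternative configuration that targets TensorRT.
% Assumes TensorRT is installed; not part of the original example.
cfgTrt = coder.gpuConfig('exe');
cfgTrt.TargetLang = 'C++';
cfgTrt.GenerateReport = true;
cfgTrt.GenerateExampleMain = 'GenerateCodeAndCompile';

% Create a TensorRT deep learning configuration and request half precision.
dlcfg = coder.DeepLearningConfig('tensorrt');
dlcfg.DataType = 'fp16';
cfgTrt.DeepLearningConfig = dlcfg;

% The build command would then mirror the cuDNN one:
% codegen -args inputs -config cfgTrt detectLanesInVideo
```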

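Generated Code Description

The loaded series network has 23 layers. After layer fusion optimization, it is generated as a C++ class that contains an array of 18 layer classes. The setup() method of the class sets up handles and allocates memory for each layer object. The predict() method invokes prediction for each of the 18 layers in the network.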

class lanenet0_0 {
public:
  lanenet0_0();
  void setSize();
  void resetState();
  void setup();
  void predict();
  void cleanup();
  float *getLayerOutput(int layerIndex, int portIndex);
  int getLayerOutputSize(int layerIndex, int portIndex);
  float *getInputDataPointer(int b_index);
  float *getInputDataPointer();
  float *getOutputDataPointer(int b_index);
  float *getOutputDataPointer();
  int getBatchSize();
  ~lanenet0_0();

private:
  void allocate();
  void postsetup();
  void deallocate();

public:
  boolean_T isInitialized;
  boolean_T matlabCodegenIsDeleted;

private:
  int numLayers;
  MWTensorBase *inputTensors[1];
  MWTensorBase *outputTensors[1];
  MWCNNLayer *layers[18];
  MWCudnnTarget::MWTargetNetworkImpl *targetImpl;
};

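The cnn_lanenet*_conv*_w and cnn_lanenet*_conv*_b files are the binary weight and bias files for the convolution layers in the network. The cnn_lanenet*_fc*_w and cnn_lanenet*_fc*_b files are the binary weight and bias files for the fully connected layers in the network.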

codegendir = fullfile('codegen', 'exe', 'detectLanesInVideo');
dir([codegendir,filesep,'*.bin'])

cnn_lanenet0_0_conv1_b.bin        cnn_lanenet0_0_conv3_b.bin        cnn_lanenet0_0_conv5_b.bin        cnn_lanenet0_0_fc6_b.bin          cnn_lanenet0_0_fcLane2_b.bin      
cnn_lanenet0_0_conv1_w.bin        cnn_lanenet0_0_conv3_w.bin        cnn_lanenet0_0_conv5_w.bin        cnn_lanenet0_0_fc6_w.bin          cnn_lanenet0_0_fcLane2_w.bin      
cnn_lanenet0_0_conv2_b.bin        cnn_lanenet0_0_conv4_b.bin        cnn_lanenet0_0_data_offset.bin    cnn_lanenet0_0_fcLane1_b.bin      networkParamsInfo_lanenet0_0.bin  
cnn_lanenet0_0_conv2_w.bin        cnn_lanenet0_0_conv4_w.bin        cnn_lanenet0_0_data_scale.bin     cnn_lanenet0_0_fcLane1_w.bin

Run the Executable

To run the executable, run the following lines of code, which select the command that matches the host platform.

if ispc
    [status,cmdout] = system("detectLanesInVideo.exe");
else
    [status,cmdout] = system("./detectLanesInVideo");
end

See Also

Functions

codegen | coder.DeepLearningConfig | coder.loadDeepLearningNetwork | coder.checkGpuInstall

Objects

coder.gpuConfig | coder.gpuEnvConfig | coder.CuDNNConfig | coder.TensorRTConfig