Deep Learning Code Generation on Intel Targets for Different Batch Sizes

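This example shows how to use the codegen command to generate code for an image classification application that uses deep learning on Intel® processors. The generated code uses the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN). The example has two parts:

The first part shows how to generate a MEX function that accepts a batch of images as input.
The second part shows how to generate an executable that accepts a batch of images as input.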

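Prerequisites

Intel processor with support for Intel Advanced Vector Extensions 2 (Intel AVX2) instructions
Intel Math Kernel Library for Deep Neural Networks (MKL-DNN)
Environment variables for the compilers and libraries. For information on the supported versions of the compilers, see Supported Compilers. For setting up the environment variables, see Prerequisites for Deep Learning with MATLAB Coder.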

This example is supported on Linux®, Windows®, and Mac® platforms. It is not supported in MATLAB Online.
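Before generating code, it can help to confirm that the library environment variable is visible from MATLAB. The following is a minimal sketch, not part of the original example. It assumes the variable is named INTEL_MKLDNN and points to the MKL-DNN installation root; confirm the exact name in the prerequisites documentation for your release.

```matlab
% Pre-flight check (sketch). INTEL_MKLDNN is an assumed variable name;
% verify it against the prerequisites documentation.
mkldnnRoot = getenv('INTEL_MKLDNN');

if isempty(mkldnnRoot)
    status = 'unset';
    warning('INTEL_MKLDNN is not set. Code generation may not find MKL-DNN.');
elseif ~isfolder(mkldnnRoot)
    status = 'missing';
    warning('INTEL_MKLDNN points to "%s", which is not a folder.', mkldnnRoot);
else
    status = 'ok';
    fprintf('MKL-DNN root folder: %s\n', mkldnnRoot);
end
```

The check only reports what MATLAB sees. It does not validate the library version or the compiler setup.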

Download Input Video File

Download a sample video file.

   if ~exist('./object_class.avi', 'file')
       url = 'https://www.mathworks.com/supportfiles/gpucoder/media/object_class.avi.zip';
       websave('object_class.avi.zip',url);
       unzip('object_class.avi.zip');
   end

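Define the `resnet_predict` Function

This example uses the DAG network ResNet-50 to show image classification on Intel desktops. A pretrained ResNet-50 model for MATLAB is available in the Deep Learning Toolbox Model for ResNet-50 Network support package.

The resnet_predict function loads the ResNet-50 network into a persistent network object and then performs prediction on the input. Subsequent calls to the function reuse the persistent network object.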

type resnet_predict

% Copyright 2020 The MathWorks, Inc.

function out = resnet_predict(in) 
%#codegen

% A persistent object mynet is used to load the series network object. At
% the first call to this function, the persistent object is constructed and
% setup. When the function is called subsequent times, the same object is
% reused to call predict on inputs, avoiding reconstructing and reloading
% the network object.

persistent mynet;

if isempty(mynet)
    % Call the function resnet50 that returns a DAG network
    % for ResNet-50 model.
    mynet = coder.loadDeepLearningNetwork('resnet50','resnet');
end

% pass in input   
out = mynet.predict(in);

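Generate MEX for `resnet_predict`

To generate a MEX function for the resnet_predict function, use codegen with a deep learning configuration object for the MKL-DNN library. Attach the deep learning configuration object to the MEX code generation configuration object that you pass to codegen. Run the codegen command and specify the input as a 4-D matrix of size [224,224,3,batchSize]. The first three dimensions correspond to the input layer size of the ResNet-50 network, and the fourth is the number of images in the batch.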

    batchSize = 5;
    cfg = coder.config('mex');
    cfg.TargetLang = 'C++';
    cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');
    codegen -config cfg resnet_predict -args {ones(224,224,3,batchSize,'single')} -report

Code generation successful: To view the report, open('codegen\mex\resnet_predict\html\report.mldatx')

Perform Prediction on a Batch of Images

This step assumes that the object_class.avi video file has already been downloaded. Create a VideoReader object and use its read function to read five frames, because batchSize is set to 5. Then resize the batch of input images to 224-by-224, the size that the ResNet-50 network expects.

   videoReader = VideoReader('object_class.avi');
   imBatch = read(videoReader,[1 5]);
   imBatch = imresize(imBatch, [224,224]);
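The shape handling in this step can be checked without the video file. The sketch below substitutes a random array for the frames returned by read; the 240-by-320 frame size is an assumption chosen only for illustration. It shows that imresize changes only the first two dimensions of a 4-D array, so the channel and batch dimensions pass through unchanged.

```matlab
% Synthetic stand-in for read(videoReader,[1 5]): five RGB frames.
% The 240-by-320 frame size is hypothetical.
frames = randi([0 255], [240 320 3 5], 'uint8');

% imresize resizes rows and columns only; dimensions 3 and 4 are preserved.
resized = imresize(frames, [224 224]);

% The MEX function was generated for single-precision input.
batch = single(resized);

disp(size(batch))
disp(class(batch))
```

The resulting size, 224-by-224-by-3-by-5, matches the -args specification used when generating the MEX function with batchSize set to 5.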

Call the generated resnet_predict_mex function, which returns the classification results for the given input.

   predict_scores = resnet_predict_mex(single(imBatch));

Get the top five probability scores and their labels for each image in the batch.

   [val,indx] = sort(transpose(predict_scores), 'descend');
   scores = val(1:5,:)*100;
   net = resnet50;
   classnames = net.Layers(end).ClassNames;
   for i = 1:batchSize
       labels = classnames(indx(1:5,i));
       disp(['Top 5 predictions on image, ', num2str(i)]);
       for j=1:5
           disp([labels{j},' ',num2str(scores(j,i), '%2.2f'),'%'])
       end
   end
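The transpose-then-sort pattern is easier to follow on a small matrix. The sketch below uses made-up scores for two images and four classes, with hypothetical class names, and takes the top two entries instead of the top five. Each row of the score matrix is one image, so after the transpose each column holds the scores of one image and sort orders every column independently.

```matlab
% Toy scores: 2 images (rows) by 4 classes (columns). Values are made up.
predict_scores = single([0.10 0.60 0.05 0.25;
                         0.40 0.10 0.30 0.20]);
classnames = {'cat'; 'dog'; 'fox'; 'owl'};   % hypothetical labels

% After the transpose, column i holds the scores of image i.
[val, indx] = sort(transpose(predict_scores), 'descend');

topK = 2;
for i = 1:size(predict_scores, 1)
    labels = classnames(indx(1:topK, i));
    fprintf('Image %d: %s (%.0f%%), %s (%.0f%%)\n', i, ...
        labels{1}, val(1,i)*100, labels{2}, val(2,i)*100);
end
% Prints:
% Image 1: dog (60%), owl (25%)
% Image 2: cat (40%), fox (30%)
```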

For the prediction on the first image, map the top five prediction scores to words in the synset dictionary.

   fid = fopen('synsetWords.txt');
   synsetOut = textscan(fid,'%s', 'delimiter', '\n');
   synsetOut = synsetOut{1};
   fclose(fid);
   [val,indx] = sort(transpose(predict_scores), 'descend');
   scores = val(1:5,1)*100;
   top5labels = synsetOut(indx(1:5,1));

Display the top five classification labels on the image.

   outputImage = zeros(224,400,3, 'uint8');
   for k = 1:3
       outputImage(:,177:end,k) = imBatch(:,:,k,1);
   end

   scol = 1;
   srow = 1;
   outputImage = insertText(outputImage, [scol, srow], 'Classification with ResNet-50', 'TextColor', 'w','FontSize',20, 'BoxColor', 'black');
   srow = srow + 30;
   for k = 1:5
       outputImage = insertText(outputImage, [scol, srow], [top5labels{k},' ',num2str(scores(k), '%2.2f'),'%'], 'TextColor', 'w','FontSize',15, 'BoxColor', 'black');
       srow = srow + 25;
   end

   imshow(outputImage);
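The column index 177 in the code above follows from the canvas layout: the canvas is 400 pixels wide, the frame is 224 pixels wide, and the frame is right-aligned, which leaves the first 176 columns free for text. A short sketch of that arithmetic, using a constant gray frame as a stand-in for a video frame:

```matlab
canvasWidth = 400;
frameSize   = 224;

% First canvas column occupied by the right-aligned frame.
firstCol = canvasWidth - frameSize + 1;

% Stand-in frame: uniform gray instead of a real video frame.
frame  = 200 * ones(frameSize, frameSize, 3, 'uint8');
canvas = zeros(frameSize, canvasWidth, 3, 'uint8');
canvas(:, firstCol:end, :) = frame;

fprintf('Frame starts at column %d; %d columns remain for text.\n', ...
    firstCol, firstCol - 1);
```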

Clear the persistent network object from memory.

clear mex;

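Define the `resnet_predict_exe` Entry-Point Function

To generate an executable from MATLAB code, define a new entry-point function, resnet_predict_exe. This function is similar to the previous entry-point function, resnet_predict, but also includes code for preprocessing and postprocessing. The APIs used in resnet_predict_exe are platform independent. The function accepts a video and a batch size as input arguments. These arguments are compile-time constants.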

type resnet_predict_exe

% Copyright 2020 The MathWorks, Inc.

function resnet_predict_exe(inputVideo,batchSize) 
%#codegen

    % A persistent object mynet is used to load the series network object.
    % At the first call to this function, the persistent object is constructed and
    % setup. When the function is called subsequent times, the same object is reused 
    % to call predict on inputs, avoiding reconstructing and reloading the
    % network object.
    persistent mynet;

    if isempty(mynet)
        % Call the function resnet50 that returns a DAG network
        % for ResNet-50 model.
        mynet = coder.loadDeepLearningNetwork('resnet50','resnet');
    end

    % Create video reader and video player objects %
    videoReader = VideoReader(inputVideo);
    depVideoPlayer = vision.DeployableVideoPlayer;


    % Read the classification label names %
    synsetOut = readImageClassLabels('synsetWords.txt');

    i=1;
    % Read frames until end of video file %
    while ~(i+batchSize > (videoReader.NumFrames+1))
        % Read and resize batch of frames as specified by input argument%
        reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i);

        % run predict on resized input images %
        predict_scores = mynet.predict(reSizedImagesBatch);


        % overlay the prediction scores on images and display %
        overlayResultsOnImages(predict_scores,synsetOut,reSizedImagesBatch,batchSize,depVideoPlayer)

        i = i+ batchSize; 
    end
    release(depVideoPlayer);
end

function synsetOut = readImageClassLabels(classLabelsFile)
% Read the classification label names from the file 
%
% Inputs : 
% classLabelsFile - supplied by user
%
% Outputs : 
% synsetOut       - cell array filled with 1000 image class labels

    synsetOut = cell(1000,1);
    fid = fopen(classLabelsFile);
    for i = 1:1000
        synsetOut{i} = fgetl(fid);
    end
    fclose(fid);
end

function reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i)
% Read and resize batch of frames as specified by input argument%
%
% Inputs : 
% videoReader - Object used for reading the images from video file
% batchSize   - Number of images in batch to process. Supplied by user
% i           - index to track frames read from video file
%
% Outputs : 
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchsize

    img = read(videoReader,[i (i+batchSize-1)]);
    reSizedImagesBatch = coder.nullcopy(ones(224,224,3,batchSize,'like',img));
    resizeTo  = coder.const([224,224]);
    reSizedImagesBatch(:,:,:,:) = imresize(img,resizeTo);
end


function overlayResultsOnImages(predict_scores,synsetOut,reSizedImagesBatch,batchSize,depVideoPlayer)
% Overlay the top five predictions on each image in the batch and display it
%
% Inputs : 
% predict_scores  - classification results for given network
% synsetOut       - cell array filled with 1000 image class labels
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchsize
% batchSize       - Number of images in batch to process. Supplied by user
% depVideoPlayer  - Object for displaying results
%
% Outputs : 
% Predicted results overlayed on input images

    % sort the predicted scores  %
    [val,indx] = sort(transpose(predict_scores), 'descend');

    for j = 1:batchSize
        scores = val(1:5,j)*100;
        outputImage = zeros(224,400,3, 'uint8');
        for k = 1:3
            outputImage(:,177:end,k) = reSizedImagesBatch(:,:,k,j);
        end

        % Overlay the results on image %
        scol = 1;
        srow = 1;
        outputImage = insertText(outputImage, [scol, srow], 'Classification with ResNet-50', 'TextColor', [255 255 255],'FontSize',20, 'BoxColor', [0 0 0]);
        srow = srow + 30;
        for k = 1:5
            scoreStr = sprintf('%2.2f',scores(k));
            outputImage = insertText(outputImage, [scol, srow], [synsetOut{indx(k,j)},' ',scoreStr,'%'], 'TextColor', [255 255 255],'FontSize',15, 'BoxColor', [0 0 0]);
            srow = srow + 25;
        end
    
        depVideoPlayer(outputImage);
    end
end

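Structure of the `resnet_predict_exe` Function

The resnet_predict_exe function contains four subsections that perform these actions:

Read the classification labels from the supplied input text file
Read the input batch of images and resize them as needed by the network
Run inference on the batch of input images
Overlay the results on the images

For more information about each of these steps, see the following sections.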

The `readImageClassLabels` Function

This function accepts the synsetWords.txt file as an input argument. It reads the classification labels and populates a cell array.

       function synsetOut = readImageClassLabels(classLabelsFile)
       % Read the classification label names from the file
       %
       % Inputs :
       % classLabelsFile - supplied by user
       %
       % Outputs :
       % synsetOut       - cell array filled with 1000 image class labels

           synsetOut = cell(1000,1);
           fid = fopen(classLabelsFile);
           for i = 1:1000
               synsetOut{i} = fgetl(fid);
           end
           fclose(fid);
       end
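The same fgetl loop can be tried on a small file. The sketch below writes three made-up labels to a temporary file and reads them back line by line. The label text and the count of three are illustrative only; the function above reads 1000 lines. The sketch also checks the file identifier, because fopen returns -1 when the file cannot be opened, and fgetl returns -1 instead of text once the end of the file is reached.

```matlab
% Write a small, made-up label file so the sketch is self-contained.
labelFile = [tempname '.txt'];
fid = fopen(labelFile, 'w');
fprintf(fid, '%s\n', 'tench', 'goldfish', 'great white shark');
fclose(fid);

% Read the labels back, one line per cell, as readImageClassLabels does.
numLabels = 3;
labels = cell(numLabels, 1);
fid = fopen(labelFile);
assert(fid ~= -1, 'Could not open %s', labelFile);
for i = 1:numLabels
    labels{i} = fgetl(fid);   % fgetl strips the newline character
end
fclose(fid);
delete(labelFile);

disp(labels)
```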

The `readImageInputBatch` Function

This function reads a batch of frames from the video reader object that is passed to it and resizes them. Each frame is resized to 224x224x3, the size that the ResNet-50 network expects.

       function reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i)
       % Read and resize batch of frames as specified by input argument%
       %
       % Inputs :
       % videoReader - Object used for reading the images from video file
       % batchSize   - Number of images in batch to process. Supplied by user
       % i           - index to track frames read from video file
       %
       % Outputs :
       % reSizedImagesBatch - Batch of images resized to 224x224x3xbatchsize

           img = read(videoReader,[i (i+batchSize-1)]);
           reSizedImagesBatch = coder.nullcopy(ones(224,224,3,batchSize,'like',img));
           resizeTo  = coder.const([224,224]);
           reSizedImagesBatch(:,:,:,:) = imresize(img,resizeTo);
       end
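The main loop of resnet_predict_exe calls this function with i advancing by batchSize on each iteration, and it stops when a full batch no longer fits. The sketch below replays that loop condition with a hypothetical frame count of 23 to show which frame ranges are requested and how many trailing frames are left unprocessed.

```matlab
numFrames = 23;   % hypothetical value standing in for videoReader.NumFrames
batchSize = 5;

i = 1;
ranges = zeros(0, 2);
% Same loop condition as in resnet_predict_exe.
while ~(i + batchSize > (numFrames + 1))
    ranges(end+1, :) = [i, i + batchSize - 1]; %#ok<SAGROW>
    i = i + batchSize;
end

disp(ranges)   % rows: [1 5], [6 10], [11 15], [16 20]
fprintf('%d trailing frame(s) not processed\n', numFrames - (i - 1));
```

With these values, four full batches are processed and the last three frames are skipped, because the loop only runs when a complete batch of batchSize frames is available.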

The `mynet.predict` Function

This function accepts the resized batch of images as input and returns the prediction results.

      % run predict on resized input images %
      predict_scores = mynet.predict(reSizedImagesBatch);

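The `overlayResultsOnImages` Function

This function accepts the prediction results and sorts them in descending order. It then overlays the results on the input images and displays them.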

       function overlayResultsOnImages(predict_scores,synsetOut,reSizedImagesBatch,batchSize,depVideoPlayer)
       % Overlay the top five predictions on each image in the batch and display it
       %
       % Inputs :
       % predict_scores  - classification results for given network
       % synsetOut       - cell array filled with 1000 image class labels
       % reSizedImagesBatch - Batch of images resized to 224x224x3xbatchsize
       % batchSize       - Number of images in batch to process. Supplied by user
       % depVideoPlayer  - Object for displaying results
       %
       % Outputs :
       % Predicted results overlayed on input images

           % sort the predicted scores  %
           [val,indx] = sort(transpose(predict_scores), 'descend');

           for j = 1:batchSize
               scores = val(1:5,j)*100;
               outputImage = zeros(224,400,3, 'uint8');
               for k = 1:3
                   outputImage(:,177:end,k) = reSizedImagesBatch(:,:,k,j);
               end

               % Overlay the results on image %
               scol = 1;
               srow = 1;
               outputImage = insertText(outputImage, [scol, srow], 'Classification with ResNet-50', 'TextColor', [255 255 255],'FontSize',20, 'BoxColor', [0 0 0]);
               srow = srow + 30;
               for k = 1:5
                   scoreStr = sprintf('%2.2f',scores(k));
                   outputImage = insertText(outputImage, [scol, srow], [synsetOut{indx(k,j)},' ',scoreStr,'%'], 'TextColor', [255 255 255],'FontSize',15, 'BoxColor', [0 0 0]);
                   srow = srow + 25;
               end

               depVideoPlayer(outputImage);
           end
       end

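Build and Run the Executable

Create a code configuration object for generating the executable. Attach a deep learning configuration object to it. Set the batchSize and inputVideoFile variables.

To use the generated example C++ main function instead of writing a custom C++ main function, set the GenerateExampleMain parameter to 'GenerateCodeAndCompile'. Also, disable cfg.EnableOpenMP so that the executable has no OpenMP library dependencies when you run it from a desktop terminal.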

       cfg = coder.config('exe');
       cfg.TargetLang = 'C++';
       cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');
       batchSize = 5;
       inputVideoFile = 'object_class.avi';
       cfg.GenerateExampleMain = 'GenerateCodeAndCompile';
       cfg.EnableOpenMP = 0;

Run the codegen command to build the executable. Run the generated executable resnet_predict_exe at the MATLAB command line or in a desktop terminal.

       codegen -config cfg resnet_predict_exe -args {coder.Constant(inputVideoFile), coder.Constant(batchSize)} -report
       system('./resnet_predict_exe')
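The system call above uses a path form that suits Linux and Mac. The following sketch, which is not part of the original example, picks the file name by platform and checks that the file exists before launching it. It assumes that the executable was generated in the current folder and that it carries the .exe extension on Windows.

```matlab
% Choose the executable name for the current platform (assumed naming).
if ispc
    exeName = 'resnet_predict_exe.exe';
else
    exeName = 'resnet_predict_exe';
end
exePath = fullfile(pwd, exeName);

% Launch only if the build produced the file in the current folder.
if isfile(exePath)
    status = system(['"' exePath '"']);
    fprintf('Exit status: %d\n', status);
else
    fprintf('Executable not found: %s\n', exePath);
end
```

Because the video file name and batch size were compiled in as constants, the executable takes no command-line arguments. In this example it expects object_class.avi and synsetWords.txt in the folder from which it is run.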

See Also

codegen | coder.DeepLearningConfig | coder.MklDNNConfig | coder.loadDeepLearningNetwork
