Why is the CNN predict function faster when running a set of images from ImageDataStore compared to running each image individually?

Question

Eric Louchard, 17 July 2021


https://jp.mathworks.com/matlabcentral/answers/880808-why-is-the-cnn-predict-function-faster-when-running-a-set-of-images-from-imagedatastore-compared-to

Answered: Eric Louchard, 21 July 2021

I am trying this example code:

Create Simple Deep Learning Network for Classification - MATLAB & Simulink Example (mathworks.com)

One thing I notice is that running the classify function on the imageDatastore (imdsValidation) is much faster than running it on one image multiple times. Is this some sort of batch process that uses MATLAB vectorization to speed things up, or is it inherent to a CNN?

YPred = classify(net,imdsValidation);

And a related question: when using codegen or cnncodegen, can the resulting C++ code also run multiple images like this? I cannot see a way to do it with the generated C++ code.


Answer 1

Vineet Joshi, 20 July 2021


https://jp.mathworks.com/matlabcentral/answers/880808-why-is-the-cnn-predict-function-faster-when-running-a-set-of-images-from-imagedatastore-compared-to#answer_750258

Hi

As the MiniBatchSize entry on the classify documentation page explains, larger mini-batches can make prediction faster but require more memory. The speedup therefore comes from batching and is not specific to CNNs.
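As a rough illustration of what a mini-batch size means, the sketch below splits a set of images into chunks, so the network is called once per chunk instead of once per image. The values of N and B are hypothetical, and the index arithmetic is only a conceptual picture, not a description of how classify is implemented internally. The final call uses the documented 'MiniBatchSize' name-value option and only runs if net and imdsValidation from the example are already in the workspace.

```matlab
% Conceptual sketch: split N images into mini-batches of size B.
N = 200;                          % hypothetical number of images
B = 64;                           % hypothetical mini-batch size
starts = 1:B:N;                   % first image index of each mini-batch
stops  = min(starts + B - 1, N);  % last image index of each mini-batch
numCalls = numel(starts);         % number of batched forward passes

for k = 1:numCalls
    fprintf('batch %d: images %d to %d\n', k, starts(k), stops(k));
end

% Assumes net and imdsValidation from the example are in the workspace.
if exist('net', 'var') && exist('imdsValidation', 'var')
    YPred = classify(net, imdsValidation, 'MiniBatchSize', B);
end
```

With these hypothetical values the 200 images are covered by 4 batched calls, the last one holding only 8 images.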

As for your second question, the cnncodegen function only generates code for the network; how you run inference with it is up to you. You can write code that runs the network sequentially and generate C++ from that, or use other techniques such as multiple workers and parallel computing to make it faster in a batch setting.

Hope this was helpful.

Thanks

1 Comment

Eric Louchard, 20 July 2021

Thanks for the reply! I have been doing some more experiments and found that passing a 4D array of images takes advantage of some parallel processing in MATLAB. I have another question once I describe the findings.

~~~~~~~~~~~~~~

I stacked 64 images into a 4D datacube and compared running predict on it once versus running predict on one image 64 times. The 4D datacube was around 10x faster.

% Run predict on a single image 64 times.
tic
for loop = 1:64
    score = Fastnet_LWIR.predict(imLWIR);
end
toc

% Run predict once on the 64-image 4D datacube.
tic
score = Fastnet_LWIR.predict(im4D);
toc

Elapsed time is 0.212989 seconds.
Elapsed time is 0.023708 seconds.
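For anyone wanting to reproduce this comparison, here is a minimal sketch of how the 4D datacube might be built and timed. It assumes 64x64 single-channel uint8 frames (matching the -args used for codegen in this post) and uses a random stand-in image, since the real imLWIR is not shown. The timing part only runs if a network variable named Fastnet_LWIR is in the workspace, and it adds one untimed warm-up call because the first call may include one-time setup cost.

```matlab
% Stand-in for one frame; the real imLWIR is assumed to be 64x64 uint8.
imLWIR = randi([0 255], 64, 64, 'uint8');

% Stack N frames along the 4th dimension: height x width x channels x N.
N = 64;
frames = repmat({imLWIR}, 1, N);   % in practice these would be N different frames
im4D = cat(4, frames{:});          % size is [64 64 1 64]

% The timing part only runs if the trained network is in the workspace.
if exist('Fastnet_LWIR', 'var')
    predict(Fastnet_LWIR, imLWIR);          % warm-up call, not timed

    tic
    for k = 1:N
        score = predict(Fastnet_LWIR, frames{k});   % one image per call
    end
    tLoop = toc;

    tic
    scores = predict(Fastnet_LWIR, im4D);   % all N images in one call
    tBatch = toc;

    fprintf('loop: %.4f s, batch: %.4f s\n', tLoop, tBatch);
end
```

Building the batch with cat(4, ...) keeps each frame as one slice im4D(:,:,1,k), so the batched call sees the same pixels as the loop.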

So, knowing this, I tried to make a mex file using codegen and a simple prediction function and used args {ones(64,64,1,64,'uint8')} for a 64 deep 4D datacube.

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg Fastnet_LWIR_predict -args {ones(64,64,1,64,'uint8')} -report

This resulted in a mex file that took in the 4D datacube as input only, not a single frame, but had the same type of speed improvement.

However, in trying to make a C++ dll or lib, I kept getting errors so I tried cnncodegen instead and it worked, but I think it is only a single call to predict and not doing any sort of parallel processing. I tried 'batchsize', 64 in the call to cnncodegen below.

cnncodegen(Fastnet_LWIR,'targetlib','cudnn','ComputeCapability','6.1','targetparams',struct('AutoTuning',true,'DataType','FP32'),'batchsize',64,'codegenonly',1)

~~~~~~~~~~~~~~~~~ now to the question

Is there a way to call cnncodegen to write C++ code and have it work on 4D datacubes, taking advantage of parallel processing?

This is the error I got when trying to use codegen and 'lib':

cfg = coder.gpuConfig('lib');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg Fastnet_LWIR_predict -args {ones(64,64,1,64,'uint8')} -report

**********************************************************************

** Visual Studio 2017 Developer Command Prompt v15.5.0

**********************************************************************

[vcvarsall.bat] Environment initialized for: 'x64'

Microsoft (R) Program Maintenance Utility Version 14.12.25830.2

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D BUILDING_TEST_CNN_PREDICTOR -D MODEL=test_CNN_predictor -D MODEL=test_CNN_predictor -o "MWElementwiseAffineLayer.obj" "D:\Fastnet_LWIR\codegen\dll\test_CNN_predictor\MWElementwiseAffineLayer.cpp"

MWElementwiseAffineLayer.cpp

C:\EngTools\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include\crtdefs.h(10): fatal error C1083: Cannot open include file: 'corecrt.h': No such file or directory

NMAKE : fatal error U1077: '"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\nvcc.EXE"' : return code '0x2'

Stop.

The make command returned an error of 2

Error(s) encountered while building "test_CNN_predictor":

### Failed to generate all binary outputs.

------------------------------------------------------------------------

??? Build error: C++ compiler produced errors. See the Build Log for further details.

More information


Answer 2

Eric Louchard, 21 July 2021


https://jp.mathworks.com/matlabcentral/answers/880808-why-is-the-cnn-predict-function-faster-when-running-a-set-of-images-from-imagedatastore-compared-to#answer_750818

When I test codegen, I now get this:

clear cfg
cfg = coder.gpuConfig('lib');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg Fastnet_LWIR_predict -args {ones(64,64,1,'uint8')} -report

Warning: Validation warning(s):

The following macro(s) in the build configuration options were not found in both the declared list of toolchain macros and

standard code-generation macros:

conlibs

If the above macro(s) is not defined at the point when the makefile is invoked, the build may fail.

> In coder.make.ToolchainInfo/validate

In coder.make.invokeBuilder

In RTW.genMakefileAndBuild (line 458)

In coder.internal.doCompile

In emcBuildRTW

In emcGenMakefileAndBuild

In emcBuildTarget

In emlcprivate

In coder.internal.compile

In emlckernel

In emlcprivate

In codegen

------------------------------------------------------------------------

**********************************************************************

** Visual Studio 2017 Developer Command Prompt v15.5.0

**********************************************************************

[vcvarsall.bat] Environment initialized for: 'x64'

Microsoft (R) Program Maintenance Utility Version 14.12.25830.2

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWElementwiseAffineLayer.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWElementwiseAffineLayer.cpp"

MWElementwiseAffineLayer.cpp

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWFusedConvReLULayer.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWFusedConvReLULayer.cpp"

MWFusedConvReLULayer.cpp

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "cnn_api.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\cnn_api.cpp"

cnn_api.cpp

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWCNNLayerImpl.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWCNNLayerImpl.cu"

MWCNNLayerImpl.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWElementwiseAffineLayerImpl.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWElementwiseAffineLayerImpl.cu"

MWElementwiseAffineLayerImpl.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWElementwiseAffineLayerImplKernel.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWElementwiseAffineLayerImplKernel.cu"

MWElementwiseAffineLayerImplKernel.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWFusedConvReLULayerImpl.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWFusedConvReLULayerImpl.cu"

MWFusedConvReLULayerImpl.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWTargetNetworkImpl.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWTargetNetworkImpl.cu"

MWTargetNetworkImpl.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "Fastnet_LWIR_predict_rtwutil.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\Fastnet_LWIR_predict_rtwutil.cu

Fastnet_LWIR_predict_rtwutil.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "Fastnet_LWIR_predict_data.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\Fastnet_LWIR_predict_data.cu

Fastnet_LWIR_predict_data.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "Fastnet_LWIR_predict_initialize.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\Fastnet_LWIR_predict_initialize.cu

Fastnet_LWIR_predict_initialize.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "Fastnet_LWIR_predict_terminate.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\Fastnet_LWIR_predict_terminate.cu

Fastnet_LWIR_predict_terminate.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "Fastnet_LWIR_predict.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\Fastnet_LWIR_predict.cu

Fastnet_LWIR_predict.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "DeepLearningNetwork.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\DeepLearningNetwork.cu

DeepLearningNetwork.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "predict.obj" D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\predict.cu

predict.cu

nvcc -c -Xcompiler "/wd 4819" -Xcompiler "/MD" -rdc=true -Xcudafe "--display_error_number --diag_suppress=unsigned_compare_with_zero" -O3 -arch sm_35 -D MW_CUDA_ARCH=350 -D MODEL=Fastnet_LWIR_predict -D MODEL=Fastnet_LWIR_predict -o "MWCudaDimUtility.obj" "D:\Fastnet_LWIR\codegen\lib\Fastnet_LWIR_predict\MWCudaDimUtility.cu"

MWCudaDimUtility.cu

'cmd' is not recognized as an internal or external command,

operable program or batch file.

NMAKE : fatal error U1077: 'cmd' : return code '0x1'

Stop.

The make command returned an error of 2

Error(s) encountered while building "Fastnet_LWIR_predict":

### Failed to generate all binary outputs.

------------------------------------------------------------------------

??? Build error: C++ compiler produced errors. See the Build Log for further details.

More information

Code generation failed: View Error Report

Error using codegen
