How to optimize imagedatastore speed?

Question

Andrew Jamieson, 14 February 2018


Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/382692-how-to-optimize-imagedatastore-speed

Commented: Andrew Jamieson, 19 February 2018

I have close to 1 million images that I'd like to prepare with imageDatastore (imds) for CNN deep learning training. I would like to create the imds by passing a cell array of the specific image paths.

These paths are spread across thousands of directories. When I create a subset as a test, say around 30,000 images, imageDatastore takes about 30 seconds to initialize. When I tried all 1 million, it was taking longer than 2 hours, and I did not wait for it to finish. I checked the code profiler to see what the issue was, and it turned out that it was checking the directory of every single image. (see image)

My alternative was to not use imds altogether and load 160 GB worth of images into memory!

Suggestions please. Is there a way to disable this directory checking mechanism?

Thank you.


Answer 1 (accepted answer)

Jiro Doke, 16 February 2018


Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/382692-how-to-optimize-imagedatastore-speed#answer_305426


I'd like to know a bit more about the folder structure and how you are splitting/choosing your subset. The reason I'm asking is that instead of providing a list of image paths as a cell array, it's much more efficient to provide a list of folders (even better, the top folder) and ask imageDatastore to search through the subfolders. You can then use splitEachLabel to divide the image datastore into training and testing sets.

For example, I created 1 million images split across 5000 directories (5000 directories with 200 images per directory). Here, each directory corresponds to a category.

To obtain an image datastore for all 1 million images,

tic
% Scan the top folder recursively and label each image by its folder name
imd = imageDatastore('20180216T085544\',...
      'IncludeSubfolders',true,'LabelSource','foldernames');
toc

Elapsed time is 140.757894 seconds.

That is certainly much shorter than 2 hours. Then, to split the datastore into training (70%) and testing (30%) sets,

tic
% Take 70% of the images of each label for training; the rest go to testing
[imdTrain,imdTest] = splitEachLabel(imd,0.7);
toc

Elapsed time is 26.851960 seconds.
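
To sanity-check a split like this without a million files, a small synthetic dataset is enough. The following is a minimal sketch and not part of the original answer: the temporary folder, the class names classA/classB, and the 8-by-8 random images are all made up for illustration. It uses countEachLabel to confirm that every label shows up in both parts of the split.

```matlab
% Build a tiny throwaway dataset: 2 class folders, 10 PNG images each.
% (Folder and class names are hypothetical, for illustration only.)
root = tempname;
classes = {'classA', 'classB'};
for c = 1:numel(classes)
    classDir = fullfile(root, classes{c});
    mkdir(classDir);                          % also creates the parent folder
    for k = 1:10
        img = uint8(randi([0 255], 8, 8));    % random grayscale image
        imwrite(img, fullfile(classDir, sprintf('img_%02d.png', k)));
    end
end

% Same pattern as the answer: scan the top folder, label by folder name.
imd = imageDatastore(root, ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

% Split each label 70/30, then count the images per label in each part.
[imdTrain, imdTest] = splitEachLabel(imd, 0.7);
trainCounts = countEachLabel(imdTrain);
testCounts  = countEachLabel(imdTest);
disp(trainCounts);
disp(testCounts);
```

countEachLabel returns a table with one row per label, so an unexpectedly missing or lopsided class is easy to spot before training starts.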


Jiro Doke, 17 February 2018

Glad to hear that at least this technique is giving you more acceptable performance.

Regarding using folder names for labels, of course you're not required to do that. You can choose to specify the labels separately, once you define the datastore.
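
As a minimal sketch of assigning labels after the datastore exists (not taken from the original thread): the folder layout, the file names, and the rule that maps file names to labels are all hypothetical. The only requirement relied on here is that the Labels property gets one entry per file, in the same order as Files.

```matlab
% Throwaway dataset: images in two subfolders whose names are NOT the labels.
root = tempname;
subdirs = {'batch1', 'batch2'};               % hypothetical folder names
for d = 1:numel(subdirs)
    mkdir(fullfile(root, subdirs{d}));
    for k = 1:4
        prefix = 'cat';
        if mod(k, 2) == 0
            prefix = 'dog';
        end
        imwrite(uint8(randi([0 255], 8, 8)), ...
            fullfile(root, subdirs{d}, sprintf('%s_%d.png', prefix, k)));
    end
end

% Create the datastore from the top folder without any label source.
imds = imageDatastore(root, 'IncludeSubfolders', true);

% Derive one label per file from the file name (illustrative rule only),
% then assign all labels at once.
[~, names] = cellfun(@fileparts, imds.Files, 'UniformOutput', false);
labels = regexprep(names, '_\d+$', '');       % 'cat_1' -> 'cat'
imds.Labels = categorical(labels);
```

In practice the label list could come from anywhere (a spreadsheet, a lookup table keyed by file name), as long as its order matches imds.Files.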

Does this technique meet your needs? I'm still not sure if I completely understand your requirement for using a cell array of image paths. Does using a folder and subfolders, with a custom list of labels, do what you hope to accomplish? If not, perhaps this is a use case that we may need to feed back to our development team.

Andrew Jamieson, 19 February 2018


Thanks. The feedback I would have is to perhaps offer a flag/parameter/option for imds to NOT search every directory when given an explicit list of files with the full path already established. In other words, if possible also optimize for a list of files, not just the directory approach.

Also, while we are talking feedback, the ability to define a label for the entire imds in one line would be nice (maybe this is possible and I am confused). As it stands, it is an extra step to create a cell array or categorical array matching the list of files, whereas it would be convenient if I could just do the following:

imdsTrain.Labels = myLabel;

But this is a minor quibble. Thanks again!
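
One way to get close to a one-line assignment is to replicate a single categorical value once per file. This is only a sketch: the label name myLabel follows the comment above, and the tiny temporary dataset is hypothetical and exists only to make the snippet runnable on its own.

```matlab
% Tiny throwaway datastore standing in for imdsTrain (hypothetical data).
root = tempname;
mkdir(root);
for k = 1:5
    imwrite(uint8(randi([0 255], 8, 8)), ...
        fullfile(root, sprintf('img_%d.png', k)));
end
imdsTrain = imageDatastore(root);

% Repeat a single label once per file and assign it to the whole datastore.
myLabel = categorical({'myLabel'});
imdsTrain.Labels = repmat(myLabel, numel(imdsTrain.Files), 1);
```

The repmat call is what removes the need to build a matching cell array by hand; the label vector is sized from the datastore itself.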
