
Reading all stream data from an ADTF DAT file with a large ItemCount is very slow.

Shikha on 14 Feb 2024
Commented: Shikha on 22 Feb 2024
Hello,
Basically, I need to read the stream data from an ADTF DAT file and perform some preprocessing related to coordinate frame transformations. For this, I am trying to read all data from a particular selected stream, which has an ItemCount of 18000, and store it in a CSV file. The read(streamData) call itself takes around 10-12 minutes, and even more if the stream contains more structured data.
Can someone suggest a way to make this reading process faster?

Accepted Answer

Shubham on 22 Feb 2024
Hi Shikha,
It seems that you are trying to read stream data from an ADTF DAT file. You can try to speed up the reading process by chunking the input data and leveraging Parallel Computing Toolbox to reduce the time taken to read the data.
You can read data with "adtfFileReader" in chunks using the "select" function, providing a time range or index range as arguments. For more information, please refer to the following documentation: https://www.mathworks.com/help/driving/ug/read-data-from-adtf-dat-files.html#ReadDataFromADTFDATFilesExample-7
You can try testing it out using the following example as well:
openExample('driving/ExtractVideoStreamFromADTFDATFileExample');
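For instance, a purely sequential chunked read (before adding any parallelism) might look like the following sketch. The file name, stream index, and item count are placeholders for your own data, not values from the example:

```matlab
% Sketch: read a stream in fixed-size chunks sequentially.
% "sample.dat", streamIndex, and itemCount are placeholders.
fileReader = adtfFileReader("sample.dat");
streamIndex = 2;
itemCount = 18000;   % ItemCount of the selected stream
chunkSize = 1000;
allData = {};
for startIndex = 1:chunkSize:itemCount
    endIndex = min(startIndex + chunkSize - 1, itemCount);
    % Select only the items in [startIndex, endIndex] and read them
    streamReader = select(fileReader, streamIndex, IndexRange=[startIndex endIndex]);
    allData{end+1} = read(streamReader); %#ok<AGROW>
end
```

Even without parallelism, reading in bounded chunks avoids holding the entire stream in one read call, and it is the natural stepping stone to the parfor version below.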
Once you create chunks of the file, you can read them in parallel. Here is a simple example of reading a file using parfor:
% Create dummy data and write to a file
data = (1:1e8)';
lines = length(data);
% Uncomment the following lines when creating the dummy data for the first time
% fileID = fopen('dummy_data.txt', 'w');
% fprintf(fileID, '%d\n', data);
% fclose(fileID);
% Start a parallel pool if one is not already running
if isempty(gcp('nocreate'))
    parpool;
end
% Define the number of workers and chunks
numWorkers = 6;
chunks = 10;
chunkSize = ceil(lines / chunks);
% Preallocate a cell array to hold the data for each chunk
dataCellArray = cell(chunks, 1);
% Read the file in parallel using parfor
parfor (curChunk = 1:chunks, numWorkers)
    startLine = (curChunk - 1) * chunkSize + 1;
    endLine = min(curChunk * chunkSize, lines);
    dataCellArray{curChunk} = readChunk(startLine, endLine, 'dummy_data.txt');
    fprintf("done for chunk %d\n", curChunk);
end
% Concatenate the data from each chunk to form the complete array
dataArray = vertcat(dataCellArray{:});

function dataChunk = readChunk(startLine, endLine, filename)
    fileID = fopen(filename, 'r');
    raw = textscan(fileID, '%d', endLine - startLine + 1, 'HeaderLines', startLine - 1);
    fclose(fileID);
    dataChunk = raw{1};  % textscan returns a cell array; extract the numeric column
end
You can modify the above code snippet to work with "adtfFileReader" as well. Please refer to the following code snippet:
numWorkers = 6;
chunkSize = 10;
itemCount = 149;  % total number of items in the selected stream
numChunks = ceil(itemCount / chunkSize);
dataCellArray = cell(numChunks, 1);
parfor (curChunk = 1:numChunks, numWorkers)
    startIndex = (curChunk - 1) * chunkSize + 1;
    endIndex = min(startIndex + chunkSize - 1, itemCount);
    dataCellArray{curChunk} = readChunk(startIndex, endIndex);
    fprintf("done for chunk %d\n", curChunk);
end
dataArray = vertcat(dataCellArray{:});

function dataChunk = readChunk(startIndex, endIndex)
    dataFolder = fullfile(tempdir, 'adtf-video', filesep);
    datFileName = fullfile(dataFolder, "sample_can_video.dat");
    fileReader = adtfFileReader(datFileName);
    streamIndex = 2;
    streamReader = select(fileReader, streamIndex, IndexRange=[startIndex endIndex]);
    dataChunk = read(streamReader);
end
I have tested the code snippet on the example mentioned above. I created chunks containing 10 frames each (149 frames are present in total), with the concatenated result stored in "dataArray"; the first chunk holds the first 10 frames of the video.
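Since the original goal was to store the stream data in a CSV file, here is one possible way (a sketch, not part of the tested snippet) to export the concatenated result. It assumes "dataArray" ends up as a structure array whose fields are scalar values; nested structures would need flattening first:

```matlab
% Sketch: export concatenated stream data to CSV.
% Assumes dataArray is a structure array with scalar fields;
% struct2table errors on nested structures, which must be
% flattened before export.
T = struct2table(dataArray);
writetable(T, 'stream_data.csv');
```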
I would also suggest profiling your code and performing the tasks asynchronously.
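For the asynchronous route, one option (a sketch, assuming the "readChunk" helper, "itemCount", and "chunkSize" from the snippet above) is parfeval, which queues each chunk read on the pool and lets you collect results as they complete instead of blocking on a parfor loop:

```matlab
% Sketch: queue chunk reads asynchronously with parfeval.
% readChunk, itemCount, and chunkSize are assumed to be defined
% as in the earlier snippet.
itemCount = 149;
chunkSize = 10;
numChunks = ceil(itemCount / chunkSize);
futures(1:numChunks) = parallel.FevalFuture;
for curChunk = 1:numChunks
    startIndex = (curChunk - 1) * chunkSize + 1;
    endIndex = min(startIndex + chunkSize - 1, itemCount);
    % Queue the read; 1 is the number of requested outputs
    futures(curChunk) = parfeval(@readChunk, 1, startIndex, endIndex);
end
% Collect results as they finish, restoring chunk order via idx
dataCellArray = cell(numChunks, 1);
for i = 1:numChunks
    [idx, chunk] = fetchNext(futures);
    dataCellArray{idx} = chunk;
end
dataArray = vertcat(dataCellArray{:});
```

Unlike parfor, this keeps the client free while workers read, so you could start writing early chunks to disk before the last ones arrive.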
I hope this helps!
3 Comments

Shubham on 22 Feb 2024
Hi Shikha,
The runtime should still be reduced when reading a stream of structures of structures using multiple workers. However, if you think you still require additional help, you can start a new thread along with your data files and code snippets.
Thanks
Shikha on 22 Feb 2024
Hi Shubham,
You are right about the relative improvement in runtime when using multiple workers. Also, I will surely open a new thread if I require more help.
Thanks a lot for your help!


More Answers (0)


Release

R2023b
