Speedup processing of larger binary files
Dear all,
I have to process thousands of binary files (16 MB each) by reading pairs of them and creating a bit-level data structure (usually a 1x134217728 array) in order to process them at the bit level.
Currently I am doing this the following way:
conv = @(c) uint8(bitget(c,1:32));
measurement = NaN(1,(sizeOfMeasurements*8)) %(1,134217728)
fid = fopen(fileName, 'rb');
byteContent = fread(fid,'uint32');
fclose(fid);
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
measurement=[bitRepresentation1{:}];
end
However, reading a single file takes minutes, which makes evaluating the entire data set very time-consuming.
UPDATE: I replaced fopen successfully by memmapfile using the code below:
m=memmapfile(fileName,'Format',{'uint32', [4194304 1], 'byteContent'});
byteContent=m.data.byteContent;
byteContent = double(byteContent);
I printed timing information (using tic/toc) for the individual instructions and it turns out that the bottleneck is:
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false); % see first line of code for conv
Are there more efficient ways of transforming byteContent into an array that stores one bit per index?
UPDATE2: Another source suggested that the conv function introduces superfluous loops. The new code looks like this:
fid = fopen(fileName, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);
conv = @(ii) uint8(bitget(bitContent, ii));
bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);
measurement = reshape(cat(2, bitRepresentation{:})', 1, []);
This brings the execution time of the line bitRepresentation = arrayfun[...] down from 39 s to 0.5 s. However, the bottleneck is now the very last line, which takes 5 s.
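One way to avoid both the cell array and the final transpose is to fill a preallocated 32-by-N matrix one bit position at a time. This is a sketch only, assuming that the file is read as uint32 words and that bit 1 (the least significant bit) of each word should come first, as in the original conv function. The names words and bits are illustrative, and the three sample values stand in for the file contents.

```matlab
% Stand-in for: words = fread(fid, '*uint32');
words = uint32([5; 4294967295; 0]);

% One row per bit position, one column per word.
bits = zeros(32, numel(words), 'uint8');
for k = 1:32
    % bitget returns a column here; transpose it to fit the row slice.
    bits(k, :) = bitget(words, k).';
end

% Column-major order already places the 32 bits of each word next to
% each other, so a plain reshape yields the 1-by-(32*N) bit vector.
measurement = reshape(bits, 1, []);
```

Whether this beats the arrayfun version on a 16 MB file would have to be timed; the result still holds one byte per bit.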
5 Comments
KSSV
29 Nov 2016
m = memmapfile(file,'Format','double') ;
Try this...any error?
What is the purpose of your line:
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
What I don't understand is why each single bit has to be stored as individual numbers, wasting memory and processing time.
Computers already have a very efficient way of storing and processing arrays of bits. It's called uint8, uint16, etc.
Here is a novel idea: use a bit to store a bit rather than a byte to store a bit. Leave your numbers as is. Use 8 times less memory.
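As a rough illustration of working on the packed data directly, an individual bit can be looked up by splitting its global index into a word index and a bit position. The helper getBit is hypothetical, and the sketch assumes bits are counted from 1 with the least significant bit of each uint32 word first.

```matlab
% Packed data: 32 bits per element (sample values, not real file data).
words = uint32([5, 8]);

% Hypothetical helper: value of global bit n, counting from 1 and
% taking the least significant bit of each word first.
getBit = @(w, n) bitget(w(floor((n - 1) / 32) + 1), mod(n - 1, 32) + 1);

b3  = getBit(words, 3);    % third bit of the first word (5 = ...0101)
b36 = getBit(words, 36);   % fourth bit of the second word (8 = ...1000)
```

Bulk operations on whole words, such as bitand, bitor and bitxor between two files' word arrays, would follow the same idea without ever unpacking the bits.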
Jan
29 Nov 2016
@Guillaume: Storing a bit in a bit is very efficient for storage, but the processing is much harder, e.g. for logical indexing. I'm using a C-mex script for logical indexing with bit fields, which is remarkably faster than indexing with LOGICAL vectors. The main effect does not come from the compact storage of the bits, though; I guess that Matlab does not pre-allocate efficiently. For a LOGICAL version see: FEX: CopyMask. I'm still astonished.
Walter Roberson
30 Nov 2016
Did you try timing dec2bin() or de2bi() compared to bitget() ?
Answers (1)
Omit this line:
measurement = NaN(1,(sizeOfMeasurements*8)) %(1,134217728)
Pre-allocation is a waste of time if the result is overwritten later.
If you want to access the data bitwise, use an integer type:
byteContent = fread(fid, '*uint32'); % Instead of storing it in a DOUBLE
Creating a large cell is not efficient. I assume that these lines can be replaced:
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
measurement=[bitRepresentation1{:}];
If you explain the wanted result, a suggestion for a replacement is possible and I will expand my answer.
[EDITED]
fid = fopen(FileName, 'r');
if fid == -1
    error('Cannot open file: %s', FileName);
end
Data = fread(fid, [8, inf], 'ubit1=>uint8');
fclose(fid);
Now each bit is stored as a UINT8 element with the value 1 or 0.
Perhaps this is faster (at least it is in R2009a: 0.25 sec on a virtual machine for a 16MB file):
Data = fread(fid, inf, '*uint8');
Result = [bitget(Data, 1), bitget(Data, 2), bitget(Data, 3), ...
          bitget(Data, 4), bitget(Data, 5), bitget(Data, 6), ...
          bitget(Data, 7), bitget(Data, 8)];
What a pity that bitget(X, 1:8) is not valid in Matlab when X is not a scalar.
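A possible workaround for the missing vectorised form, as a sketch only: bsxfun with bitand and one mask per bit position expands the column of bytes against all eight bits at once. The variable masks and the sample bytes are assumptions for illustration; the result should match the concatenated bitget calls above.

```matlab
Data  = uint8([5; 255; 0]);      % sample bytes standing in for fread output
masks = uint8(2 .^ (0:7));       % 1, 2, 4, ..., 128: one mask per bit

% N-by-8 matrix; column k is 1 where bit k of the byte is set.
Result = uint8(bsxfun(@bitand, Data, masks) ~= 0);
```

Whether this is faster than the eight explicit bitget calls has not been measured here and may depend on the Matlab release.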