- "I am using xlsread"   However, the code doesn't show that you use xlsread. Why is that?
- How should I interpret   "raw 528439x20 1268253582 cell" ?
- What does   whos raw   show?
Insufficient storage (activex) - unable to complete operation - pulling in excel data more quickly.
1 回表示 (過去 30 日間)
古いコメントを表示
Hi All,
I have some reasonably large excel files to process (200-800mb). After a bit of dabbling and on review of the numerous threads here on the forum I have elected to use activex controls etc.
For speed of execution I am using xlsread (as opposed to csvread / importdata). Textscan is less than ideal due the variation between sheets (i.e. requires a format specifier).
To make things easier I have been reading from the excel sheets in bite sized chunks to 1) help speed up the read and 2) to avoid (other) memory errors.
Presently I am facing a challenges with the read call, that is; eRange = get(e.Activesheet,'Range',sheetR).
Currently the script loops through a 'block' of 20 columns or so reading in all rows. On the first call
that is:
eRange = get(e.Activesheet,'Range',sheetR).
raw = eRange.Value;
the program executes as expected. However looping through on the succeeding call and loading the next block I receive the error
"Error: Not enough storage is available to complete this operation."
Below is a brief outline of the script and corresponding output.
-------------- Script ------------------
% Uses activeX controls.
% pathAndFilename = 'C:\Users\owner\Desktop\Convert GS logs\temp.xlsx'
%%Open excel application
disp('Opening Excel application...')
e=actxserver('excel.application');
wb=e.Workbooks;
w=Open(wb, pathAndFilename');
%%Find range / break range of sheet into managable chunks
% Define excel column headers (NOT r1c1 reference style..)
Alphabet=('A':'Z').';
[I,J] = meshgrid(1:26,1:26);
XX=[Alphabet(I(:)), Alphabet(J(:))];
XX=cellstr(strvcat(Alphabet,XX));
clear I J Alphabet
% Range
disp('Extracting range from excel sheet...')
RangeInString=e.ActiveSheet.UsedRange.Address;
range = strsplit(RangeInString,'$');
nrow = str2double(range(end));
ncol = find(strcmp(range(end-1),XX));
% Readuce the scale of the succeeding reads (Time / memory overheads).
if ncol > 20
splt = ceil(ncol/lump);
bite = floor(ncol / splt);
columns = [1:bite:ncol];
surp_col = ncol - columns(end);
columns(end) = columns(end) + surp_col;
end
clear splt surp_col bite range
%%Do some work
% loop through the sheet and extract the respective chunks
for i=1:length(columns)-1
disp(['processing columns ',num2str(columns(i)), ' through ' num2str(columns(i+1)),' of ',num2str(ncol), ' columns...'])
tic
% set the range for the current read / chunk
sheetR = [XX{columns(i),:},'1:',XX{columns(i+1),:},num2str(nrow)];
%
% eRange = get(e.Activesheet,'Range','A1:B2');
eRange = get(e.Activesheet,'Range',sheetR);
raw = eRange.Value;
clear raw sheetR
end
What I wish to achieve is to read the full xlsx file and place the data in a .mat. To reduce the overhead, on each loop the data 'raw' is written to a .mat using the -7.3 format. All non essential variable are removed using a 'clearvars -except' call.
The scrip runs fine of the first 'chunk', however I receive an error on the following iteration. What is odd to me is that each 'chunk' pulled in on the read call is precisely the same size (number of rows / columns).
The output of the script is as follows:
Loading log file: 20160719_0945.xlsx ...
Opening Excel application...
Extracting range from excel sheet...
processing columns 1 through 20 of 152 columns...
processing columns 20 through 39 of 152 columns...
Error: Not enough storage is available to complete this operation.
The script fails on the 'raw = eRange.Value' call.
A 'whos' on the preceding line (prior to the 'raw = eRange.Value') suggests little memory overhead.
Name Size Bytes Class Attributes
MatObj 1x1 112 matlab.io.MatFile
MetaData 1x1 3069 struct
RangeInString 1x15 30 char
UAVnav_Log_ID 1x1 138 cell
XX 702x1 81380 cell
columns 1x8 64 double
e 1x1 COM.excel_application
eRange 1x1 Interface.00020846_0000_0000_C000_000000000046
filename 1x18 36 char
i 1x1 8 double
lump 1x1 8 double
matf 1x13 26 char
ncol 1x1 8 double
nrow 1x1 8 double
pathAndFilename 1x71 142 char
w 1x1 Interface.000208DA_0000_0000_C000_000000000046
wb 1x1 Interface.000208DB_0000_0000_C000_000000000046
I am presently running the script on a 64bit machine with 8Gb of RAM. However, I would very much like to deploy this / similar scripts in the future on a smaller machine for those times when we are 'working in the field'. Any pointers would be greatly appreciated. A simple solution is to significantly reduce the number of columns read during any one call - however, surely there is a better way?
The example file I am running has 530,000 rows (530,000 rows x 158 columns) and generates the cell array;
raw 528439x20 1268253582 cell
2 件のコメント
per isakson
2016 年 8 月 6 日
編集済み: per isakson
2016 年 8 月 10 日
回答 (0 件)
参考
カテゴリ
Help Center および File Exchange で ActiveX についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!