Fastest way to add string
I'm dealing with very large CSV files. Reading them with readtable is fast enough. However, I have found (and reported) a bug in readtable: a blank value in the first column (a line that starts with the delimiter, e.g. ',') throws off all the data. Many of my files have blank values in the first column because of the way my equipment records the data.
So I have to preprocess the files and look for these blank first-column values in the CSV file. The most efficient method I've found is the following:
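% YGID (input) and HDID (header output) are file identifiers from fopen; chunksize is the number of bytes to read.
fprintf('Reading File...\n');
ch = fread(YGID, [1,chunksize], 'uint8=>char'); % 1-by-N char row; uint8 avoids negative values for bytes above 127
fprintf('Getting Number Of Lines...');
nol = sum(ch == sprintf('\n')); % number of lines
fprintf('%i\n',nol);
% Strip the trailing comma at the end of each line.
fprintf('Replacing final commas...\n');
cch = regexprep(ch,',(\r|\n)+','$1');
clear ch;
% Indices of every newline character.
fprintf('Getting line locations...\n');
hlocs = regexp(cch,'\n');
% Write lines 3 through 10 to the header file.
fprintf('Writing Header File...\n');
fwrite(HDID,cch(hlocs(2)+1:hlocs(10)));
% Insert a space before any comma that starts a line (this is the slow step).
fprintf('Replacing Initial Commas\n');
ccch = regexprep(cch,'(\r|\n)+,','$1 ,');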
YGID is the file identifier returned by fopen. Note that I'm purposely creating new variables, which is not memory efficient, because I have 16 GB of RAM available on my machine and I find that creating a completely new variable is faster. However, once the file reaches a sufficient size (>20 MB, and I have some over 200 MB), even this becomes very slow. The line it gets stuck on is "ccch = regexprep(cch,'(\r|\n)+,','$1 ,');". I suspect this is because each added space (there are hundreds of thousands) forces the variable's memory to be reallocated. I've tried to preallocate the new variable with "ccch = blanks(chunksize + nol);" before that line, and it didn't seem to make a difference.
Is there any more efficient way to do this task?
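One possible direction, offered as a minimal sketch and not as a benchmarked solution: locate the leading commas with logical indexing, allocate the output once at its final size, and copy the original characters around the new spaces in a single assignment. The sample string below is a hypothetical stand-in for the cch buffer from the question, and the names isBreak, needsSpace, idx, k and isOriginal are made up for illustration. The sketch assumes cch is a non-empty 1-by-N char row.

```matlab
% Hypothetical sample standing in for the cch buffer from the question.
cch = sprintf('a,b,c\n,1,2\r\n,3,4\nx,5,6\n');

% Mark every CR or LF character.
isBreak = (cch == char(10)) | (cch == char(13));

% A comma needs a space in front of it when the previous character is a line break.
needsSpace = [false, isBreak(1:end-1)] & (cch == ',');

idx = find(needsSpace);   % positions of the leading commas in cch
k = numel(idx);           % number of spaces to insert

% Allocate the output once, already filled with spaces.
ccch = blanks(numel(cch) + k);

% Each inserted space shifts later characters right by one, so the j-th
% space lands at idx(j) + (j-1) in the output.
isOriginal = true(1, numel(ccch));
isOriginal(idx + (0:k-1)) = false;

% Copy the original characters into every slot that is not a new space.
ccch(isOriginal) = cch;
```

Every original character is kept, so the result is exactly k characters longer than the input. Whether this is actually faster than regexprep on a 200 MB buffer would need to be timed on the real files; the temporary logical masks also cost roughly one byte per character each.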