textscan formatting to import a large text file
fid = fopen(FileToLoad,'rt');
data = textscan(fid, colFormats,'HeaderLines',1,'Delimiter','\t');
fclose(fid)
I have a problem with the colFormats input. The text file has 2900 columns, and I know exactly which columns I want to import. I am opening the files in a loop, so one file has 2900 columns, another 2880, and so on, but for each file I know the numbers of the columns I want to import. For example, for the code above the columns are: 162, 166, 209, 240, 249, 258, 265, 269, 2280, 2281, 2285, 2297, 2813, 2860.
Answers (1)
dpb
9 Jul 2016
Edited: dpb
9 Jul 2016
Presuming you have a way to generate the column-wanted vector, build the format string dynamically
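>> c=[162,166,209,240,249,258,265,269,2280,2281,2285,2297,2813,2860]; % wanted columns, ascending
>> fmt=arrayfun(@(d) [repmat('%*f',1,d) '%f'],diff([0 c])-1,'uniformoutput',0);
>> fmt=strcat(fmt{:});
>> whos fmt
Name Size Bytes Class Attributes
fmt 1x8566 17132 char
>>
The "trick" is to augment the columns by prepending a 0; diff then gives the distance from the previous wanted column, and subtracting 1 gives the number of columns to skip before reading a column (161 before column 162, 3 between 162 and 166, and so on). Using the raw diff as the skip count would read one column too far at every step after the first. arrayfun builds a cell array of those substrings of the overall format string, strcat runs 'em all together in one long character string.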
It might still be faster to read the whole file and then just keep the wanted columns if it's not too big for memory.
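As a rough illustration of the read-everything alternative, the sketch below writes a tiny tab-delimited sample file, reads every column with one repeated '%f', and then indexes out the wanted ones. The sample data, nCols, wanted, and raw are made up for the example, and it assumes every column is numeric and that the column count of the file is already known.

```matlab
% Tiny sample: 1 header line, 6 numeric columns, 2 rows (illustrative data).
FileToLoad = [tempname '.txt'];
fid = fopen(FileToLoad, 'wt');
fprintf(fid, 'c1\tc2\tc3\tc4\tc5\tc6\n');
fprintf(fid, '%d\t%d\t%d\t%d\t%d\t%d\n', [11 12 13 14 15 16; 21 22 23 24 25 26].');
fclose(fid);

nCols  = 6;        % total columns in this file (assumed known here)
wanted = [2 5 6];  % columns to keep (hypothetical choice)

% Read all columns as doubles and collect them into one numeric matrix.
fid = fopen(FileToLoad, 'rt');
raw = textscan(fid, repmat('%f', 1, nCols), 'HeaderLines', 1, ...
               'Delimiter', '\t', 'CollectOutput', true);
fclose(fid);

data = raw{1}(:, wanted);   % keep only the wanted columns
delete(FileToLoad);         % clean up the sample file
```

Whether this beats the skip-field format on a 2900-column file would have to be timed on the real data; the full matrix has to fit in memory either way.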
ADDENDUM/ERRATUM:
Per the comment below, if the file has more columns than the last one wanted, the scanning will get out of step when the next record doesn't match. Add the following before trying the read, where maxCol is the number of columns in the file:
if maxCol>c(end) % more columns in the file than last one read
fmt=[fmt '%*[^\n]']; % skip to end of record added
end
You'll need to know the number of columns in each file as well as which are to be read...this could theoretically be determined empirically by reading the first record as character, searching for and counting the number of delimiters.
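The delimiter-counting idea can be sketched as below. The sample file is written only so the snippet runs on its own; with real data FileToLoad would already exist. It assumes the first line is tab-delimited, has no trailing tab, and has the same number of fields as the data records.

```matlab
% Write a tiny tab-delimited sample so the sketch is self-contained.
FileToLoad = [tempname '.txt'];
fid = fopen(FileToLoad, 'wt');
fprintf(fid, 'a\tb\tc\td\te\n');
fprintf(fid, '1\t2\t3\t4\t5\n');
fclose(fid);

% Read only the first record as text and count the tab delimiters.
fid = fopen(FileToLoad, 'rt');
hdr = fgetl(fid);                        % first line, newline stripped
fclose(fid);
maxCol = sum(hdr == sprintf('\t')) + 1;  % fields = delimiters + 1

delete(FileToLoad);                      % clean up the sample file
```

For an empty file fgetl returns -1 rather than text, so a real loop would probably want an ischar(hdr) check before counting.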
2 Comments
dpb
9 Jul 2016
Edited: dpb
9 Jul 2016
Without any data file or specifications, no, not really...while I've never tried such length on format spec, try the logic on a shorter line first where you can see what's actually going on.
ADDENDUM Oh, brain cramp...if the last read column isn't the last column in the record, you need to append a "skip rest of line" string...if it is, then not.
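One way to try the logic on a shorter line is to feed textscan a small character vector instead of a file, so the format string and the result can both be inspected by eye. This is only a sketch: the sample text, the wanted columns, and the names skips, txt, and vals are invented for the test, and the skip count is taken as the gap between consecutive wanted columns minus one.

```matlab
c      = [2 5 6];   % wanted columns (illustrative)
maxCol = 8;         % columns per record in the sample text below

% Fields to skip before each wanted column: [1 2 0] for this c.
skips = diff([0 c]) - 1;
fmt = arrayfun(@(d) [repmat('%*f', 1, d) '%f'], skips, 'UniformOutput', false);
fmt = strcat(fmt{:});            % '%*f%f%*f%*f%f%f'

if maxCol > c(end)               % wanted columns stop before the record does
    fmt = [fmt '%*[^\n]'];       % so discard the rest of each record
end

% Two short tab-delimited records stand in for the real file.
txt  = sprintf('1\t2\t3\t4\t5\t6\t7\t8\n11\t12\t13\t14\t15\t16\t17\t18');
out  = textscan(txt, fmt, 'Delimiter', '\t', 'CollectOutput', true);
vals = out{1};
disp(fmt)
disp(vals)
```

If the format is right, vals should hold columns 2, 5, and 6 of each record; anything else on a line this short is easy to trace back to the skip counts.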