Is it possible to split a large text file in half and then use textscan on each part?

Hi,
This is my first time on this forum.
I am working with a large text file containing 10^5 × 600 data elements, each a 16-digit number. I use textscan to read the data. I already know the number of columns, so I can generate the format spec beforehand. The main part of my code is shown below:
array=textscan(fileID,Spec,NumRow,'Delimiter',delim,'MultipleDelimsAsOne',true,'HeaderLines',1,'ReturnOnError',false);
When I set NumRow (the number of rows) to 50,000 or fewer, it works fine and takes only about 1 minute. However, my system seems to crash when I increase NumRow to 100,000. I suspect I am running out of virtual memory.
Is there a way to split the data into two parts, say rows 1 to 50,000 and rows 50,001 to 100,000, and read each part separately?
Thanks! Ati
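For scale, a rough estimate of the final numeric data alone, assuming every value ends up as a double (8 bytes), which is what %f fields return:

```matlab
nRows = 1e5;                 % rows in the file, from the question
nCols = 600;                 % columns in the file, from the question
bytesPerDouble = 8;          % %f fields are returned as double

fullBytes = nRows * nCols * bytesPerDouble;   % 4.8e8 bytes
halfBytes = fullBytes / 2;                    % one 50,000-row chunk

% prints: Full array: 458 MiB, half: 229 MiB
fprintf('Full array: %.0f MiB, half: %.0f MiB\n', fullBytes/2^20, halfBytes/2^20);
```

This counts only the parsed numbers. The cell array returned by textscan plus any later copy (for example cell2mat) can briefly need more, so the real peak is likely higher than this figure.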
Atipong, 14 May 2013
Hi,
It looks something like this, with 10^5 rows and 600 columns separated by spaces:
-4.7533250000e-05 -4.8990000000e-05 -3.5166750000e-01
1.5550000000e-02 -1.5832100000e-09 -4.3949250000e-01
-1.9371000000e-04 -1.1074875000e-01 -6.1198500000e-01
Cedric, 14 May 2013
So when there is no minus sign, there are two spaces?
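On the spacing question, a small sketch assuming space-delimited fields where a positive value may be preceded by two spaces. Rows 1 to 3 are the sample above; row 4 is made up to show the two-space case. With 'MultipleDelimsAsOne' set to true, a run of spaces counts as one delimiter, so both layouts parse into the same columns:

```matlab
% Rows 1-3: sample from the thread. Row 4: hypothetical, with two spaces
% before the positive value in column 2.
str = sprintf(['-4.7533250000e-05 -4.8990000000e-05 -3.5166750000e-01\n', ...
               '1.5550000000e-02 -1.5832100000e-09 -4.3949250000e-01\n', ...
               '-1.9371000000e-04 -1.1074875000e-01 -6.1198500000e-01\n', ...
               '-1.0000000000e-01  2.0000000000e-01 -3.0000000000e-01\n']);

Spec = repmat('%f', 1, 3);   % three columns in this small sample (600 in the real file)
C = textscan(str, Spec, 'Delimiter', ' ', 'MultipleDelimsAsOne', true);
A = cell2mat(C);             % 4-by-3 double matrix, one row per text line
```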


Answers (2)

per isakson, 13 May 2013 (edited 13 May 2013)
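Something like this:
nRow = 50000;
fid = fopen( filename, 'r' );   % filename: placeholder for the path to your data file
% First 50000 rows; HeaderLines skips the header once
buf1 = textscan( fid, Spec, nRow, 'Delimiter', delim, 'MultipleDelimsAsOne', true, 'HeaderLines', 1, 'ReturnOnError', false );
% ... process buf1 here ...
% Next 50000 rows; textscan resumes where the previous call stopped, so no HeaderLines
buf2 = textscan( fid, Spec, nRow, 'Delimiter', delim, 'MultipleDelimsAsOne', true, 'ReturnOnError', false );
fclose( fid );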
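The same idea extends to any number of chunks. Below is a minimal sketch, not from the thread: it reads nRow rows at a time until textscan returns nothing, and keeps only a running result between chunks (column sums, chosen purely for illustration). The small temporary file, the 3-column layout and nRow = 2 are assumptions so the example runs on its own; for the real data they would be the original file, 600 columns and 50000.

```matlab
% Hypothetical demo data: 1 header line + 5 rows x 3 columns in a temp file
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'c1 c2 c3\n');
fprintf(fid, '%.10e %.10e %.10e\n', reshape(1:15, 3, 5));   % rows 1 2 3 / 4 5 6 / ...
fclose(fid);

nCols = 3;                          % known in advance, as in the question
Spec  = repmat('%f', 1, nCols);     % one %f per column
nRow  = 2;                          % rows per chunk (50000 in the thread)

colSums = zeros(1, nCols);          % running result kept across chunks
nRead   = 0;                        % total data rows processed
fid = fopen(fname, 'r');
buf = textscan(fid, Spec, nRow, 'Delimiter', ' ', ...
               'MultipleDelimsAsOne', true, 'HeaderLines', 1);
while ~isempty(buf{1})              % textscan returns empty cells at end of file
    chunk   = cell2mat(buf);        % up to nRow-by-nCols numeric block
    colSums = colSums + sum(chunk, 1);
    nRead   = nRead + size(chunk, 1);
    % Free this chunk before the next read: with a plain reassignment,
    % the old contents of buf stay in memory until textscan returns.
    clear buf chunk
    buf = textscan(fid, Spec, nRow, 'Delimiter', ' ', ...
                   'MultipleDelimsAsOne', true);
end
fclose(fid);
delete(fname);
```

The last chunk may be shorter than nRow (here 1 row), and the loop ends on the first call that finds no more data.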
per isakson, 14 May 2013 (edited 14 May 2013)
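You have to process the data in buf1 and then
clear buf1
before reading the rest of the file. Alternatively, reuse the same variable:
buf = textscan( fid, Spec, nRow, 'Delimiter', delim, 'MultipleDelimsAsOne', true, 'HeaderLines', 1, 'ReturnOnError', false );
% ... process the first nRow rows ...
buf = textscan( fid, Spec, nRow, 'Delimiter', delim, 'MultipleDelimsAsOne', true, 'ReturnOnError', false );
Personally, I would probably have written the data to one or more binary files and used memmapfile to work with it.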
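A rough sketch of that binary-file route, assuming the text has already been parsed chunk by chunk (two tiny hard-coded chunks stand in for textscan output here). Each chunk is appended to a file of raw doubles with fwrite, transposed so that every original row is stored contiguously, and memmapfile then maps the file without loading it all into memory:

```matlab
% Hypothetical stand-ins for two parsed chunks (rows x columns)
chunk1 = [1 2 3; 4 5 6];
chunk2 = [7 8 9];
nCols  = 3;

binName = [tempname '.bin'];
fid = fopen(binName, 'w');
% Write each chunk transposed so rows end up contiguous on disk
fwrite(fid, chunk1.', 'double');
fwrite(fid, chunk2.', 'double');
fclose(fid);

% Map the file as one nCols-by-nRows matrix of doubles
info  = dir(binName);
nRows = info.bytes / (8 * nCols);          % 8 bytes per double
m = memmapfile(binName, 'Format', {'double', [nCols nRows], 'x'});

row2 = m.Data.x(:, 2).';                   % second data row, read from disk on demand

clear m                                    % release the mapping before deleting the file
delete(binName);
```

Because the data is stored transposed, one row of the original text file is one column of m.Data.x, which is the contiguous direction in MATLAB's column-major layout.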
Walter Roberson, 14 May 2013
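per is correct.
To be explicit: when you specify a repeat count, textscan() does not read the entire file. It stops after that many repetitions of the format, and the next call with the same file identifier resumes from that point.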
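One way to see this is the optional second output of textscan, which returns the position where the scan stopped. The three-row file below is a made-up example:

```matlab
% Hypothetical 3-row, 2-column file: "1 2\n3 4\n5 6\n"
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d %d\n', [1 2; 3 4; 5 6].');
fclose(fid);
info = dir(fname);                          % info.bytes: total file size

fid = fopen(fname, 'r');
[C1, pos1] = textscan(fid, '%f %f', 1);     % repeat count 1: read only the first row
[C2, pos2] = textscan(fid, '%f %f', 1);     % resumes at the second row
fclose(fid);
delete(fname);
```

After the first call, pos1 is well short of the file size, and C2 starts at the second row rather than the first.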



Yao Li, 14 May 2013
You can use for loops to auto-generate the formatSpec for textscan(). For example, to read two columns at a time, you can build one formatSpec per column pair:
nCols = 600;
formatSpec_array = cell(1, nCols/2);
for j = 1:nCols/2
    temp = repmat({'%*f'}, 1, nCols);   % %*f: parse the field but discard it
    temp([2*j-1, 2*j]) = {'%f'};        % keep columns 2j-1 and 2j
    formatSpec_array{j} = [temp{:}];    % join into one format string
end
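A small check of how one of these format strings behaves, scaled down to 6 columns (an assumption so the result is easy to read): the %*f fields are parsed and discarded, so each call returns just one column pair. Every call still scans the whole file, though, so reading all 300 pairs this way means 300 passes over the file (with frewind(fid) between them), trading run time for a smaller peak memory.

```matlab
nCols = 6;                                  % small stand-in for 600
specs = cell(1, nCols/2);
for j = 1:nCols/2
    temp = repmat({'%*f'}, 1, nCols);       % skip every column by default
    temp([2*j-1, 2*j]) = {'%f'};            % keep columns 2j-1 and 2j
    specs{j} = [temp{:}];
end

% Hypothetical 2-row input: "11 12 ... 16\n21 22 ... 26\n"
str = sprintf('%g %g %g %g %g %g\n', [11 12 13 14 15 16; 21 22 23 24 25 26].');
C = textscan(str, specs{2}, 'Delimiter', ' ', 'MultipleDelimsAsOne', true);
pair2 = [C{:}];                             % 2-by-2: columns 3 and 4 only
```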
