Is it possible to split a large text file into half and subsequently use textscan for both parts?

Question

Hi,

This is my first time in this forum.

I am working with a large text file that contains a large amount of data: 10^5 * 600 elements of 16 digits each. I use the textscan command to read the data. I already know the number of columns, so I am able to generate the format spec beforehand. The main part of my code is shown below:

array=textscan(fileID,Spec,NumRow,'Delimiter',delim,'MultipleDelimsAsOne',true,'HeaderLines',1,'ReturnOnError',false);

When I set NumRow (the number of rows) to 50,000 or below, it works fine and takes only about 1 minute to run. However, my system seems to crash when I increase NumRow to 100,000. I suspect that my virtual memory has reached its limit.

Therefore, I wonder whether there is a way to split the data into two parts, say rows 1 to 50,000 and rows 50,001 to 100,000.

Thanks! Ati

Comments

Atipong, 14 May 2013

Hi,

It's something like this, with 10^5 rows and 600 columns separated by spaces.

-4.7533250000e-05 -4.8990000000e-05 -3.5166750000e-01

1.5550000000e-02 -1.5832100000e-09 -4.3949250000e-01

-1.9371000000e-04 -1.1074875000e-01 -6.1198500000e-01

Cedric, 14 May 2013

So when there is no minus sign, there are two spaces?

Answer 1

per isakson, 13 May 2013

Edited: per isakson, 13 May 2013

Something like this:

    nRow = 50000;
    fid  = fopen( ... );
    buf1 = textscan( fid, ..., nRow, .... );   % reads the first nRow rows
    ....                                       % process buf1 here
    buf2 = textscan( fid, ..., nRow, .... );   % continues where the first call stopped
    fclose( fid );
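
The outline above can be turned into a loop that reads the file block by block. The following is only a minimal sketch under stated assumptions: the file name, the 3-column sample data, the block size of 4 and the per-block column sum are all hypothetical stand-ins for the real 600-column file and its processing. It relies on the default whitespace delimiter instead of the 'Delimiter' option from the question, and it skips the header once with fgetl because, as far as I know, 'HeaderLines' is applied on every textscan call.

```matlab
% Hypothetical demo file: one header line, then 10 rows x 3 columns.
fname = [tempname '.txt'];                 % assumed temporary file name
nCol  = 3;                                 % the real file has 600 columns
data  = reshape(1:30, nCol, []).';         % sample values, 10 x 3
fid = fopen(fname, 'w');
fprintf(fid, 'header line\n');
fprintf(fid, [repmat('%.10e ', 1, nCol-1) '%.10e\n'], data.');
fclose(fid);

spec   = repmat('%f', 1, nCol);            % one %f per column
nRow   = 4;                                % block size (50000 in the question)
colSum = zeros(1, nCol);                   % example of per-block processing
nRead  = 0;                                % rows read so far

fid = fopen(fname, 'r');
fgetl(fid);                                % skip the header line once
while ~feof(fid)
    % Read at most nRow rows, starting at the current file position.
    buf   = textscan(fid, spec, nRow, 'CollectOutput', true);
    block = buf{1};                        % numeric matrix for this block
    if isempty(block), break; end          % nothing left to read
    colSum = colSum + sum(block, 1);       % process the block ...
    nRead  = nRead + size(block, 1);       % ... buf is overwritten next pass
end
fclose(fid);
delete(fname);
```

Because buf is reused on every pass, only one block is held in memory at a time.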

Comments

per isakson, 14 May 2013

Edited: per isakson, 14 May 2013

You have to process the data in buf1 and

clear buf1

before reading the rest of the file. Or

    buf = textscan( fid, ..., nRow, .... );
    ....
    buf = textscan( fid, ..., nRow, .... );

I think I would have written the data to one or more binary files and used memmapfile to work with the data.
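
A rough sketch of that binary-file idea, not a tested recipe for the real data: the text file is converted block by block to a file of doubles, which is then mapped with memmapfile. The file names, the small 3-column sample, the block size and the field name x are assumptions made for the illustration.

```matlab
% Hypothetical demo: convert a small text file to binary, then memory-map it.
txtName = [tempname '.txt'];               % assumed temporary file names
binName = [tempname '.bin'];
nCol = 3;                                  % the real file has 600 columns
data = reshape(1:30, nCol, []).';          % sample values, 10 x 3
fid = fopen(txtName, 'w');
fprintf(fid, [repmat('%.10e ', 1, nCol-1) '%.10e\n'], data.');
fclose(fid);

spec   = repmat('%f', 1, nCol);
nRow   = 4;                                % block size (50000 in the question)
nTotal = 0;                                % total rows converted
fin  = fopen(txtName, 'r');
fout = fopen(binName, 'w');
while ~feof(fin)
    buf   = textscan(fin, spec, nRow, 'CollectOutput', true);
    block = buf{1};
    if isempty(block), break; end
    fwrite(fout, block.', 'double');       % transpose: each text row stays contiguous
    nTotal = nTotal + size(block, 1);
end
fclose(fin);
fclose(fout);

% Map the binary file as an nCol-by-nTotal array; column k is text row k.
m    = memmapfile(binName, 'Format', {'double', [nCol nTotal], 'x'});
row7 = m.Data.x(:, 7).';                   % fetch one row without loading the rest
clear m                                    % release the mapping before deleting
delete(txtName);
delete(binName);
```

The conversion still has to parse the text once, but later runs can index into the mapped file directly.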

Walter Roberson, 14 May 2013

per is correct.

To be explicit, textscan() does not read in the entire file when you specify the repeat count.

Answer 2

Yao Li, 14 May 2013

You can use for loops to generate the formatSpec for textscan() automatically. For example, you can read two columns at a time by defining formatSpec as:

formatSpec_array = cell(1,300);          % one format string per pair of columns
for j=1:300
    temp = repmat({'%*f'},1,600);        % skip every column by default
    temp([2*j-1, 2*j]) = {'%f'};         % keep columns 2*j-1 and 2*j
    formatSpec_array{j} = [temp{:}];     % join the 600 specifiers into one string
end
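
One way such format strings might be used is to rewind the file before each pass and read a single pair of columns. This is a sketch under assumptions: the file name, the 6-column sample data and the per-pair column sum are hypothetical, and the default whitespace delimiter is used. Note that the whole file is parsed once per pair, so this trades run time for memory.

```matlab
% Hypothetical demo file: 4 rows x 6 columns, read two columns per pass.
fname = [tempname '.txt'];                 % assumed temporary file name
nCol  = 6;                                 % the real file has 600 columns
data  = reshape(1:24, nCol, []).';         % sample values, 4 x 6
fid = fopen(fname, 'w');
fprintf(fid, [repmat('%.10e ', 1, nCol-1) '%.10e\n'], data.');
fclose(fid);

% Build one format string per pair: '%f' for the pair, '%*f' (skip) elsewhere.
nPair = nCol/2;
formatSpec_array = cell(1, nPair);
for j = 1:nPair
    temp = repmat({'%*f'}, 1, nCol);
    temp([2*j-1, 2*j]) = {'%f'};
    formatSpec_array{j} = [temp{:}];
end

pairSum = zeros(nPair, 2);                 % example result: column sums per pair
fid = fopen(fname, 'r');
for j = 1:nPair
    frewind(fid);                          % start again from the top of the file
    c = textscan(fid, formatSpec_array{j}, 'CollectOutput', true);
    pairSum(j, :) = sum(c{1}, 1);          % only two columns are in memory
end
fclose(fid);
delete(fname);
```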
