Pre-determining the number of lines in a text file

Question

Matt J on 4 Jul 2013

0

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/81137-pre-determining-the-number-of-lines-in-a-text-file

Commented: Richard Crozier on 14 Aug 2019

Is there any programmatic way of determining in advance the number of lines in a text file, for use with dlmread, textscan, etc...? I mean other than some brute force way like reading line by line in a while loop until EOF is hit.

6 comments (4 older comments hidden)

Matt J on 5 Jul 2013

Well, okay, maybe that was a bad example. But surely, in general, it helps to know in advance how much data there is to read so you can plan, pre-allocate, etc...

Chris Volpe on 26 Apr 2019

I realize this has been dormant for 5 years, and the API/behavior may have changed since then, but dlmread does the trick for me. I have a .csv (comma separated value) ASCII text file with 320 lines, and 240 comma-separated ASCII floating point numbers (including 'nan') on each line. I just do a plain vanilla "M = dlmread(filename);" and I get a 320x240 matrix in M.


Answer 1 (Accepted Answer)

Walter Roberson on 4 Jul 2013

1

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/81137-pre-determining-the-number-of-lines-in-a-text-file#answer_90853

The only operating system that MATLAB has ever run on that supported that ability was DEC's VMS, and for technical reasons VMS's facility for that could not be used with MATLAB.

The modern treatment of "lines" as being delimited by a particular character or character pair (e.g., LF or CR+LF) does not offer any way to count the lines short of reading through the file and counting the delimiters.

3 comments (1 older comment hidden)

Guru on 4 Jul 2013

Well on that note, it isn't hard for you to write a simple function that can do that...

Matt J on 4 Jul 2013

Or, I think it should be possible to allow dlmread to specify Infs in its range argument. That could trigger the file reading to stop when the limits of the file were reached.
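When the line count is only wanted for planning memory use, one option is to read in fixed-size blocks so the total never has to be known up front. The sketch below is illustrative, not a definitive implementation: the function name countRowsInBlocks, the two-column comma-separated layout, and the blockLines parameter are all assumptions that do not come from this thread. It relies on the optional third argument of textscan, which limits how many times the format is applied per call.

```matlab
function nRows = countRowsInBlocks(filename, blockLines)
% Hypothetical helper: read a two-column numeric CSV in blocks of
% blockLines rows, so the total row count need not be known in advance.
    [fid, msg] = fopen(filename, 'r');
    if fid < 0
        error('countRowsInBlocks:openFailed', 'Cannot open "%s": %s', filename, msg);
    end
    cleanup = onCleanup(@() fclose(fid));  % close the file even on error

    nRows = 0;
    while ~feof(fid)
        % Apply the format at most blockLines times per call.
        block = textscan(fid, '%f %f', blockLines, ...
                         'Delimiter', ',', 'CollectOutput', true);
        data = block{1};   % up to blockLines-by-2 matrix
        if isempty(data)
            break          % nothing parsed: stop rather than loop forever
        end
        nRows = nRows + size(data, 1);
        % ... process data here before reading the next block ...
    end
end
```

Only one block is held in memory at a time, which addresses the pre-allocation concern without a separate counting pass over the file.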


Answer 2

Guru on 4 Jul 2013

9

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/81137-pre-determining-the-number-of-lines-in-a-text-file#answer_90858

Edited: Guru on 4 Jul 2013

Just out of boredom, here's a function:

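function n = linecount(fid)
% Count lines from an already-open file identifier.
% fgetl returns -1 (not a char) once end-of-file is reached.
n = 0;
tline = fgetl(fid);
while ischar(tline)
  n = n+1;
  tline = fgetl(fid);
end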

Edited: Thanks for comment Walter

8 comments (6 older comments hidden)

Matt J on 5 Jul 2013

I cannot scroll any further than the final line of text. I guess that means it ends with no terminator? Both my version and Guru's correctly count the number of lines of actual text, though.

Walter Roberson on 10 Jan 2017

Earlier I wrote that feof() never predicts end-of-file. That is true, but I was missing some information about the operation of fgetl and fgets that I just noticed today:

https://www.mathworks.com/help/matlab/import_export/import-text-data-files-with-low-level-io.html#br4ssin

"After each read operation, fgetl and fgets check the next character in the file for the end-of-file marker. Therefore, these functions sometimes set the end-of-file indicator before they return a value of -1.

[...]

This behavior does not conform to the ANSI specifications for the related C language functions." (emphasis added)

Sigh.


Answer 3

Informaton on 29 Oct 2014

5

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/81137-pre-determining-the-number-of-lines-in-a-text-file#answer_157042


Another approach is to use the underlying operating system's functionality. Specifically, UNIX/Linux systems (including Mac) provide the command-line utility 'wc -l [filename]', which prints the line count of [filename].

To implement this in MATLAB you could do something like this:

filename = 'filenameOfInterest.txt';
if ~ispc
  % wc -l prints "<count> <filename>"; an exit status of 0 means success
  [status, cmdout] = system(['wc -l "', filename, '"']);
  if status == 0
      scanCell = textscan(cmdout, '%u %s');
      lineCount = scanCell{1};
  else
      fprintf(1, 'Failed to find line count of %s\n', filename);
      lineCount = -1;
  end
else
  fprintf(1, 'Sorry, I don''t know what the equivalent is for a windows system\n');
  lineCount = -1;
end

1 comment

Ian McInerney on 30 Apr 2017

Edited: Ian McInerney on 30 Apr 2017

There is actually an equivalent command for Windows-based systems using the command line. It is discussed at some length here: https://blogs.msdn.microsoft.com/oldnewthing/20110825-00/?p=9803/

The command to run in the command prompt is:

find /c /v "" filename.txt

This can then be used in the else branch of your if-check:

 else
    % For Windows-based systems
    % Expected output has the form "---------- <FILENAME>: <count>"
    [status, cmdout] = system(['find /c /v "" ', filename]);
    if status == 0
        scanCell = textscan(cmdout, '%s %s %u');
        lineCount = scanCell{3};
        disp(['Found ', num2str(lineCount), ' lines in the file']);
    else
        disp('Unable to determine number of lines in the file');
        lineCount = -1;
    end
end


Answer 4

Walter Roberson on 10 Jan 2017

3

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/81137-pre-determining-the-number-of-lines-in-a-text-file#answer_250008


function n = linecount(filename)
    % Count lines in the named file by calling fgetl until it returns -1.
    [fid, msg] = fopen(filename);
    if fid < 0
        error('Failed to open file "%s" because "%s"', filename, msg);
    end
    n = 0;
    while true
        t = fgetl(fid);
        if ~ischar(t)
            break;
        else
            n = n + 1;
        end
    end
    fclose(fid);

I have tested this with files that end with newline and with files that do not end with newline.

6 comments (4 older comments hidden)

Jan on 9 Oct 2017

@Peter: fread(fptr) does read the complete file and stores each byte in a double. Prefer: fread(fptr, Inf, '*uint8'), which uses less memory.
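The memory difference can be seen directly. This is a small sketch under stated assumptions: the temporary file and the variable names are illustrative, and the 8-byte sample text is arbitrary.

```matlab
% Write a small sample file (8 bytes) to a temporary location.
tmp = [tempname '.txt'];
fid = fopen(tmp, 'w'); fprintf(fid, 'abc\ndef\n'); fclose(fid);

% Default precision: every byte of the file becomes a double.
fid = fopen(tmp, 'r'); asDouble = fread(fid); fclose(fid);

% '*uint8' keeps the data as uint8, one byte of memory per file byte.
fid = fopen(tmp, 'r'); asUint8 = fread(fid, Inf, '*uint8'); fclose(fid);
delete(tmp);

infoD = whos('asDouble');
infoU = whos('asUint8');
fprintf('double: %d bytes, uint8: %d bytes\n', infoD.bytes, infoU.bytes);
```

The double version should occupy eight times the memory of the uint8 version, which matters when the file runs to gigabytes.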

Walter Roberson on 9 Oct 2017

Just counting the \n can give an off-by-one error. You need to know if the final \n has any characters following it or not.

123\n456\n

has two lines.

123\n456\n7

has three lines.

123\n456\n7\n

has three lines.
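The rule can be checked on in-memory text. This is a minimal sketch, assuming LF (char 10) is the only line terminator; countLines is an illustrative name, not a built-in function.

```matlab
% Count LF characters, then add one if the text does not end with LF
% (that is, the last line is unterminated). Empty text counts as zero lines.
countLines = @(s) sum(s == 10) + double(~isempty(s) && s(end) ~= 10);

countLines(sprintf('123\n456\n'))      % two lines
countLines(sprintf('123\n456\n7'))     % three lines
countLines(sprintf('123\n456\n7\n'))   % three lines
```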


Answer 5

Boris on 10 Jan 2017

2

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/81137-pre-determining-the-number-of-lines-in-a-text-file#answer_250004


I came across this code a while ago which is reasonably fast and works well on large files:

   fid = fopen(strFileName, 'rt');
   chunksize = 1e6; % read chunks of 1 MB at a time
   numRows = 1;     % starts at 1: assumes the last line has no terminator
   while ~feof(fid)
       ch = fread(fid, chunksize, '*uchar');
       if isempty(ch)
           break
       end
       numRows = numRows + sum(ch == sprintf('\n'));
   end
   fclose(fid);

strFileName is the file name for the ascii file

numRows has the total number of lines

Now, the only remaining problem is efficiently testing for blank lines before using csvread / dlmread to read (sizeable) chunks of the file (i.e. my code is thrown off if the csv file ends in a blank line, so it would be nice if I could test for and count the number of blank lines at the end of my files).

3 comments (1 older comment hidden)

Boris on 17 Jul 2017

Or use the code above and check whether the file ends in 0A (line feed):

    if ch(end)==10
        numRows=numRows-1;
    end
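The chunked read and the trailing-newline check can be folded into one function. The following is a sketch rather than a definitive version: countLinesChunked and its chunkSize argument are hypothetical names, LF (byte 10) is assumed to be the line terminator, and the file is opened in binary mode so bytes are counted as stored.

```matlab
function n = countLinesChunked(filename, chunkSize)
% Hypothetical helper: count lines by scanning the file in chunks,
% counting an unterminated final line as a line.
    if nargin < 2
        chunkSize = 1e6;   % about 1 MB per read
    end
    [fid, msg] = fopen(filename, 'r');
    if fid < 0
        error('countLinesChunked:openFailed', 'Cannot open "%s": %s', filename, msg);
    end
    cleanup = onCleanup(@() fclose(fid));  % close the file even on error

    n = 0;
    lastByte = uint8(10);  % so that an empty file yields zero lines
    while true
        chunk = fread(fid, chunkSize, '*uint8');
        if isempty(chunk)
            break
        end
        n = n + sum(chunk == 10);   % line feeds in this chunk
        lastByte = chunk(end);      % remember how the data ends
    end
    if lastByte ~= 10
        n = n + 1;                  % final line has no terminator
    end
end
```

Tracking only the last byte seen avoids indexing into an empty final chunk, and starting the count at zero keeps files that do end in a line feed from being over-counted.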

Richard Crozier on 14 Aug 2019

This is a great answer, worked great for me on a 5GB text file of point cloud data.


Answer 6

Ken Atwell on 30 Oct 2014

1

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/81137-pre-determining-the-number-of-lines-in-a-text-file#answer_157092


If we can make two assumptions:

ASCII #10 is a reliable end-of-line marker
The entire file will fit into memory (that is, we're not talking about Big Data)

I would do the following (using the help for the plot command in this example):

 txt=fileread(fullfile(matlabroot, 'toolbox', 'matlab', 'graph2d', 'plot.m'));
 sum(txt==10)+1

This will be fast... certainly faster than the "fgetl" approach, but maybe not as fast as the "wc" approach Hyatt put forth above (assuming you can live without Windows platform support).

1 comment

Walter Roberson on 10 Jan 2017

Files are not required to end with a line terminator, but they might. So a file with 3 lines might have either 2 linefeeds (separating line 1 from line 2, separating line 2 from line 3, nothing at end of file), or 3 linefeeds (one at the end of each line.) The above code would count 4 if this hypothetical file ended with linefeed (as is more common than not.)


Answer 7

Dr. Erol Kalkan, P.E. on 19 May 2016

1

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/81137-pre-determining-the-number-of-lines-in-a-text-file#answer_222784

Edited: Matt J on 19 May 2016

Here is a short and fast way. Say the name of the file to be read is apk.txt:

% textread takes the file name directly, so no fopen call is needed
Nrows = numel(textread('apk.txt','%1c%*[^\n]'));

1 comment

Walter Roberson on 19 May 2016

textread is deprecated.

How is your routine going to treat empty lines? I think the result is going to depend upon whether the file is CR/LF or LF delimited: in the CR/LF case the %1c is going to read the CR, leaving the LF to be matched by the %*[^\n], but in the LF case, the %1c is going to read the LF, moving the next line into position to be matched by the %*[^\n]


Answer 8

Jan on 10 Jan 2017

1

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/81137-pre-determining-the-number-of-lines-in-a-text-file#answer_250088

Determining the number of lines requires reading the file and interpreting the line breaks. This means some real work, and the disk access is the bottleneck in this case. Therefore, for example, importing the file to a cell string is not remarkably faster if you determine the number of lines first. Even if the number of lines is determined initially, the main work would still be to "guess" a suitable buffer size for importing the lines. This requires either a copy of each line from the buffer to the Matlab string, or reallocating the imported string and allocating a new input buffer for each line.

I find it disappointing that Matlab does not have a simple tool to import a text file to a cell string. Even splitting a string (e.g. imported by fileread) into a cell while considering the DOS/Linux/Mac line breaks needs tools like strread, dataread, textread, textscan and regexp('split'), which are not available in all Matlab versions and are frequently outdated. Therefore I tried to write an efficient C-Mex again for the FileExchange. But the results have been grim: the best approaches have been only a few percent faster than fread, replacing the different line breaks by char(10) and calling a "Str2Cell" C-Mex. Neither counting the lines nor smart prediction techniques for a dynamic buffer allocation for the single lines accelerated the code sufficiently. The bottleneck of the disk access rules everything, even if the data are available in the cache already. For real file access, when the data have not been read seconds before and cached, all smart tricks are useless.

I think this is the reason why Matlab and many other tools do not contain a function for determining the number of lines in a text file.

If I find the time, I will try to write a LineCount.mex function, but I do not expect it to be much faster than Walter's Matlab approach.
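For reference, splitting text on mixed DOS/Linux/Mac line breaks with regexp might look like the sketch below. It assumes a MATLAB release in which regexp supports the 'split' option (as noted above, not all versions do), and the sample text is made up for illustration.

```matlab
% Sample text mixing DOS (CR+LF), Unix (LF) and classic Mac (CR) line breaks.
txt = sprintf('a\r\nb\nc\rd');

% List CR+LF first so that a DOS line break is not split into two breaks.
lines = regexp(txt, '\r\n|\r|\n', 'split');

nLines = numel(lines);   % 4 for the sample text above
```

If the text ends with a line break, the split yields one extra empty string at the end, so a line count taken this way needs the same trailing-terminator check discussed elsewhere in this thread.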


Answer 9

John BG on 10 Jan 2017

0

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/81137-pre-determining-the-number-of-lines-in-a-text-file#answer_250009


hi Matt

The command importdata returns a cell array with all the non-empty lines of a text file.

The number of elements of this cell array is equal to the number of lines of the text file.

file_name='filename.txt';
numel(importdata(file_name))

if you find these lines useful would you please mark my answer as Accepted Answer?

To any other reader, if you find this answer of any help please click on the thumbs-up vote link,

thanks in advance for time and attention

John BG

jgb2012@sky.com

6 comments (4 older comments hidden)

Walter Roberson on 10 Jan 2017

Edited: Walter Roberson on 10 Jan 2017

Interesting approach, but has challenges.

It does not count empty lines (including in the cases described below)

It works for pure numeric files which have a single column.

If there are multiple numeric columns, separated either by commas or whitespace, then you need to take the number of rows rather than the number of elements.

If there is text (other than commas separating the columns) then importdata returns a struct. You have to add the number of rows of the 'data' field and (if present) the 'colheaders' field will have one row. If the number of rows of the 'textdata' field is the same as the number of rows of the 'data' field then it is probably a per-row text header and you should not count it in addition to the 'data' field rows, but if the number of rows of the 'textdata' field is anything different then it represents text headers before the numeric data and you need to count it and do not count colheaders in that case. It could happen that the number of rows of text headers just happens to be the same as the number of rows of data: you can detect that case because colheaders will be present when it otherwise would not be (I think that's what I observed.)

Possibly I missed a few cases.

When it does work, it is 50 times slower than my recent code.

Walter Roberson on 10 Jan 2017

People other than Matt J read this, so before they implement the importdata() approach they need to know about its limitations. It is a nice compact expression that works well (if perhaps less efficient than it could be) under the circumstance of a file containing a single column of (pure) numeric values; unfortunately it turns out to be fragile if that condition is not met.

Matt J on 10 Jan 2017

Edited: Matt J on 10 Jan 2017

@John BG,

I'm afraid I see no special advantage to what you propose over other proposals. Walter's comments about speed and reliability aside, you will recall from my original post that I was asking if there was a way to determine the number of lines in the file without scanning through all the data in the file. Walter has already answered that this is impossible.

Worse, though, your solution not only reads through all the data in the file, but allocates storage for all the data simultaneously in MATLAB, something you will see I was also trying to avoid as discussed under Guru's answer.
