Reading data from ASCII file

Question

Pratyush Manocha, 17 May 2020

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/526044-reading-data-from-ascii-file

Commented: Pratyush Manocha, 19 May 2020

Accepted answer: Stephen23

Attached file: data.txt

Hi,

I have an ASCII file with data corresponding to several runs of a simulation. I want to plot all the runs in the same plot for comparison, and I understand that I first have to import the data as a matrix before any further processing. However, I have been facing problems with importing the data. I have attached the file here for reference.

This was my take on solving this problem:

1. Using textscan

fidi = fopen('data.txt');
D=textscan(fidi, '%u %u');
E = cell2mat(D);

However, this returned empty cells as is shown by the following command:

whos E
  Name      Size            Bytes  Class     Attributes
  E         0x2                 0  uint32              

2. Using textread

fid = 'data.txt';
B = textread(fid, '%f %f');

This returned the following errors:

Error using dataread
Number of outputs must match the number of unskipped input fields.
Error in textread (line 171)
[varargout{1:nlhs}]=dataread('file',varargin{:}); %#ok<REMFF1>

Then I changed the code to this:

[B,C]=textread(fid, '%f %f');

Which in turn returned the following errors:

Error using dataread
Trouble reading floating point number from file (row 1, field 1) ==> vds	Id(M1)\n
Error in textread (line 171)
[varargout{1:nlhs}]=dataread('file',varargin{:}); %#ok<REMFF1>

3. Using spcread

B=spcread(fid);

This gave the following error:

Undefined function or variable 'spcread'.

4. Using importdata

I had limited success with this, but this was as far as I could go...

A=importdata(fid);

This gave me a 1x1 struct containing a 101x2 double with the first 101 data lines of the text file and a 2x1 cell with the first two header lines.

I then removed all the header lines from the file, after which importdata did import the entire data set, albeit without any headers. The result would still need splitting into multiple matrices before all of the runs could be plotted in one graph (because, if I recall correctly, the plot function doesn't support dot indexing for variables of the form A.data), like so:

[Figure: all runs plotted in one graph, output taken from a commercial SPICE simulator]

Could someone help me import the data properly so that I can move to plotting the curves?
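The failures in attempts 1 and 2 are consistent with the header lines: the textread error above reports trouble at row 1, field 1 (the text vds Id(M1)), which a numeric format cannot parse. A minimal sketch of that behaviour, assuming a file that starts with two header lines followed by tab-separated numbers. The sample text below is made up to mimic data.txt and is passed to textscan as a character vector instead of a file:

```matlab
% Made-up sample mimicking the assumed layout of data.txt:
% one title line, one block header, two numeric rows, then the next block header.
txt = sprintf(['vds\tId(M1)\n' ...
               'Step Information: Vg=0  (Run: 1/11)\n' ...
               '0\t0\n' ...
               '0.1\t1.3168e-10\n' ...
               'Step Information: Vg=500m  (Run: 2/11)\n']);

% A purely numeric format stops at the very first field, because it is text.
D0 = textscan(txt, '%f %f');

% Skipping the two leading header lines lets the first block be read;
% textscan then stops again at the next header line.
D1 = textscan(txt, '%f %f', 'HeaderLines', 2, 'CollectOutput', true);
firstBlock = D1{1};
```

This only recovers the first block, which matches what importdata returned; reading every block needs a loop over the headers.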


Answer 1

Stephen23, 18 May 2020

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/526044-reading-data-from-ascii-file#answer_433039

Edited: Stephen23, 18 May 2020

data.txt

Importing the entire file data as character, converting to string, splitting that string into multiple little strings, and then finally converting those many little strings to numeric is convoluted, an inefficient use of memory (because each of those conversions duplicates the data in MATLAB memory), and involves multiple data type conversions (also inefficient).

The most efficient way to import the data is to get the importing routine to directly convert the data to numeric, for example very simply using textscan (no data duplication required):

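opt = {'CollectOutput',true}; % name-value options passed on to textscan.
hdr = {}; % header line of each data block.
out = {}; % numeric matrix of each data block.
[fid,msg] = fopen('data.txt','rt');
assert(fid>=3,msg) % ensure the file opened correctly.
fgetl(fid); % read and ignore the very first line.
while ~feof(fid)
    hdr{end+1} = fgetl(fid); % read the header line of this block.
    out(end+1) = textscan(fid,'%f%f',opt{:}); % read numeric rows until the next header line.
end
fclose(fid);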

Giving all eleven groups of data:

>> size(out)
ans =
     1    11

Take a quick look at the imported data:

>> out{1}
ans =
            0            0
          0.1   1.3168e-10
          0.2   1.3725e-10
          0.3   1.4305e-10
          0.4   1.4907e-10
          0.5   1.5532e-10
          0.6   1.6183e-10
... lots more lines here
          9.6   5.8922e-09
          9.7   6.1318e-09
          9.8   6.3811e-09
          9.9   6.6406e-09
           10   6.9106e-09
>> out{11}
ans =
            0            0
          0.1   0.00020176
          0.2   0.00038716
          0.3   0.00055754
          0.4   0.00071407
          0.5   0.00085781
          0.6   0.00098973
          0.7    0.0011107
          0.8    0.0012214
          0.9    0.0013227
            1    0.0014152
          1.1    0.0014994
... lots more lines here
          9.2    0.0027014
          9.3    0.0027099
          9.4    0.0027185
          9.5     0.002727
          9.6    0.0027355
          9.7     0.002744
          9.8    0.0027525
          9.9    0.0027611
           10    0.0027696
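With the blocks collected in out, the comparison plot the question asks for might be sketched as follows. The data here are synthetic stand-ins for out (three made-up runs, not values from data.txt), and the 'Run k' legend labels are an assumption. Indexing expressions such as out{k}(:,1) or A.data(:,1) can be passed to plot directly.

```matlab
% Synthetic stand-in for the cell array "out" produced by the import loop.
vds = (0:0.1:10).';
out = {[vds, 1e-9*vds], [vds, 2e-9*vds], [vds, 3e-9*vds]};

fig = figure('Visible','off');   % hidden figure so the sketch also runs headless
ax = axes(fig);
hold(ax,'on')                    % keep earlier curves when adding new ones
for k = 1:numel(out)
    % column 1 is the x data, column 2 the y data of run k
    plot(ax, out{k}(:,1), out{k}(:,2), 'DisplayName', sprintf('Run %d',k));
end
hold(ax,'off')
xlabel(ax,'vds')
ylabel(ax,'Id(M1)')
legend(ax,'Location','northwest')
nCurves = numel(ax.Children);    % one line object per run
```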

Converting the intermediate header data to numeric is trickier because some of them include SI prefixes (e.g. '500m' for 0.5) but one easy approach is to download my FEX submission sip2num and do something like this:

>> mat = cell2mat(cellfun(@sip2num,strrep(hdr(:),'Inf',''),'uni',0))
mat =
            0            1           11
          0.5            2           11
            1            3           11
          1.5            4           11
            2            5           11
          2.5            6           11
            3            7           11
          3.5            8           11
            4            9           11
          4.5           10           11
            5           11           11
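If downloading sip2num is not an option, a rough alternative is to pull the numbers out with regexp. This is only a sketch under strong assumptions: the header strings below are made up, modeled on the header format in the file, and the only SI prefix handled is a trailing 'm' (milli). Any other prefix would need extra cases.

```matlab
% Assumed header texts (illustrative, not read from data.txt).
hdr = {'Step Information: Vg=0  (Run: 1/11)', ...
       'Step Information: Vg=500m  (Run: 2/11)', ...
       'Step Information: Vg=1  (Run: 3/11)'};

% Capture the value after "Vg=" (digits, dots, optional trailing m)
% and the two integers of "Run: k/n".
vgTok  = regexp(hdr, 'Vg=([\d.]+m?)', 'tokens', 'once');
runTok = regexp(hdr, 'Run:\s*(\d+)/(\d+)', 'tokens', 'once');

mat = zeros(numel(hdr), 3);
for k = 1:numel(hdr)
    s = vgTok{k}{1};
    if endsWith(s, 'm')
        vg = str2double(s(1:end-1)) / 1e3;   % milli prefix
    else
        vg = str2double(s);
    end
    mat(k,:) = [vg, str2double(runTok{k}{1}), str2double(runTok{k}{2})];
end
```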

Comments

Pratyush Manocha, 18 May 2020

Edited: Pratyush Manocha, 18 May 2020

Hi,

Thanks for your answer. While it does seem optimally efficient, invoking the minimum number of reads and data conversions and entirely eliminating the need for a temporary buffer to store the data, it is a tad too complex for me to understand. I have a few questions about it.

1. Why did you set the logical value of CollectOutput using a cell structure? Why wouldn't simply assigning false to CollectOutput and passing this as an argument in textscan work?
2. Why did you pass rt as an argument in the fopen command? What role is it playing?
3. I am not sure I understand the use of end as an index within the while loop. How does this work? I tried searching the MATLAB documentation but that just showed me its use as a code-terminating keyword.
4. If I want to preserve the first header line in order to use it as the axes titles or the title of the plot, is there a method to do so? I tried changing end+1 to end+2 in hdr{1,end+1} = fgetl(fid); so that the command became hdr{1,end+2} = fgetl(fid); but that just led to the creation of blank entries.

Could you expand on these points a bit?

Stephen23, 18 May 2020

Edited: Stephen23, 18 May 2020

1- "Why did you set the logical value of CollectOutput using a cell structure?"

I prefer to define textscan's optional arguments in a cell array like that because:

a. putting all options directly inside textscan on one line tends to create a very long, unwieldy line of code.

b. when experimenting with different options (e.g. to parse your file) it is easier for me to adapt that one cell array at the top of my code. The alternative is to dig down into a very long unwieldy line in the middle of my code, lots of scrolling (left and right as well as up and down). See also point a.

c. It makes it easy to see at a glance what the current options are (see also points a. and b.)

"Why wouldn't simply assigning false to CollectOutput and passing this as an argument in textscan work? "

You could certainly assign the value false to a variable named CollectOutput, but that would not particularly help with calling textscan, which requires name-value pairs for its optional arguments (i.e. two separate input arguments, the name and the value, exactly as the textscan documentation explains), so you would still need to provide the name separately (the name of that variable is irrelevant).

I find it easier just to use that cell array (which is not a structure).

https://www.mathworks.com/help/matlab/matlab_prog/comma-separated-lists.html
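To illustrate the comma-separated list: opt{:} expands the cell array into separate input arguments, so the first two calls below should be equivalent. A small sketch using made-up text passed to textscan as a character vector, so it runs without data.txt:

```matlab
txt = sprintf('1 2\n3 4\n');          % made-up two-column sample text
opt = {'CollectOutput',true};

C1 = textscan(txt, '%f%f', opt{:});                  % options via the cell array
C2 = textscan(txt, '%f%f', 'CollectOutput', true);   % same options written inline
C3 = textscan(txt, '%f%f');                          % default: one cell per column

sameResult = isequal(C1, C2);
```

With CollectOutput set to true the two numeric columns are merged into one matrix inside a single cell, which is why the loop in the answer gets one matrix per block.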

2- "Why did you pass rt as an argument in the fopen command? What role is it playing?"

The input arguments to fopen are explained in the fopen documentation. In short r means that the file is opened for reading (not writing), and t means that the file is interpreted as a text file (not a binary file).

3- "...the use of end as an index within the while loop. How does this work?"

source: https://www.mathworks.com/company/newsletters/articles/matrix-indexing-in-matlab.html

"The special end operator is an easy shorthand way to refer to the last element of v:"

source: https://www.mathworks.com/help/matlab/ref/end.html

"end also represents the last index of an array":

I used it to expand the cell arrays out and hdr for each imported block of data, so that the cell array holds all of the imported data, without knowing how many blocks there are at the start.
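A short sketch of both uses of end, with made-up values. It also shows the difference between the two assignments in the loop: curly braces store a value inside a new cell, while parentheses append a cell that already exists (textscan returns a cell array).

```matlab
v = [10 20 30];
lastValue = v(end);        % end refers to the last index of v

c = {};
for k = 1:3
    c{end+1} = k^2;        % end+1 is one past the current last index, so c grows by one
end

d = {};
d(end+1) = {[1 2; 3 4]};   % parentheses: append an existing 1x1 cell
```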

4- It is not clear what you mean by "first header line": the very first line of the file, or the header of the first block (the second line in the file) ?

If you want the first line of the file then just assign the fgetl output to a variable:

first = fgetl(fid);

giving:

first =
vds	Id(M1)

If you want the header of the first data block (or any data block for that matter) you don't need to change anything, they are already all stored inside the cell array hdr. Here is the first one:

>> hdr{1}
ans =
Step Information: Vg=0  (Run: 1/11)

https://www.mathworks.com/help/matlab/matlab_prog/access-data-in-a-cell-array.html

J. Alex Lee, 18 May 2020

This answer indeed looks better. I didn't realize textscan can be used this way...it fails out gracefully by just not continuing and keeping file pointer where it is.
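The behaviour described here can be tried on a small temporary file. This is a sketch with made-up contents that mimic the assumed layout of data.txt (one title line, then header lines each followed by tab-separated numbers); the file name comes from tempname and the values are illustrative only.

```matlab
% Write a small two-block test file.
fname = [tempname '.txt'];
fid = fopen(fname,'wt');
fprintf(fid,'vds\tId(M1)\n');
fprintf(fid,'Step Information: Vg=0  (Run: 1/2)\n0\t1\n1\t2\n');
fprintf(fid,'Step Information: Vg=1  (Run: 2/2)\n0\t3\n1\t4\n2\t5\n');
fclose(fid);

% Read it back block by block.
[fid,msg] = fopen(fname,'rt');
assert(fid>=3,msg)
fgetl(fid);                            % skip the title line
hdr = {};
out = {};
while ~feof(fid)
    hdr{end+1} = fgetl(fid);           % header of the current block
    % textscan stops at the next header line, where fgetl then picks up.
    out(end+1) = textscan(fid,'%f%f','CollectOutput',true);
end
fclose(fid);
delete(fname)
```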

Pratyush Manocha, 18 May 2020

It all makes sense now. Thanks a lot! I'll accept your answer as the best answer.


Answer 2

J. Alex Lee, 17 May 2020

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/526044-reading-data-from-ascii-file#answer_432930


It would have been helpful to know these things in advance (you hinted at the first, but it helps to say it explicitly):

1. The data contains multiple header rows for multiple sets of data.
2. Each data set is exactly 101 data points long.

Not having known the 2nd detail, here's a script that will split your data. There may be more elegant ways (certainly much better ones if you know the exact length of all data sets).

fc = string(fileread('data.txt'));
flines = split(fc,newline);
% remove empty lines
flines(flines=="") = [];
HeaderMask = contains(flines,"Step");
HeaderIdx = find(HeaderMask);
% pad the indices with "ghost" line after last line
HeaderIdx(end+1) = length(flines)+1;
NSets = length(HeaderIdx) - 1;
hdrs = flines(HeaderMask)
for i = NSets:-1:1
	tmp = flines(HeaderIdx(i)+1:HeaderIdx(i+1)-1);
	tmp = str2double(split(tmp,char(9)));
	data{i,1} = tmp;
	hdrs(i,1) = flines(HeaderIdx(i));
end

Comments

J. Alex Lee, 18 May 2020

In this case, in my inefficient process of reading in the text data into a string, Matlab showed the tab character as a right-arrow in the command line. But as Stephen's answer indicates, I guess you don't necessarily need to know that since textscan can figure it out.

You could also copy-paste the character in between the quotes and do

double(' ') % with the pasted tab character between the quotes
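For reference, the tab character has character code 9, which is why char(9) works as the delimiter in the script above. A small check that avoids pasting an invisible character, using a made-up sample row:

```matlab
tabChar = sprintf('\t');                 % build the tab character explicitly
code = double(tabChar);                  % its numeric character code
isSame = isequal(char(9), tabChar);      % char(9) is the same character

% Split a made-up tab-separated row and convert both fields to numbers.
row = "0.1" + tabChar + "1.3168e-10";
vals = str2double(split(row, char(9)));
```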

Pratyush Manocha, 19 May 2020

I see. Thanks for clarifying!
