How to extract specific rows & columns from a text file

Question

Farhan K on 25 Mar 2020

https://jp.mathworks.com/matlabcentral/answers/512983-how-to-extract-specific-rows-columns-from-a-text-file

Edited: per isakson on 27 Mar 2020

I need to extract all the net names and toggle rates from a text file that contains 179456 nets in total. The screenshot below shows a few rows and columns as an example.

3 comments (1 older comment hidden)

Farhan K on 26 Mar 2020

I have provided a screenshot as an example.

Farhan K on 26 Mar 2020

Example.txt

Here is an example

Answer 1

per isakson on 26 Mar 2020

Edited: per isakson on 26 Mar 2020

See the textscan documentation, "Read formatted data from text file or string", and try:

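% filename holds the path to the report file
fileID = fopen(filename);
% %s keeps the net name, each %*f skips a numeric column, %f keeps the toggle rate
net_names = textscan( fileID, '%s%*f%*f%f%*f', 'Headerlines',3 );
fclose(fileID);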

Added later: the script

%%
filename = 'd:\m\cssm\Example.txt';
fileID = fopen(filename);
net_names = textscan( fileID, '%s%*f%*f%f%*f', 'Headerlines',4 );
fclose(fileID);
%%
net_names{1}(1:3,1)
net_names{2}(1:3,1)

outputs

ans =
  3×1 cell array
    {'AES/r8/t0/t3/s4/n885'}
    {'AES/r3/p21[25]'      }
    {'AES/r7/z0[6]'        }
ans =
       0.0001
       0.0001
       0.0001

7 comments (5 older comments hidden)

per isakson on 26 Mar 2020

Edited: per isakson on 27 Mar 2020

The message "All table variables must have the same number of rows." tells me that textscan() fails to read your large file to its end. The most likely reason is that a line in your data file doesn't match the format string. (I think readtable() will encounter the same problem, although this might differ between releases. I am running R2018b.)

I reproduced your error by changing line 44 of Example.txt to:

AES/s3[124] 16.927 0.496 whats wrong

The next step is to add 'ReturnOnError', false to the call to textscan(), so that it stops with an error instead of silently returning the data read so far:

net_names = textscan( fileID, '%s%*f%*f%f%*f'   ...
                    , 'Headerlines',4           ...
                    , 'ReturnOnError', false    ...    
                );

Now I get the error message:

Error using textscan
Mismatch between file and format character vector.
Trouble reading 'Numeric' field from file (row number 40, field number 4) ==> whats       wrong\n
Error in cssm (line 4)
net_names = textscan( fileID, '%s%*f%*f%f%*f'   ...

which tells me that the two words, 'whats' and 'wrong', cause the error.

The cure for this error is to add 'TreatAsEmpty' to the call to textscan():

net_names = textscan( fileID, '%s%*f%*f%f%*f'               ...
                    , 'Headerlines',    4                   ...
                    , 'ReturnOnError',  false               ...
                    , 'TreatAsEmpty',   {'whats','wrong'}   ...
                );            

Problem solved. textscan now reads the modified Example.txt without problems. The words 'whats' and 'wrong' are converted to NaN.
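The effect of 'TreatAsEmpty' can be checked without touching any file, because textscan also accepts a character vector as input. The following is a minimal sketch, not the script used above: the first sample line and the variable names txt, C, names, and rates are made up for illustration, and the second line is the modified line quoted earlier.

```matlab
% Two sample lines; the second has words where numbers are expected.
% (The first line is invented sample data, not taken from Example.txt.)
txt = sprintf('%s\n', ...
    'AES/s3[123] 16.927 0.496 0.0001 1.2e-08', ...
    'AES/s3[124] 16.927 0.496 whats wrong');

% Same format string as above, applied to text instead of a file ID.
C = textscan( txt, '%s%*f%*f%f%*f'                  ...
            , 'ReturnOnError',  false               ...
            , 'TreatAsEmpty',   {'whats','wrong'}   ...
            );

names = C{1};   % net names
rates = C{2};   % toggle rates; the non-numeric field becomes NaN
```

Listing every offending word only works when the bad values are known in advance, so this approach suits a handful of known placeholders rather than arbitrary text.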

Until now I had missed your attachment.

When browsing switching_prob_report.txt in the free editor Notepad++, I quickly spotted problematic lines at the end of the file. (There might be more.)

...
Trigger/n7                 2.780    0.000    0.49e-7  9.772e-08  
AES/*Logic0*             288.088    0.000    0.0000      0.0000  d
AES/*Logic1*              71.396    1.000    0.0000      0.0000  d
Tj_Trig                    7.086    0.000    0.0000      0.0000  
...
Trojan/lfsr/n25            2.489    0.000    0.0000      0.0000  
--------------------------------------------------------------------------------
Total (179456 nets)                                         24.1493 uW
1

We can solve the problem of the extra column containing the letter 'd' by modifying the format string to skip to the end of the line after the 'Togglerates' column. The script

%%
filename = 'd:\m\cssm\switching_prob_report.txt';
fileID = fopen(filename);
net_names = textscan( fileID, '%s%*f%*f%f%*[^\n]'   ...
                    , 'Headerlines',    612         ...
                    , 'ReturnOnError',  true        ...
                );
fclose(fileID);

reads all the data lines and fails (as expected) on the last three lines.

>> net_names{1}(end-4:end)
ans =
  5×1 cell array
    {'Trojan/lfsr/n21'                                                                 }
    {'Trojan/lfsr/n24'                                                                 }
    {'Trojan/lfsr/n25'                                                                 }
    {'--------------------------------------------------------------------------------'}
    {'Total'                                                                           }
>> net_names{2}(end-4:end)
ans =
     0
     0
     0
     0
   NaN
>>

The first column has one more element than the second. Without fully understanding this result, I delete the elements that originate from the footer.

%%
net_names{1}(end-1:end)= [];
net_names{2}(end) = [];
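If the footer length is not known in advance, the cleanup can be made less dependent on hard-coded offsets. The sketch below is only an illustration under assumptions: it parses two sample lines from text with the same skip-to-end-of-line format, then appends footer residue by hand to mimic the output shown above, and finally trims both columns to a common length and drops rows whose name starts with dashes. The variable names and the startsWith test are my own choices, not part of the original script.

```matlab
% Sample data lines (values are illustrative); the second has a trailing 'd'.
txt = sprintf('%s\n', ...
    'Trigger/n7 2.780 0.000 0.0000 9.772e-08', ...
    'AES/*Logic0* 288.088 0.000 0.0000 0.0000  d');

% %*[^\n] skips everything after the toggle rate up to the end of the line.
C = textscan(txt, '%s%*f%*f%f%*[^\n]');
names = C{1};
rates = C{2};

% Mimic the footer residue reported above: two extra names, one extra NaN.
names = [names; {repmat('-', 1, 80); 'Total'}];
rates = [rates; NaN];

% Trim both columns to the shorter length so the rows line up again.
n = min(numel(names), numel(rates));
names = names(1:n);
rates = rates(1:n);

% Drop any remaining row whose name is a separator line of dashes.
isFooter = startsWith(names, '---');
names(isFooter) = [];
rates(isFooter) = [];
```

After this, names and rates have the same number of rows and can be combined with table(names, rates) if a table is preferred.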

Farhan K on 27 Mar 2020

Wow! This is why I preferred to use MATLAB rather than Python. Excellent help.

Answer 2

Akira Agata on 26 Mar 2020

Edited: Akira Agata on 26 Mar 2020

Thank you for uploading an example. How about the following?

% Read from text data
T = readtable('Example.txt','HeaderLines',4,'Format','%s%f%f%f%f');
T.Properties.VariableNames = {'Net','NetLoad','StaticProb','ToggleRate','SwPower'};
% Extract Net and ToggleRate
T = T(:,{'Net','ToggleRate'});

4 comments (2 older comments hidden)

Farhan K on 26 Mar 2020

If you scroll down in this file, you will find the Net and Toggle Rate section, which has 179456 nets in total.

Akira Agata on 26 Mar 2020

Edited: Akira Agata on 26 Mar 2020

Thank you for attaching the data.

OK. Looking at your data, I found some irregular lines, so some pre-processing is needed.

How about the following?

% Read from text data
C = readcell('switching_prob_report.txt','Delimiter','\r\n','NumHeaderLines',612);
% Delete the last 3 lines (because they don't contain data)
C(end-2:end) = [];
% Detect irregular lines (those that do not end with a digit)
idx = ~endsWith(C,compose('%d',0:9));
% Delete characters at the end of the detected lines
C(idx) = regexprep(C(idx),'\s*[^\d]$','');
% Split the line
D = split(C);
% Extract 1st and 4th column
Net = D(:,1);
ToggleRate = str2double(D(:,4));
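The detection and cleanup steps can be tried on a couple of in-memory lines before running them on the full report. This is a rough sketch under assumptions: the two sample lines are adapted from the excerpt quoted in the other answer, and the splitting step uses regexp with the 'split' option (with a hypothetical intermediate variable tokens) in place of split, so that runs of several spaces are explicitly treated as one separator.

```matlab
% Two sample lines; the second ends with a stray 'd' flag.
C = {'Trigger/n7 2.780 0.000 0.0000 9.772e-08'; ...
     'AES/*Logic0* 288.088 0.000 0.0000 0.0000  d'};

% Detect lines that do not end with a digit.
idx = ~endsWith(C, compose('%d', 0:9));

% Remove the trailing non-digit character and the whitespace before it.
C(idx) = regexprep(C(idx), '\s*[^\d]$', '');

% Split each line on runs of whitespace and stack the pieces into an N-by-5 cell.
tokens = regexp(C, '\s+', 'split');
D = vertcat(tokens{:});

% Extract the 1st and 4th columns.
Net = D(:,1);
ToggleRate = str2double(D(:,4));
```

The vertcat step only works when every line splits into the same number of fields, which is why the irregular lines have to be cleaned first.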
