Read file with non-uniform lines?

Question

bene1, 25 Oct 2020

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/625818-read-file-with-non-uniform-lines

Commented: bene1, 27 Oct 2020

Hi. I'm a Matlab newbie. I would like to read in a file where the lines have different formats, as below.

% Coordinates
%   Code    ID      X         Y
    C       101     0.001     0.001
    C       102     1.002     0.002
    C       103     1.003     1.003
    C       104     0.004     1.004
% Distances
%   Code    ID      From      To      Dist
    D       201     101       103     1.417
    D       202     102       104     1.414

If the first character is C, use...

A = textscan(fid,'%c %d %f %f')

If the first character is D, use...

A = textscan(fid,'%c %d %d %d %f')

Afterwards, I'd like to assign the data to structs (c.id, c.x, c.y, d.id, d.from, d.to, d.dist), but first I think I just need to get it scanned in. Is it possible to apply some logic while reading the file? Thank you.
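One way to "apply some logic" while reading is to go line by line and dispatch on the first non-blank character, much as the question describes. The sketch below assumes the whole file fits in memory and uses sscanf on each line instead of textscan on the file handle; the file name 'data.txt' is hypothetical, and the lines are inlined from the sample above so the sketch runs on its own.

```matlab
% Stand-in for the file contents. In practice, something like:
%   lines = splitlines(fileread('data.txt'));   % 'data.txt' is a hypothetical name
lines = { ...
    '% Coordinates'
    '%   Code    ID      X         Y'
    '    C       101     0.001     0.001'
    '    C       102     1.002     0.002'
    '    C       103     1.003     1.003'
    '    C       104     0.004     1.004'
    '% Distances'
    '%   Code    ID      From      To      Dist'
    '    D       201     101       103     1.417'
    '    D       202     102       104     1.414'};

c = struct('id', [], 'x', [], 'y', []);
d = struct('id', [], 'from', [], 'to', [], 'dist', []);

for k = 1:numel(lines)
    ln = strtrim(lines{k});        % also removes a trailing carriage return
    if isempty(ln)
        continue                   % skip blank lines
    end
    % Dispatch on the first non-blank character; '%' header/comment
    % lines and anything unrecognised fall through and are ignored.
    switch ln(1)
        case 'C'
            v = sscanf(ln(2:end), '%f');    % ID, X, Y
            c.id(end+1) = v(1);  c.x(end+1) = v(2);  c.y(end+1) = v(3);
        case 'D'
            v = sscanf(ln(2:end), '%f');    % ID, From, To, Dist
            d.id(end+1) = v(1);  d.from(end+1) = v(2);
            d.to(end+1) = v(3);  d.dist(end+1) = v(4);
    end
end
```

Each field ends up as a row vector. This is simple to follow, but it loops in MATLAB code, so for large files a vectorised approach such as the regexp one in the answers is likely to be faster.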

5 comments (3 older comments not shown)

Walter Roberson, 26 Oct 2020

'^\s*C.*$', 'dotexceptnewline', 'lineanchors'

or

'(?<=(^|\n))\s*C[^\n]*'

with no additional options needed
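For reference, here is the first pattern used in a complete regexp call. The 'match' option (assumed here, since the result shown next is a cell array of matched lines) returns the matched text; 'lineanchors' lets ^ and $ match at line boundaries, and 'dotexceptnewline' stops .* at the end of each line. The FileText below is a hypothetical in-memory stand-in for fileread on the question's file.

```matlab
% Hypothetical stand-in for FileText = fileread(YourFileName)
lines = { ...
    '% Coordinates'
    '%   Code    ID      X         Y'
    '    C       101     0.001     0.001'
    '    C       102     1.002     0.002'
    '% Distances'
    '%   Code    ID      From      To      Dist'
    '    D       201     101       103     1.417'};
FileText = sprintf('%s\n', lines{:});

% One cell per matching line, with leading whitespace included
Clines = regexp(FileText, '^\s*C.*$', 'match', 'dotexceptnewline', 'lineanchors');
Dlines = regexp(FileText, '^\s*D.*$', 'match', 'dotexceptnewline', 'lineanchors');
```

Clines is a 1×2 cell array holding the two C lines, and Dlines holds the single D line.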

bene1, 26 Oct 2020

Great, thanks again. I now have...

C =
  4×1 cell array
    {'    C       101     0.001     0.001←'}
    {'    C       102     1.002     0.002←'}
    {'    C       103     1.003     1.003←'}
    {'    C       104     0.004     1.004←'}

With C as a 4×1 cell array, I believe my next step is to extract the columns. My first thought was

A = textscan(C,'%c %d %f %f')

but I see I can't do that. Looking into cell2struct?
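The textscan call fails because textscan reads from a file identifier, a character vector, or a string scalar, not from a cell array. A minimal workaround, sketched below on two of the lines, is to join the cells into one newline-separated character vector first and then scan that. The format '%s %d %f %f' is an adaptation of the question's format, with %s in place of %c for the code column.

```matlab
% C as returned by regexp(..., 'match', ...): one matched line per cell
C = {'    C       101     0.001     0.001'
     '    C       102     1.002     0.002'};

% textscan needs a single char vector (or string scalar), so join the lines
% with newlines first; C(:).' makes the cell a row whatever its original shape.
A = textscan(strjoin(C(:).', newline), '%s %d %f %f');

c.id = A{2};   % int32 column vector, because of %d
c.x  = A{3};   % double column vector
c.y  = A{4};
```

A{1} holds the code letters as a cell array, which can usually be discarded once the lines have been split by type.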


Answer 1 (accepted)

Walter Roberson, 26 Oct 2020

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/625818-read-file-with-non-uniform-lines#answer_524468

Named tokens, I said. Do not extract the lines ahead of time.

FileText = fileread(YourFileName);
Ctokens = regexp(FileText, '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', 'names', 'lineanchors');
%Ctokens will now be a struct array with field names ID, X, and Y, each of which is a character vector.
C.ID = str2double({Ctokens.ID});
C.X = str2double({Ctokens.X});
C.Y = str2double({Ctokens.Y});
Dtokens = regexp(FileText, '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)', 'names', 'lineanchors');
%Dtokens will now be a struct array with field names ID, From, To, and Dist, each of which is a character vector.
D.ID = str2double({Dtokens.ID});
D.From = str2double({Dtokens.From});
D.To = str2double({Dtokens.To});
D.Dist = str2double({Dtokens.Dist});

The amount of processing work is minimal. Nearly all of the effort is in working out the right regexp patterns, which can be tricky when there are variant lines.
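To try the named-token patterns without a file on disk, the sketch below builds a hypothetical FileText in memory with Windows-style CRLF line endings. The trailing ← in bene1's output above appears to be such a carriage return, picked up because '.' matches it. Because the tokens here are delimited with \S+, the carriage return is never captured inside a token, so str2double receives clean numbers.

```matlab
% Hypothetical stand-in for FileText = fileread(YourFileName), with CRLF endings
lines = { ...
    '% Coordinates'
    '%   Code    ID      X         Y'
    '    C       101     0.001     0.001'
    '    C       102     1.002     0.002'
    '% Distances'
    '%   Code    ID      From      To      Dist'
    '    D       201     101       103     1.417'};
FileText = sprintf('%s\r\n', lines{:});

Ctokens = regexp(FileText, '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', ...
                 'names', 'lineanchors');
Dtokens = regexp(FileText, '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)', ...
                 'names', 'lineanchors');

% Same conversions as in the answer: one row vector per field
C.ID   = str2double({Ctokens.ID});
C.X    = str2double({Ctokens.X});
D.Dist = str2double({Dtokens.Dist});
```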

1 comment

bene1, 27 Oct 2020

Cool, thank you kindly!


Answer 2

per isakson, 26 Oct 2020

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/625818-read-file-with-non-uniform-lines#answer_524413

>> S = cssm( 'd:\m\cssm\cssm.txt' )
S = 
  1×2 struct array with fields:
    header
    colhead
    Code
    data
>> S(1)
ans = 
  struct with fields:
     header: "Coordinates"
    colhead: ["Code"    "ID"    "X"    "Y"]
       Code: [4×1 string]
       data: [4×3 double]
>> S(2)
ans = 
  struct with fields:
     header: "Distances"
    colhead: ["Code"    "ID"    "From"    "To"    "Dist"]
       Code: [2×1 string]
       data: [2×4 double]

where

function    sas = cssm( ffs )
    
    chr = fileread( ffs );
    str = string( chr );
    str = replace( str, char([13,10]), newline );   % get rid of the carriage return
   
    % split the string into blocks. Use the block header as delimiter. 
    [blk,del] = strsplit( str, '(?m)^\x20*%\x20\w+\x20*\n'  ...      
                        , 'DelimiterType','RegularExpression' );
                    
    blk(1) = [];  % remove empty block before the first delimiter                    
    
    len = numel( del );
    sas(1,len) = struct( 'header',"", 'colhead',"", 'Code',"", 'data',nan );
    
    for jj = 1 : len    % loop over all blocks
        
        sas(jj).header = regexp( del(jj), '\w+', 'match','once' );  % match the name
        
        cac = textscan( blk(jj), "%[^\n]", 1 ); % read the first row
        tmp = strsplit( string(cac{1}) );       % split the row into column headers
        tmp(1) = [];                            % remove the comment character, "%"
        sas(jj).colhead = tmp;
        
        cac = textscan( blk(jj), ['%s',repmat('%f',1,numel(tmp)-1)] ...
                    ,   'Headerlines',1, 'CollectOutput',true );
        sas(jj).Code = string(cac{1});
        sas(jj).data = cac{2};
    end
end

and where cssm.txt contains the data given in your question.
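To get from this struct array to the structs the question asks for (c.id, c.x, c.y, d.id, d.from, d.to, d.dist), the columns of each data matrix can be copied into named fields. The sketch below uses a hand-built stand-in for S that has only the header and data fields, filled with the question's values, so it runs without cssm or a file; it looks each block up by its header name instead of relying on its position in the file.

```matlab
% Hand-built stand-in for the output of cssm (values from the question).
% In practice:  S = cssm('cssm.txt');
S(1).header = "Coordinates";
S(1).data   = [101 0.001 0.001; 102 1.002 0.002; 103 1.003 1.003; 104 0.004 1.004];
S(2).header = "Distances";
S(2).data   = [201 101 103 1.417; 202 102 104 1.414];

% Select blocks by header name rather than by order in the file
Sc = S([S.header] == "Coordinates");
Sd = S([S.header] == "Distances");

% Copy the numeric columns into the field names used in the question
c = struct('id', Sc.data(:,1), 'x', Sc.data(:,2), 'y', Sc.data(:,3));
d = struct('id', Sd.data(:,1), 'from', Sd.data(:,2), ...
           'to', Sd.data(:,3), 'dist', Sd.data(:,4));
```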

1 comment

bene1, 27 Oct 2020

Thank you for the idea. :-)
