Efficiently import text file with irregular structure

Question

example.txt

I have data like the attached sample. I'd like to read it in efficiently and end up with a table with the following columns: Value1, Value2, Header. The header is repeated for every pair in its block, which makes grouping easier. An example of the desired output is below.

I'm not sure where to start and would appreciate any help. I'm using MATLAB R2018a.

Ex. desired output:

Val1	Val2	info	
32	32.8	'Header i do need #1'	
33	32.68	'Header i do need #1'	
05	32.73	'Header i do need #1'	
71	32.9	'Header i do need #1'	
71	32.71	'Header i do need #1'	
etc.	etc.	etc.	
57	32.41	'Header i do need #2'	
43	32.66	'Header i do need #2'	
32.27	'Header i do need #2'	
05	32.27	'Header i do need #2'	
13	32.37	'Header i do need #2'	
etc.	etc.	etc.	
49	32.35	'Header i do need #3'	
84	32.17	'Header i do need #3'	
83	32.16	'Header i do need #3'	
07	32.44	'Header i do need #3'	
66	32.77	'Header i do need #3'	
etc.	etc.	etc.	

Answer 1

per isakson, 12 March 2019

Edited: per isakson, 13 March 2019

See "Import Block of Numeric Data from Text File" and "How do I parse this complex text file with textscan?"

If you need further help, please show a three-line example of the output you want from example.txt.

Working code

Assumptions

The text file fits in memory
The first line with "*" in the first position indicates the beginning of data
Lines beginning with "#" (in the data part) indicate the start of a block
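
The second assumption says "in the first position" of a line, whereas a plain character comparison finds the first "*" anywhere in the text. If the leading comments might contain an asterisk, a line-anchored search is one possible alternative. This is a minimal sketch; the sample text and the names ixAny and ixLine are made up for illustration.

```matlab
% Hypothetical sample: the comment line contains a stray '*',
% and the data part starts at the '*' that opens the second line.
str = sprintf('comment with * inside\n* data starts\n# H\n1 2\n');

% First '*' anywhere in the text.
ixAny = find(str == '*', 1, 'first');

% First '*' located at the beginning of a line.
ixLine = regexp(str, '^\*', 'start', 'once', 'lineanchors');

% Keep everything after the line-anchored '*'.
rest = str(ixLine+1 : end);
```

With this sample the two searches return different positions, so the choice only matters when the comment section can contain asterisks.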

Approach

Read the entire file into a character array.
Split the array into a cell array of characters, with one block in each cell
Loop over all blocks, parse the blocks and put the result in a cell array
Pre-allocate output variables based on the size of the cell array and the data it contains
Loop over all blocks and put the data into a table
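
The split-and-parse steps can be tried on a small inline string before running them on the real file. This is a minimal sketch, assuming block headers start with "#" at the beginning of a line; the sample text and the names txt, blocks, headers and cols are made up for illustration.

```matlab
% Hypothetical sample text: a preamble followed by two '#' blocks.
txt = sprintf('preamble\n# Header A\n1 2\n3 4\n# Header B\n5 6\n');

% Split at the header lines; the second output holds the matched headers.
[blocks, headers] = strsplit(txt, '(?m)^#[^\r\n]*', ...
                             'DelimiterType', 'RegularExpression');
blocks(1) = [];                 % drop the text before the first header

% Parse the numeric rows of the first block into two columns.
% strtrim removes the line break left over after the header.
cols = textscan(strtrim(blocks{1}), '%f%f');
```

After the first element is deleted, blocks and headers have the same length, so blocks{k} belongs to headers{k}.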

>> T = cssm( 'h:\m\cssm\example.txt' )
T =
  76×3 table
    Var1     Var2            info       
    _____    _____    __________________
    27.32     32.8    "header 1 do need"
    27.33    32.68    "header 1 do need"
    27.05    32.73    "header 1 do need"
    ...
    27.53    32.66    "header 2 do need"
    27.68    32.98    "header 2 do need"
    27.77    32.27    "header 2 do need"
    27.49    32.35    "header 3 do need"
    27.84    32.17    "header 3 do need"
    27.83    32.16    "header 3 do need"

where

function    T = cssm( ffs )
    str = fileread( ffs );                  % read the entire file
    ixs = find( str=='*', 1,'first' ) +1;   % find first position of interest
    str = str( ixs : end );                 % strip off leading comments
    % split the text array into blocks
    [ blocks, matches ] = strsplit( str, '(?m)^#[^\r\n]*'       ...
                        ,   'DelimiterType','RegularExpression' );
    blocks(1) = [];    % delete whatever precedes the first block header
    
    % read the blocks of text
    len = length( blocks );
    num = cell( len, 2 );
    for jj = 1 : len
        num(jj,:) = textscan( blocks{jj}, '%f%f' ); 
    end
    
    heights = cellfun( @numel, num(:,1) );
    
    % preallocate a table
    T = table(  'Size'          , [sum( heights ),3]            ...
            ,   'VariableTypes' , {'double','double','string'}  ...
            ,   'VariableNames' , {'Var1','Var2','info'}        );
    % add data to table
    ix1 = 1;
    for jj = 1 : len
        ix2 = ix1 + heights(jj) - 1;
        T.Var1(ix1:ix2) = num{jj,1};
        T.Var2(ix1:ix2) = num{jj,2};
        T.info(ix1:ix2) = repmat( string(matches{jj}(3:end)), heights(jj),1 );
        ix1 = ix2 + 1;
    end
end
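
Since the question mentions grouping by header, here is one way the resulting table might be summarised per header. This is a sketch only: the small table is a hypothetical stand-in for the output of cssm, and the names g, headerNames, meanVar2 and summary are made up for illustration.

```matlab
% Hypothetical stand-in for the table returned by cssm.
T = table([27.32; 27.33; 27.53], [32.8; 32.68; 32.66], ...
          ["header 1"; "header 1"; "header 2"], ...
          'VariableNames', {'Var1','Var2','info'});

% Assign a group number to each distinct header.
[g, headerNames] = findgroups(T.info);

% Average Var2 within each group.
meanVar2 = splitapply(@mean, T.Var2, g);

% Collect the result in a small summary table.
summary = table(headerNames, meanVar2);
```

Any function that reduces a column to one value per group can take the place of @mean.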

2 Comments

newbie9, 12 March 2019

Thank you, @per isakson. I am looking at the pages you linked to and am still a little stuck. I have added an example of the desired output.

per isakson, 13 March 2019

Edited: per isakson, 13 March 2019

I added working code to the answer. Note that I modified the info texts in the text file, example.txt.
