Get data points from one line

Question

Hello

I have some data in the form

...
         ->  Parameter number   54 :      Cell_C_ph1_pat1     4.4501491    ( +/-    0.72288327E-04 )
         ->  Parameter number   55 :      Cell_A_ph1_pat1     11.445057    ( +/-    0.16855453E-03 )
         ->  Parameter number   56 :      Cell_B_ph1_pat1     4.1313801    ( +/-    0.61447019E-04 )
         ->  Parameter number   57 :       X-tan_ph1_pat1    0.33901680    ( +/-    0.41584419E-02 )
         ->  Parameter number   58 :      V-Cagl_ph1_pat1   -0.20550521E-01( +/-    0.47112759E-02 )
         ->  Parameter number   59 :      W-Cagl_ph1_pat1    0.20377478E-02( +/-    0.27476726E-03 )
         ->  Parameter number   60 :      U-Cagl_ph1_pat1    0.18869112    ( +/-    0.19129461E-01 )
...

I'm trying to get the values after the name of the parameter (i.e. the 4.4501491 +/- 0.72288327E-04 at Cell_C_ph1_pat1), but I'm struggling a little with regexp/strfind. At the moment, I have something like this:

buffer = fileread('SnSe_100K_17.out');
substr = '(?<=Cell_C_ph1_pat1\D*)\d*\.?\d+';
numbers = str2double(regexp(buffer,substr,'match'))

This just gives the first value of Cell_C (4.4501491), and I would love to also get the error. Actually, if it's possible to get three vectors out (one with the names of the parameters, one with the values, and one with the errors), that would be perfect!

I have a lot of data files in one folder, so I think I would want to write a for-loop to get the data from all the other files as well.

Answer 1

Cedric, 13 October 2017

Edited: Cedric, 13 October 2017

That was a good attempt, but for this you should use tokens:

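 buffer  = fileread( 'SnSe_100K_17.out' ) ;        % read the whole file as one char vector
 pattern = ':\s+(\S+)\s+([^\(]+)\D+(\S+)' ;        % three tokens: name, value, error
 tokens  = regexp( buffer, pattern, 'tokens' ) ;   % one 1x3 cell per match
 tokens  = vertcat( tokens{:} ) ;                  % stack into an N-by-3 cell array
 names   = tokens(:,1) ;
 data    = str2double( tokens(:,2:3) ) ;           % column 1: values, column 2: errors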

Using tokens is roughly the same as matching the normal way: you define a pattern that matches the full block containing what you need to extract, and then you enclose the parts that you want extracted in () within the pattern. These parts are called tokens.
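To make the difference between 'match' and 'tokens' concrete, here is a minimal sketch on a made-up string (the string and the variable names are only for illustration, they are not from the data file):

```matlab
str     = 'x = 12, y = 7';               % hypothetical input
pattern = '(\w+) = (\d+)';               % two parenthesized parts = two tokens

whole = regexp(str, pattern, 'match');   % full matches: 'x = 12' and 'y = 7'
parts = regexp(str, pattern, 'tokens');  % per match, only the parts inside ()
```

Here whole is a 1x2 cell array of the complete matches, while parts is a 1x2 cell array in which each element is itself a 1x2 cell holding the name and the number.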

The pattern that I built here is based on the following observations:

Parameter names are always separated from their surroundings by white space and do not contain white space themselves, hence the \S+ to match them.
Values cannot be matched the same way, because there are cases where they touch the opening parenthesis associated with the error, so we match them using [^\(]+ (one or more characters that are not an opening parenthesis).
What lies between a value and its error contains no numeric digit, and symmetrical errors don't need a sign, so we can "eat the string" after the value for as long as the characters are not numeric digits, hence the \D+.
Errors seem to be followed by a white space, so \S+ captures them.
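As a small check of these observations, the sketch below runs the pattern on two lines copied from the question, one of which has the value touching the opening parenthesis. It also splits the result into the three vectors asked for; the variable names sample, values and errors are my own choice.

```matlab
% Two sample lines from the question; in the second one the value
% touches the opening parenthesis of the error.
sample = sprintf('%s\n%s\n', ...
    '         ->  Parameter number   54 :      Cell_C_ph1_pat1     4.4501491    ( +/-    0.72288327E-04 )', ...
    '         ->  Parameter number   58 :      V-Cagl_ph1_pat1   -0.20550521E-01( +/-    0.47112759E-02 )');

pattern = ':\s+(\S+)\s+([^\(]+)\D+(\S+)';
tokens  = regexp(sample, pattern, 'tokens');
tokens  = vertcat(tokens{:});          % N-by-3 cell array of char vectors

names  = tokens(:,1);                  % parameter names
values = str2double(tokens(:,2));      % trailing blanks in the token are ignored
errors = str2double(tokens(:,3));
```

The value token can carry trailing blanks (everything up to the parenthesis is captured), which str2double tolerates.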

With that, you get:

 >> names
 names =
  7×1 cell array
    {'Cell_C_ph1_pat1'}
    {'Cell_A_ph1_pat1'}
    {'Cell_B_ph1_pat1'}
    {'X-tan_ph1_pat1' }
    {'V-Cagl_ph1_pat1'}
    {'W-Cagl_ph1_pat1'}
    {'U-Cagl_ph1_pat1'}
 >> data
 data =
    4.4501    0.0001
   11.4451    0.0002
    4.1314    0.0001
    0.3390    0.0042
   -0.0206    0.0047
    0.0020    0.0003
    0.1887    0.0191

2 Comments

Anders Bennedsgaard, 13 October 2017

Works almost like a charm! But just to be annoying, could you fix a little problem this introduces?

The start of my data looks like this

     'SYMBOLIC NAMES AND FINAL VALUES AND SIGMA OF REFINED PARAMETERS:
          -----------------------------------------------------------------
           ->  Parameter number    1 :       Scale_ph1_pat1    0.21361238E-03( +/-    0.62756760E-06 )
           ->  Parameter number    2 :            Zero_pat1   -0.13230542E-01( +/-    0.30673487E-03 )
           ->  Parameter number    3 :           Bck_0_pat1     8039.8750    ( +/-     89.272911     )
           ->  Parameter number    4 :           Bck_1_pat1     8023.1499    ( +/-     96.739319     )
           ->  Parameter number    5 :           Bck_2_pat1     7929.5889    ( +/-     103.76885     )
           ->  Parameter number    6 :           Bck_3_pat1     7919.5361    ( +/-     96.303169     )
...

which gives

          NaN   6.2757e-07
    -0.013231   0.00030673
       8039.9       89.273
       8023.1       96.739
       7929.6       103.77
       7919.5       96.303
...

One solution would probably be to just remove the first two lines, but if you could fix it in your code, that would be nice.

Cedric, 13 October 2017

Edited: Cedric, 13 October 2017

Add a space before the : in the pattern .. not that annoying, I've seen worse ;)

pattern = ' :\s+(\S+)\s+([^\(]+)\D+(\S+)' ;

If parameter numbers are right justified there will be no problem. If not, we just have to match a white space or a numeric digit before the colon:

pattern = '[\d\s]:\s+(\S+)\s+([^\(]+)\D+(\S+)' ;
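For the many files in one folder mentioned in the question, a loop along the following lines might work. This is only a sketch under assumptions: that the files sit in the current folder, that they all end in .out, and that every match of the pattern in a file really is a parameter line. The struct array results and its field names are my own invention.

```matlab
folder  = pwd;                                 % assumption: data files are in the current folder
files   = dir(fullfile(folder, '*.out'));      % assumption: all data files end in .out
pattern = '[\d\s]:\s+(\S+)\s+([^\(]+)\D+(\S+)';

% One struct per file, holding the three vectors for that file.
results = struct('file', {}, 'names', {}, 'values', {}, 'errors', {});

for k = 1:numel(files)
    buffer = fileread(fullfile(folder, files(k).name));
    tokens = regexp(buffer, pattern, 'tokens');
    if isempty(tokens)
        continue                               % nothing matched in this file
    end
    tokens = vertcat(tokens{:});               % N-by-3 cell array
    results(end+1).file = files(k).name;       %#ok<SAGROW>
    results(end).names  = tokens(:,1);
    results(end).values = str2double(tokens(:,2));
    results(end).errors = str2double(tokens(:,3));
end
```

After the loop, results(k).values and results(k).errors hold the numbers for the k-th file that produced matches, and results(k).names the matching parameter names.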
