Skipping characters in a data file while keeping numbers

Question

I have a large database of .txt files that I am having trouble reading in. I have looked at the official MATLAB help pages (for those who are just going to refer me to them) and have not made much progress. The format of the text files is shown below. The data I need are in columns 2, 3, 6 and 7. I have tried using textscan and textopen to read in the whole file, but only column 1 ("IO") gets imported; the rest show up as [] with no data. The other problem I foresee is removing the characters 'N' and 'E' from columns 6 and 7 while still retaining the numbers. Any advice would be greatly appreciated, thanks!

    IO, 02, 1951061018, , BEST, 0, 182N, 682E,-999
    IO, 02, 1951061100, , BEST, 0, 187N, 681E,-999
    IO, 02, 1951061106, , BEST, 0, 192N, 679E,-999
    IO, 02, 1951061112, , BEST, 0, 197N, 676E,-999
    IO, 02, 1951061118, , BEST, 0, 203N, 674E,-999
    IO, 02, 1951061200, , BEST, 0, 208N, 671E,-999

Answer 1

per isakson, 8 June 2012

--- R2012a ---

'ascii_letters_numbers.txt' contains your sample text. This code reads and converts the text according to the format string, frm. Don't ask me why "E" ends up as "e".

The problem is to get the format string right.

    % Sample line: IO, 02, 1951061200, , BEST, 0, 208N, 671E,-999
    % Eleven specifiers for nine fields: '208N' and '671E' are each split into %f%c.
    frm = '%s%f%f%s%s%f%f%c%f%c%f';
    fid = fopen( 'ascii_letters_numbers.txt', 'r' );
    cac = textscan( fid, frm, 'Delimiter', ',' );
    fclose( fid );
    >> cac{10}
    ans =
    e
    e
    e
    e
    e
    e
    >>

--- Skipping characters ---

To skip a column, put an asterisk after the percent sign of its specifier in the format string (%*s, %*f). To skip columns 1, 4, 5 and the last, use:

    frm = '%*s%f%f%*s%*s%f%f%c%f%c%*f';
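
An alternative that avoids relying on how %f treats a trailing letter is to read the two coordinate fields as strings (%s) and strip the letters afterwards. The following is only a minimal sketch, not part of the original answer: the variable names txt, latStr, lonStr, lat and lon are illustrative, and the input is two sample lines copied from the question rather than a real file.

```matlab
% Two sample lines from the question, joined by a newline so the sketch is
% self-contained. For a real file, pass the fid from fopen instead of txt.
txt = sprintf('%s\n%s', ...
    'IO, 02, 1951061018, , BEST, 0, 182N, 682E,-999', ...
    'IO, 02, 1951061100, , BEST, 0, 187N, 681E,-999');

% Nine specifiers for nine fields. Fields 7 and 8 are read as strings (%s),
% so textscan never tries to interpret the trailing 'E' as an exponent.
frm = '%*s%f%f%*s%*s%f%s%s%*f';
cac = textscan(txt, frm, 'Delimiter', ',');

latStr = cac{4};   % cell array of strings such as '182N'
lonStr = cac{5};   % cell array of strings such as '682E'

% Remove the letters, then convert the remaining digits to doubles.
lat = str2double(regexprep(latStr, '[A-Za-z]', ''));
lon = str2double(regexprep(lonStr, '[A-Za-z]', ''));
```

Note that this discards the hemisphere letter. If the files can also contain other letters in these fields and the sign matters, latStr and lonStr still hold the original text, so the last character of each entry can be inspected before it is stripped.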

2 Comments

Walter Roberson, 8 June 2012

I get a different result in R2008b:

    S = 'IO, 02, 1951061200, , BEST, 0, 208N, 671E,-999';
    frm = '%s%f%f%s%s%f%f%c%f%c%f';
    textscan( S, frm, 'Delimiter', ',' )
    ans =
        {1x1 cell} [2] [1951061200] {1x1 cell} {1x1 cell} [0] [208] 'N' [671] '-' [999]

Notice the 'e' looks like it has been eaten. That is because 'E' can validly appear in that position in a number such as 671E0, so the parser consumes the E and moves on to the next character, the comma, which is the delimiter. It eats the delimiter and then processes the %c, getting the '-'. Possibly in later versions it backs up to the 'E' but internally has converted it to 'e' already (thinking it was an exponent.) Sounds like a bug to me ;-)
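
The ambiguity itself can be seen without textscan. In this small sketch (an illustration added here, not output from the thread), str2double accepts an E followed by digits as exponent notation, whereas a trailing N has no numeric meaning at all:

```matlab
% 'E' followed by digits is valid exponent notation, so it belongs to the number.
a = str2double('671E0');   % 671
b = str2double('671E2');   % 67100

% 'N' can never be part of a number, so the conversion fails.
c = str2double('208N');    % NaN
```

This is why a numeric parser has to look past the E before it can decide where the number ends, while it can stop immediately at the N.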

In R2008b even if I use %d (which should be integer) it still eats the E but not the N. I would term that a bug, myself:

    textscan('35E3','%d')
    ans =
        [35000]

per isakson, 8 June 2012

@Walter, it seems plausible that it is a bug. It's remarkable how difficult it is to write high-quality code. The software industry has been working with code to read text like this for 50+ years. Yes, it's a bit sloppy to use %f if one knows that the file contains only whole numbers.
