Skipping characters in a data file while keeping numbers

I have a large database of .txt files that are giving me some problems when I try to read them in. I have looked at the official MATLAB help pages (for those who are just going to refer me to those) and I have not made too much progress. The format of the text files is below. The data that I need is in columns 2,3,6,7. I have tried using textscan and textopen to read in the whole file but only column 1 gets imported "IO" and the rest show up as [] with no data. The other problem that I foresee is removing the characters from columns 6 'N' and 7 'E' while still retaining the numbers. Any advice would be greatly appreciated, thanks!
IO, 02, 1951061018, , BEST, 0, 182N, 682E,-999
IO, 02, 1951061100, , BEST, 0, 187N, 681E,-999
IO, 02, 1951061106, , BEST, 0, 192N, 679E,-999
IO, 02, 1951061112, , BEST, 0, 197N, 676E,-999
IO, 02, 1951061118, , BEST, 0, 203N, 674E,-999
IO, 02, 1951061200, , BEST, 0, 208N, 671E,-999

Accepted Answer

per isakson on 8 Jun 2012


--- R2012a ---
'ascii_letters_numbers.txt' contains your sample text. This code reads and converts the text according to the format string, frm. Don't ask me why "E" ends up as "e".
The hard part is getting the format string right.
%IO, 02, 1951061200, , BEST, 0, 208N, 671E,-999
frm = '%s%f%f%s%s%f%f%c%f%c%f';
fid = fopen( 'ascii_letters_numbers.txt', 'r' );
cac = textscan( fid, frm, 'Delimiter', ',' );
fclose( fid );
>> cac{10}
ans =
e
e
e
e
e
e
>>
--- Skipping characters ---
Modify the format string to skip columns: an asterisk after the % (for example %*s) makes textscan read the field without returning it. To skip columns 1, 4, 5 and the last, use
frm = '%*s%f%f%*s%*s%f%f%c%f%c%*f';

2 Comments

Walter Roberson on 8 Jun 2012
I get a different result in R2008b:
S = 'IO, 02, 1951061200, , BEST, 0, 208N, 671E,-999';
frm = '%s%f%f%s%s%f%f%c%f%c%f';
textscan( S, frm, 'Delimiter', ',' )
ans =
{1x1 cell} [2] [1951061200] {1x1 cell} {1x1 cell} [0] [208] 'N' [671] '-' [999]
Notice that the 'E' looks like it has been eaten. That is because 'E' can validly appear in that position in a number such as 671E0, so the parser consumes the E and moves on to the next character, the comma, which is the delimiter. It eats the delimiter and then processes the %c, getting the '-'. Possibly later versions back up to the 'E' but have already converted it to 'e' internally (thinking it was an exponent). Sounds like a bug to me ;-)
In R2008b, even if I use %d (which should read an integer), it still eats the E but not the N. I would term that a bug, myself:
textscan('35E3','%d')
ans =
[35000]
per isakson on 8 Jun 2012
@Walter, it seems plausible that it is a bug. It's remarkable how difficult it is to write high-quality code. The software industry has been working on code to read text like this for 50+ years. Yes, it's a bit sloppy to use %f if one knows that the file contains whole numbers.
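
One way to sidestep the exponent ambiguity discussed above is to read the two coordinate fields as strings with %s and strip the letters afterwards, so the parser never gets a chance to treat 'E' as part of a number. This is only a minimal sketch, not code from the thread: the variable names txt, lat and lon are made up for illustration, the sample lines are passed to textscan as a string rather than read from a file, and the choice to skip the 0 field in column 6 is an assumption about which columns are wanted.

```matlab
% Two sample lines from the question, joined with a newline.
txt = sprintf('%s\n%s', ...
    'IO, 02, 1951061018, , BEST, 0, 182N, 682E,-999', ...
    'IO, 02, 1951061100, , BEST, 0, 187N, 681E,-999');

% Keep the two numeric fields after 'IO', skip the empty field, 'BEST'
% and the 0, read '182N' and '682E' as strings, and skip the trailing -999.
frm = '%*s%f%f%*s%*s%*f%s%s%*f';
cac = textscan(txt, frm, 'Delimiter', ',');

% Remove every letter from the coordinate strings, then convert to double.
lat = str2double(regexprep(cac{3}, '[A-Za-z]', ''));
lon = str2double(regexprep(cac{4}, '[A-Za-z]', ''));
```

To run this on a file, replace txt with a file identifier from fopen, as in the answer above. If the hemisphere letters matter, they are still available as the last character of each string in cac{3} and cac{4} before the regexprep step.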
