How to read a fixed-format file with multiple line-type prefixes?

Cameron Snow, 14 September 2014 (edited by per isakson, 14 September 2014)
Hello All,
I am trying to read in a file type supplied by a data vendor (IHS) that contains basic header data and oil and gas production data. It is a fixed-format file in which a code at the start of each line indicates what type of data the line holds. A small version of this file is copied and pasted at the bottom.
Every well's data starts with "START_US_PROD" followed by the entity number. The lines that follow begin with one of these codes:
  • ++ : entity number
  • +A : state, unique well number, etc.
  • +AT, +AR, +A# : other properties of the well
  • +B, +C : who drilled the well and on what lease
  • +D! : location information
  • +F : annual production
  • +G : monthly production
Every well ends with "END_US_PROD" followed by the entity number.
I would like to read this information into an array or a series of arrays, but I can't figure out where to start with this type of file.
IHS Inc. US PRODUCTION DATA 298 1.1 FIXED 2014/09/13 160
START_US_PROD 142020003009
++ 142020003009 Enerdeq
+A TX42037726255KARNES 006385 OI602EDRD 220
+AT02 HERNANDEZ ANDRES4 42255
+AR003009 015874L0066502
+A# EDWARDS
+B KOTARA REGINA ET AL CHEVRON U S A INCORPORATED
+C PANNA MARIA 196204197009 EDWARDS
+D 42255006650000 MULTI A0VI
+D! 28.98917 -97.92547SN
+F 1962 581 0 0
+G 19620131 0 0 0 0 0
+G 19620228 0 0 0 0 0
+G 19620331 0 0 0 0 0
+G 19620430 1279 0 0 1 0
+G 19620531 176 0 0 1 0
+G 19620630 678 0 0 1 0
+G 19620731 855 0 0 1 0
+G 19620831 908 0 0 1 0
+G 19620930 688 0 0 1 0
+G 19621031 630 0 0 1 0
+G 19621130 712 0 0 1 0
+G 19621231 701 0 0 1 0
+F 1963 7208 0 0
+G 19630131 700 0 0 1 0
+G 19630228 709 0 0 1 0
+G 19630331 746 0 0 1 0
+G 19630430 668 0 0 1 0
+G 19630531 638 0 0 1 0
+G 19630630 615 0 0 1 0
+G 19630731 641 0 0 1 0
+G 19630831 636 0 0 1 0
+G 19630930 610 0 0 1 0
+G 19631031 1425 0 0 1 0
+G 19631130 1374 0 0 1 0
+G 19631231 762 0 0 1 0
+L 20140131 ORUN MOBPL
END_US_PROD 142020003009
START_US_PROD 142020003277
++ 142020003277 Enerdeq
+A TX42038652255KARNES 240292 OI602EDRD 220
+AT02 S532 GARY ISAAC 124 42255
+AR003277 015874L0013302
+A# EDWARDS
+B MIKA MARY BLACKBRUSH OIL & GAS LLC
+C PERSON 196203200612 EDWARDS
+D 42255001330000 1 D0VI
+D! 29.01843 -97.85792SN 10875
+E 001 10875 34.0 11117 U 19700101
+E 002 10875 33.0 11117 U 19700201
+E 003 10875 28.0 11117 U 19700301
+E 004 10875 29.0 10655 U 19700601
+E 005 10875 26.0 10655 U 19700901
+E 006 10875 24.0 10655 U 19701001
+E 007 10875 20.0 14400 U 19701201
+E 008 10875 20.0 14400 U 19710101
+E 009 10875 24.0 12792 U 19710401
+E 010 10875 50.0 13380 U 19710601
+E 011 10875 33.0 13380 U 19711101
+E 012 10875 35.0 11886 U 19711201
+E 013 10875 35.0 11886 U 19720101
+E 014 10875 40.0 9850 U 19720301
+E 015 10875 34.0 11588 U 19720601
+E 016 10875 25.0 11588 U 19720701
+E 017 10875 72.0 20847 U 19720801
+E 018 10875 53.0 16075 U 19721201
+E 019 10875 53.0 16075 U 19730101
+E 020 10875 39.0 20000 U 19730601
+E 021 10875 37.0 26071 U 19730901
+E 022 10875 31.0 20161 U 19731201
+E 023 10875 31.0 20161 U 19740101
+E 024 10875 33.0 17030 U 19740301
+E 025 10875 32.0 17000 U 19740501
+E 026 10875 32.0 30 48.4 17000 U 19750101
+E 027 10875 31.0 30 49.2 17000 U 19750601
+E 028 10875 35.0 66 65.3 7142 F 19760426
+E 029 10875 55.2 45 44.9 2264 F 19770301
+E 030 10875 55.2 45 44.9 F 19770601
+E 031 10875 30.0 45 60.0 2233 F 19771001
+E 032 10875 28.8 60 67.6 2191 F 19780201
+E 033 10875 30.6 60 66.2 2156 F 19780427
+E 034 10875 29.7 28 48.5 1448 P 19790426
+E 035 10875 U 19800601
+E 036 10875 39.0 1 P 19800801
+E 037 10875 13.0 1 P 19810206
+E 038 10875 38.0 26 P 19811215
+E 039 10875 17.5 40 69.6 2285 P 19820206
+E 040 10875 24.6 61 71.3 3049 P 19830326
+E 041 10875 19.6 54 73.4 1990 P 19840309
+E 042 10875 54.0 74 57.8 1981 P 19850327
+E 043 10875 24.0 64 72.7 1833 P 19860211
+E 044 10875 12.0 65 84.4 1909 P 19870207
+E 045 10875 18.8 17 36 65.7 904 U 19910313
+E 046 10875 15.8 68 49 75.6 4303 P 19920323
+E 047 10875 15.0 61 80.3 3733 P 19930330
+E 048 10875 13.0 19 59.4 1615 P 19940327
+E 049 10875 15.0 70 10 4666 G 20050320
+D 42255001330000 MULTI A0VI
+D! 29.01843 -97.85792SN
+F 1962 0 0 0
+G 19620131 0 0 0 0 0
+G 19620228 0 0 0 0 0
+G 19620331 1147 0 0 1 0
+G 19620430 2273 0 0 1 0
+G 19620531 2464 0 0 1 0
+G 19620630 2200 0 0 1 0
+G 19620731 2178 0 0 1 0
+G 19620831 2280 0 0 1 0
+G 19620930 2278 0 0 1 0
+G 19621031 2285 0 0 1 0
+G 19621130 2275 0 0 1 0
+G 19621231 2271 0 0 1 0
+F 1963 21651 0 0
+G 19630131 2305 0 0 1 0
+G 19630228 2313 0 0 1 0
+G 19630331 2472 0 0 1 0
+G 19630430 2393 0 0 1 0
+G 19630531 2473 0 0 1 0
+G 19630630 2401 0 0 1 0
+G 19630731 2510 0 0 1 0
+G 19630831 2521 0 0 1 0
+G 19630930 2391 0 0 1 0
+G 19631031 2477 0 0 1 0
+G 19631130 2349 0 0 1 0
+G 19631231 2484 0 0 1 0
+L 20100131 CGRUN REGEF
+L 20100131 ORUN SHLTR
+L 20110131 CGRUN REGEF
+L 20110131 ORUN SHLTR
+L 20120131 CGRUN REGEF
+L 20120131 ORUN SHLTR
+L 20130131 CGRUN REGEF
+L 20130131 ORUN SHLTR
+L 20130831 ORUN UNKWN 37
+L 20140131 CGRUN REGEF
+L 20140131 ORUN SHLTR
END_US_PROD 142020003277
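As a first step, the numeric +F and +G records can be turned into rows of numbers with sscanf. This sketch assumes the record tag occupies the first three characters of each line (inferred from the sample, not from any IHS documentation). The post does not say what the columns after the year or date mean, so they are left unnamed here.

```matlab
% Two record lines copied from the sample file above.
fLine = '+F 1962 581 0 0';
gLine = '+G 19620430 1279 0 0 1 0';

% Assumption: columns 1-3 hold the record tag; the rest is numbers
% separated by whitespace.
annualRow  = sscanf(fLine(4:end), '%f').';   % 1x4 row: year + 3 values
monthlyRow = sscanf(gLine(4:end), '%f').';   % 1x6 row: yyyymmdd + 5 values

% Split the yyyymmdd date into year and month with integer arithmetic.
d  = monthlyRow(1);
yr = floor(d / 10000);            % 1962
mo = mod(floor(d / 100), 100);    % 4
```

Collecting such rows per well gives the kx4 and mx6 matrices (date plus five values) that a struct array can hold.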

Answers (1)

per isakson, 14 September 2014 (edited 14 September 2014)
MATLAB is good at reading neither fixed-format files nor files with multiple blocks of data, so reading your file requires a small "program".
Proposal:
  • Attach a sample file to the question; copy-and-paste is error-prone.
  • Specify (and attach) a template for the result, e.g. a struct array UsOil, or whatever data structure you prefer.
UsOil.entity_number
UsOil.unique_well_number
UsOil.state
...
UsOil.annual_production <kx4 double>
UsOil.monthly_production <mx5 double>
...
What is "+E"?
Assuming the entire file fits in memory, I would use this approach:
  • read the entire file as characters into a buffer
  • split the buffer into blocks, one per well (a job for regexp or strsplit)
  • loop over the blocks and assign the data to the struct array UsOil
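A minimal sketch of those three steps, written as a function file parseUsProd.m (a hypothetical name, not an existing MATLAB function). It assumes a three-character record tag at the start of each line, fills only entity_number, state, annual_production and monthly_production, and skips every other tag (including +E and +L). It is a starting point, not a complete reader for the IHS format.

```matlab
function UsOil = parseUsProd(buffer)
%PARSEUSPROD  Sketch: split an IHS US production text buffer into wells.
%   Assumes (not confirmed by any IHS spec) that each record line starts
%   with a 3-character tag such as '++ ', '+A ', '+F ' or '+G '.
%   Only a few fields are filled; all other tags are skipped.

% Step 2: one block per well, START_US_PROD ... END_US_PROD. The match is
% lazy, and MATLAB's regexp lets '.' match newlines by default.
blocks = regexp(buffer, 'START_US_PROD.*?END_US_PROD[^\r\n]*', 'match');

% Empty 0x0 struct array with the template fields.
UsOil = struct('entity_number', {}, 'state', {}, ...
               'annual_production', {}, 'monthly_production', {});

% Step 3: loop over the blocks and fill one struct element per well.
for k = 1:numel(blocks)
    well = struct('entity_number', '', 'state', '', ...
                  'annual_production', zeros(0, 4), ...
                  'monthly_production', zeros(0, 6));
    lines = regexp(blocks{k}, '\r?\n', 'split');
    for j = 1:numel(lines)
        ln = lines{j};
        if numel(ln) < 4
            continue                          % too short for tag + data
        end
        tag     = strtrim(ln(1:3));           % assumed fixed tag width
        payload = strtrim(ln(4:end));
        switch tag
            case '++'                         % entity number
                well.entity_number = strtok(payload);
            case '+A'                         % state: first 2 characters
                well.state = payload(1:min(2, end));
            case '+F'                         % annual: year + 3 values
                well.annual_production(end+1, :) = sscanf(payload, '%f').';
            case '+G'                         % monthly: yyyymmdd + 5 values
                well.monthly_production(end+1, :) = sscanf(payload, '%f').';
        end
    end
    UsOil(end+1) = well;                      %#ok<AGROW>
end
end
```

Usage, with a hypothetical file name: UsOil = parseUsProd(fileread('us_prod.txt')); where fileread returns the whole file as one character row (step 1). Because +F and +G rows are appended into fixed-width matrices (4 and 6 columns), a line with a different number of values raises an error instead of being silently misaligned.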
Once the data is in a struct array, a query such as strncmp({UsOil.state}, 'TX', 2) will find all wells in Texas.
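A short illustration of that query, using a two-element struct array typed in by hand from the two wells in the question's sample (in practice it would come from a parser). {UsOil.state} collects the field into a cell array, and the resulting logical vector indexes the struct array directly.

```matlab
% Two wells built by hand from the sample data in the question.
UsOil = struct( ...
    'entity_number',     {'142020003009', '142020003277'}, ...
    'state',             {'TX', 'TX'}, ...
    'annual_production', {[1962 581 0 0; 1963 7208 0 0], ...
                          [1962 0 0 0; 1963 21651 0 0]});

% Comma-separated list -> cell array, then compare the first 2 characters.
isTexas    = strncmp({UsOil.state}, 'TX', 2);   % logical [1 1]
texasWells = UsOil(isTexas);

% Logical indexing keeps the struct layout, so further queries chain easily,
% e.g. the sum of column 2 of the annual records for each well.
ids        = {texasWells.entity_number};
col2Totals = arrayfun(@(w) sum(w.annual_production(:, 2)), texasWells);  % [7789 21651]
```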
  1 comment
Cameron Snow, 14 September 2014
I have updated the original post with an attachment. I will try the suggested buffer-and-split method.
