extract useful info from text file filled with irrelevant info


Jamie Shelley on 26 Jul 2016

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/297250-extract-useful-info-from-text-file-filled-with-irrelevant-info

Commented: Jamie Shelley on 7 Aug 2016

The text files that I have look like this: useless info... * useful date (within useless info) useful info * useless info, and so on until the end. Is there a way of extracting the useful parts rather than copying and pasting each one individually (each file has around 200 items to copy into Excel)? Thanks

13 Comments

dpb on 27 Jul 2016

In one form or another, yes. "To repeat an operation, it is necessary to repeat an operation!" :)

It's quite possible to read the whole file into memory and use string comparison to find the row indices of all the headers in "one swell foop". You would then need to iterate over those, although you could likely delete the large chunks between those locations first, leaving only the sections of interest to process; still, in the end, one must iterate, yes. With MATLAB you may be able to use functions such as cellfun to hide the looping at a higher level, but under the hood it eventually reduces to a for...end loop. Often it's far simpler, at least initially, to take the "dead-ahead" approach with an explicit loop, and only worry about vectorizing once the loop has been shown to be too slow. And with more complex cases like this one, the vectorized solution often doesn't outperform the loop and is more difficult to write, debug and maintain.
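A minimal sketch of that "read everything, find the headers, then loop" idea follows. It assumes (hypothetically) that each section begins with a line starting with `File name`, the marker used in the accepted answer further down; the temporary file and its contents are made up so the snippet runs on its own.

```matlab
% Sketch only: the section marker 'File name' and the file contents are assumptions.
% Write a small made-up file so the example is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'junk line\nFile name run1\nTest type LPR\n1 2 3\nFile name run2\nTest type EIS\n4 5 6\n');
fclose(fid);

% Read the whole file into memory, one line per cell (handles \n and \r\n endings)
txtlines = regexp(fileread(fname), '\r?\n', 'split');
delete(fname);

% Find the row indices of all section headers in one vectorized step
hdr = find(strncmp(txtlines, 'File name', 9));
bounds = [hdr, numel(txtlines) + 1];   % each section ends just before the next header

% ...but pulling out the sections still means iterating over them
sections = cell(1, numel(hdr));
for k = 1:numel(hdr)
    sections{k} = txtlines(hdr(k):bounds(k + 1) - 1);
end
```

The `find`/`strncmp` step is vectorized, but splitting into sections is still an explicit loop, which is the point made above.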

Jamie Shelley on 29 Jul 2016

Okay, thank you. I think I see what you mean. Is there a way of finding the number of paragraphs/lines in the file, please?

dpb on 29 Jul 2016

Edited: dpb on 29 Jul 2016

Sure, but you probably don't need to in order to extract the data of interest.
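For the line count itself, one possible approach is to read the file as text with `fileread` and count line-feed characters, adding one if the last line has no trailing newline. The file below is a made-up temporary file standing in for the real one.

```matlab
% Sketch: count the lines of a text file (a made-up temporary file stands in for the real one).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'first line\nsecond line\n\nfourth line');   % no newline after the last line
fclose(fid);

txt = fileread(fname);
delete(fname);
nLines = sum(txt == char(10));             % one line feed per terminated line (also true for \r\n files)
if ~isempty(txt) && txt(end) ~= char(10)
    nLines = nLines + 1;                   % count a final line that has no trailing newline
end
```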

It would really, Really, REALLY help if you would include an actual section of the file if you want actual answers instead of continued generalities. In the meantime, look at the code I recently provided to another poster for parsing a similar kind of file, to see how to use what's in the file to locate the sections of interest: <import-from-data-from-messy-text-file>. That particular file had the nicety of giving the number of cases as a parseable value early in the file, so a counted outer loop could be used; if, as I gather, your file has no such value, just use a while ~feof(fid) loop or equivalent instead. Also, of course, you'd have to keep track of which section is the one of interest, and either skip one first or not, depending on whether it's the even or odd case you're interested in. But, as the above shows, it's really a pretty simple concept; it just takes some thought about what can be looked for.
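A sketch of the `while ~feof(fid)` version, assuming (as a hypothetical layout) that every section starts with a `File name` line and that the wanted sections contain `Test type LPR`; the marker strings are borrowed from the accepted answer and the file contents are invented. The `ischar` check guards against `fgetl` returning -1 at the end of the file.

```matlab
% Sketch only: marker names borrowed from the accepted answer; file contents are made up.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, ['preamble\nFile name a\nTest type LPR\n1 2\n' ...
              'File name b\nTest type EIS\n3 4\n' ...
              'File name c\nTest type LPR\n5 6\n']);
fclose(fid);

% true if a section (cell array of lines) is one we want to keep
isWanted = @(sec) ~isempty(sec) && ~isempty(regexp(strjoin(sec, char(10)), 'Test type\s+LPR', 'once'));

fid = fopen(fname, 'r');
keep = {};        % finished sections of interest
current = {};     % lines of the section currently being read
while ~feof(fid)
    tline = fgetl(fid);
    if ~ischar(tline), break; end                 % fgetl returns -1 at end of file
    if strncmp(tline, 'File name', 9)             % a new section starts here
        if isWanted(current), keep{end + 1} = current; end %#ok<AGROW>
        current = {tline};
    elseif ~isempty(current)                      % ignore anything before the first header
        current{end + 1} = tline; %#ok<AGROW>
    end
end
if isWanted(current), keep{end + 1} = current; end   % the last section has no header after it
fclose(fid);
delete(fname);
```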


Accepted Answer

Guillaume on 6 Aug 2016

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/297250-extract-useful-info-from-text-file-filled-with-irrelevant-info#answer_231011

Edited: Guillaume on 6 Aug 2016

Now that we finally know what the useful and useless parts look like, we can answer the question (mostly).

Here is one way to extract the LPR section(s):

filecontent = fileread('example2.txt');  %read the whole file into one char vector
filesections = regexp(filecontent, 'File name.*?(?=(File name)|$)', 'match');  %match 'File name' and everything that follows up to the next 'File name' or the end of string.
testtypes = regexp(filesections, '(?<=Test type\s*)\S+', 'match', 'once'); %match non-blank characters after 'Test type'
wantedsections = filesections(strcmp(testtypes, 'LPR'));  %keep only the sections whose test type is LPR

edit: missing ) in first regex
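To see what each of the three steps returns, they can be run on a small made-up string in place of `fileread('example2.txt')`. The spacing and content of the sample are assumptions; only the `File name` and `Test type` labels come from the answer.

```matlab
% Made-up stand-in for fileread('example2.txt'); the real layout is assumed, not known.
filecontent = sprintf(['header junk\r\n' ...
    'File name  run1.txt\r\nTest type  LPR\r\ndata 1\r\n' ...
    'File name  run2.txt\r\nTest type  EIS\r\ndata 2\r\n']);

% Same three steps as in the answer
filesections = regexp(filecontent, 'File name.*?(?=(File name)|$)', 'match');  % one char vector per section
testtypes = regexp(filesections, '(?<=Test type\s*)\S+', 'match', 'once');     % first word after 'Test type'
wantedsections = filesections(strcmp(testtypes, 'LPR'));                      % LPR sections only
```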

11 Comments

Guillaume on 7 Aug 2016

You could extract the time with another regular expression:

sectiontimes = regexp(wantedsections, '(?<=Time and Date\s*)\S[^\n\r]*', 'match', 'once');  %rest of the line after 'Time and Date', starting at its first non-blank character
timevalues = datetime(sectiontimes, 'InputFormat', 'HH:mm:ss dd/MMM/yyyy'); %optional
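A quick check of the time extraction on an invented section (only the `Time and Date` label and the date format come from the answer). The pattern starts the match with `\S`, so blanks between the label and the value are not captured, and `'Locale', 'en_US'` is passed as an extra assumption so that the month abbreviation `Jul` parses regardless of the system locale.

```matlab
% Made-up section; only the 'Time and Date' label and the date format come from the answer.
wantedsections = {sprintf('File name  run1.txt\r\nTime and Date  14:05:33 26/Jul/2016\r\nTest type  LPR\r\n')};

% \S makes the match start at the first non-blank character after the label
sectiontimes = regexp(wantedsections, '(?<=Time and Date\s*)\S[^\n\r]*', 'match', 'once');
% 'Locale' pins month-name parsing ('Jul') to English regardless of system settings
timevalues = datetime(sectiontimes, 'InputFormat', 'HH:mm:ss dd/MMM/yyyy', 'Locale', 'en_US');
```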

Assuming the table you want is always surrounded by blank lines (which actually contain spaces or tabs), you could also isolate it with another regexp:

sectiontables = regexp(wantedsections, '(?<=\r\n[ \t]*\r\n).*?(?=($|\r\n[ \t]*\r\n))', 'match', 'once');

It's then a simple matter of using textscan:

tablevalues = cellfun(@(s) cell2mat(textscan(s, '%f %f %f %f %f', 'CollectOutput', true, 'HeaderLines', 1)), sectiontables, 'UniformOutput', false)  %skip 1 header line, then read 5 numeric columns into one matrix per table
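Putting the table isolation and `textscan` together on one made-up section shows the shape of the result (the layout, including the single column-title line, is assumed). `'EndOfLine', '\r\n'` is set explicitly here to match the sample's line endings; otherwise the calls are the ones in the answer.

```matlab
% Made-up LPR section: header lines, a "blank" line containing spaces, one column-title
% line, two rows of five numbers, then a blank line containing a tab. Layout is assumed.
wantedsections = {sprintf(['File name  run1.txt\r\nTest type  LPR\r\n  \r\n' ...
    'E I t a b\r\n0.1 0.2 0.3 0.4 0.5\r\n1.1 1.2 1.3 1.4 1.5\r\n' ...
    ' \t\r\nend of section\r\n'])};

% Text between the first pair of (whitespace-only) blank lines
sectiontables = regexp(wantedsections, '(?<=\r\n[ \t]*\r\n).*?(?=($|\r\n[ \t]*\r\n))', 'match', 'once');
% Skip the column-title line, then read five numeric columns into one matrix per section
tablevalues = cellfun(@(s) cell2mat(textscan(s, '%f %f %f %f %f', 'CollectOutput', true, ...
    'HeaderLines', 1, 'EndOfLine', '\r\n')), sectiontables, 'UniformOutput', false);
```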

Alternatively, since the EIS sections all appear to have the same format, you could simply split them into lines and extract the lines of interest:

splitsections = regexp(wantedsections, '\r\n', 'split');  %split each section into lines
sectiontimes = cellfun(@(s) s{7}(15:34), splitsections, 'UniformOutput', false);  %characters 15 to 34 of line 7 (the time)
sectiontables = cellfun(@(s) strjoin(s(22:end), '\n'), splitsections, 'UniformOutput', false);  %line 22 onwards is the table

Jamie Shelley on 7 Aug 2016

In the Excel file, the times at the bottom aren't necessary, as I programmed them in differently. I just need to get the test results into the layout shown in the example.xlsx file (the times there aren't necessary either). Is there any way of extracting the tables from single cells into the format in the Excel file, please? If not, I'll start doing it all manually, but it feels so close to being finished that it would be a shame to start doing it manually now. Thanks
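One possible way to go from one matrix per cell to a single sheet, assuming the desired layout is simply the sections stacked one below another with a section-number column in front (the actual layout of example.xlsx isn't shown in the thread). The input matrices and output file name are placeholders; `writematrix` requires R2019a or later, and `xlswrite` is the older equivalent.

```matlab
% Placeholder for the textscan output: one numeric matrix per wanted section (made-up values)
tablevalues = {[0.1 0.2 0.3 0.4 0.5; 1.1 1.2 1.3 1.4 1.5], [2.1 2.2 2.3 2.4 2.5]};

% Prefix every row with its section number so the sections stay distinguishable
sectionids = num2cell(1:numel(tablevalues));
tagged = cellfun(@(m, k) [repmat(k, size(m, 1), 1), m], tablevalues, sectionids, 'UniformOutput', false);
combined = vertcat(tagged{:});   % one tall matrix, sections stacked top to bottom

% Write to a spreadsheet; replace the temporary name with your own .xlsx file
outfile = [tempname '.xlsx'];
writematrix(combined, outfile);
```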

Jamie Shelley on 7 Aug 2016

This has been the longest weekend ever, but it's finally done (more or less). Thanks for the help with the really technical stuff; I had no idea how to use regexp and the like.


More Answers (1)

Shameer Parmar on 29 Jul 2016

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/297250-extract-useful-info-from-text-file-filled-with-irrelevant-info#answer_230160


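Use this command:

Data = textread('FileName.txt', '%s', 'delimiter', '');  % read the file's text into a cell array (textread is not recommended in newer releases; textscan is the suggested replacement)

Then apply the logic (a for loop with if statements) according to your requirements for reading and storing the data.

Please provide a sample of your text file and describe the data you need, so that I can help you with the logic.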
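A sketch of that for-loop-with-if logic, assuming `Data` holds one line of the file per cell and using the hypothetical `File name` / `Test type` markers from the accepted answer; the contents of `Data` below are made up. Each numeric row inside an LPR section is parsed with `sscanf` and appended to `collected`, which assumes every table row has the same number of columns.

```matlab
% Made-up stand-in for Data, assuming one line of the file per cell
Data = {'junk'; 'File name a'; 'Test type LPR'; '1 2 3'; '4 5 6'; ...
        'File name b'; 'Test type EIS'; '7 8 9'};

inWanted = false;    % true while inside an LPR section
collected = [];      % numeric rows gathered from LPR sections
for k = 1:numel(Data)
    tline = Data{k};
    if strncmp(tline, 'File name', 9)
        inWanted = false;                               % new section: undecided until its test type is seen
    elseif strncmp(tline, 'Test type', 9)
        inWanted = strcmp(strtrim(tline(10:end)), 'LPR');
    elseif inWanted
        values = sscanf(tline, '%f')';                  % whitespace-separated numbers as a row
        if ~isempty(values)
            collected(end + 1, :) = values; %#ok<AGROW>
        end
    end
end
```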

3 Comments

Guillaume on 6 Aug 2016

At last! We finally get an example of the data, which dpb and I have been asking for for ages. You still haven't given us the full picture, but we can still make a start at answering the question.

Jamie Shelley on 6 Aug 2016

Sorry, I didn't have a copy of it on me, but I've got one now. It's basically just that, repeated for however many experiments were done. Thanks
