Pull out strings and their values from a text file.

Question


Hi

Please find the attached *.txt file. I want to analyze the whole text file.

Thanks

-Sriram

1 Comment

Walter Roberson, 25 May 2019

This is a continuation of https://www.mathworks.com/matlabcentral/answers/463264-matlab-using-regexp-function-i-want-to-parse-string-and-data


Answer 1

Guillaume, 11 June 2019

Hi Sriram, sorry, I was away last week.

Parsing the first part of each message (date, level, source) is trivial. It's the part after that which is difficult, because its format varies. I don't fully understand the algorithm you've written, and I don't think you can use : indiscriminately as a delimiter. For example, on line 2 it's part of https://www....

Here is how I would start the parsing:

filecontent = string(fileread('File.txt'));     %read whole file as STRING (for easier text comparison later)
messages = regexp(filecontent, '^(?<date>[^ ]+) (?<level>[^ ]+) (?<source>[^:]+):\s+(?<content>[^\r\n]+)', 'names', 'lineanchors');  %parse all lines according to common format
dates = num2cell(datetime([messages.date], 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSZZZZZ', 'TimeZone', 'UTC'));   %decode date
[messages.date] = dates{:};  %and put back into structure
%parsing of kernel messages
iskernel = [messages.source] == "kernel";
parsedkernel = regexp([messages(iskernel).content], '\[\s*(?<cputime>[^\]]+)]\s+(?<message>.*)', 'names');  %parse kernel messages. Not sure of the rule
parsedkernel = [parsedkernel{:}];  %convert into structure array
cputime = num2cell(str2double([parsedkernel.cputime]));  %convert cputime to numeric
[parsedkernel.cputime] = cputime{:};  %and put back into structure
parsedkernel = num2cell(parsedkernel);  %convert to cell array to put back into messages structure
[messages(iskernel).content] = parsedkernel{:};
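To check what the common-format pattern captures, the minimal sketch below applies the same regular expression to two of the sample lines quoted later in this thread. The lines are held in a variable instead of being read from File.txt, and char arrays are used so the sketch also runs on releases without string support; the variable names are illustrative.

```matlab
% Two sample log lines from this thread, joined with a newline (stand-in for File.txt)
sample = sprintf('%s\n%s', ...
    '2019-05-10T21:41:40.054631+00:00 INFO kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16', ...
    '2019-05-10T21:41:40.054649+00:00 DEBUG kernel: [ 0.000009] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved');

% Same pattern as in the answer: date, level, source, then everything after the colon
linePattern = '^(?<date>[^ ]+) (?<level>[^ ]+) (?<source>[^:]+):\s+(?<content>[^\r\n]+)';
messages = regexp(sample, linePattern, 'names', 'lineanchors');

% One structure element per line
numel(messages)        % 2
messages(2).level      % 'DEBUG'
messages(1).source     % 'kernel'
```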

6 Comments

Guillaume, 14 June 2019 (edited by Guillaume, 14 June 2019)

So which version?

Here is the same code, working with char arrays instead of strings:

filecontent = fileread('File.txt');     %read whole file as a char array
messages = regexp(filecontent, '^(?<date>[^ ]+) (?<level>[^ ]+) (?<source>[^:]+):\s+(?<content>[^\r\n]+)', 'names', 'lineanchors');  %parse all lines according to common format
dates = num2cell(datetime({messages.date}, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSZZZZZ', 'TimeZone', 'UTC'));   %decode date
[messages.date] = dates{:};  %and put back into structure
%parsing of kernel messages
iskernel = strcmp({messages.source}, 'kernel');
parsedkernel = regexp({messages(iskernel).content}, '\[\s*(?<cputime>[^\]]+)]\s+(?<message>.*)', 'names');  %parse kernel messages. Not sure of the rule
parsedkernel = [parsedkernel{:}];  %convert into structure array
cputime = num2cell(str2double({parsedkernel.cputime}));  %convert cputime to numeric
[parsedkernel.cputime] = cputime{:};
parsedkernel = num2cell(parsedkernel);  %convert to cell array to put back into messages structure
[messages(iskernel).content] = parsedkernel{:};
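One caveat that applies to both versions: if a kernel line does not start with a bracketed [cputime], its regexp result is empty, so [parsedkernel{:}] can end up with fewer elements than nnz(iskernel), and the final assignment would then fail. A minimal, more defensive sketch (a suggestion, not part of the answer) parses each content separately and keeps NaN plus the raw text when the pattern does not match. The third content string is invented for illustration, and cpuTimes/msgtext are hypothetical names chosen to avoid shadowing the built-in cputime and message functions.

```matlab
% Kernel message contents; the third one is hypothetical and has no [cputime] prefix
contents = {'[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16', ...
            '[ 0.101170] random: get_random_bytes called from start_kernel+0x8d/0x429 with crng_init=0', ...
            'some kernel line without a timestamp'};

cpuTimes = nan(size(contents));   % NaN marks contents that did not match
msgtext  = contents;              % unmatched contents keep their full text
for k = 1:numel(contents)
    % Token 1: the number inside [], token 2: the rest of the message
    tok = regexp(contents{k}, '^\[\s*([^\]]+)\]\s+(.*)$', 'tokens', 'once');
    if ~isempty(tok)
        cpuTimes(k) = str2double(tok{1});
        msgtext{k}  = tok{2};
    end
end
```

Because every input keeps its own slot, the results stay aligned with messages(iskernel) even when some lines do not follow the kernel format.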

Guillaume, 14 June 2019

Sriram's comment mistakenly posted as an answer (please use comments!):

Thanks a lot. It works.

Guillaume, 14 June 2019

Then consider changing your accepted answer, particularly after all the hard work that has gone into getting you there.


Answer 2

Dimitar Georgiev, 26 May 2019

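% Read the region of interest ('......' is a placeholder for the actual range)
cellData = readcell('filename.xlsx','Range','......');   % renamed from "cell" to avoid shadowing the built-in cell function
stringname = '......';
% Logical array: true where a cell contains exactly stringname
variable = strcmp(stringname,cellData);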
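For reference, strcmp with a character vector and a cell array compares against every cell and returns a logical array the same size as the cell array, with false for cells that do not hold text (for example numbers). A small sketch with made-up cell contents standing in for the readcell output:

```matlab
% Made-up cell contents standing in for the output of readcell
cellData = {'INFO', 'kernel'; 42, 'INFO'};
stringname = 'INFO';

mask = strcmp(stringname, cellData);   % logical array, same size as cellData
[row, col] = find(mask);               % positions of the matching cells
```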

12 Comments

Life is Wonderful, 30 May 2019 (edited by Life is Wonderful, 30 May 2019)

Guillaume

Understood. Here is an example snippet from the attached file, followed by the values I want to extract from it:

2019-05-10T21:41:40.054631+00:00 INFO kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16

2019-05-10T21:41:40.054649+00:00 DEBUG kernel: [ 0.000009] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved

2019-05-10T21:41:40.054785+00:00 NOTICE kernel: [ 0.101170] random: get_random_bytes called from start_kernel+0x8d/0x429 with crng_init=0

s_TimeStamp(idx,1) = 21:41:40.054631+00:00
MsgLib.Kernel.String(idx,1) = INFO kernel
MsgLib.Kernel.String(idx,1) = DEBUG kernel
MsgLib.Kernel.String(idx,1) = NOTICE kernel
MsgLib.Kernel.val(idx,1) = 0.000000
Msg.Kernel.BIOS.SubStr(idx,1) = BIOS-e820
Msg.Kernel.BIOS.SubVal(idx,1) = [mem 0x0000000000000000-0x0000000000000fff]
Msg.Kernel.BIOS.Substr.type(idx,1) = type
Msg.Kernel.BIOS.SubVal.type(idx,1) = 16
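As a sketch of how one of these sub-fields could be extracted, assuming (from a single sample line) that BIOS-e820 messages always follow the form '<tag>: [mem 0x<start>-0x<end>] type <n>', regexp with named tokens can split out the sub-string, the memory range and the type value. The names bios, memRange and typeVal are illustrative and do not follow the MsgLib/Msg layout above.

```matlab
% Message part of one kernel line (the text after the [cputime] prefix)
msg = 'BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16';

% Assumed format: "<tag>: [mem 0x<start>-0x<end>] type <n>"
pat = '^(?<tag>[^:]+):\s+\[mem 0x(?<memstart>[0-9a-fA-F]+)-0x(?<memend>[0-9a-fA-F]+)\]\s+type\s+(?<type>\d+)';
bios = regexp(msg, pat, 'names');

memRange = [hex2dec(bios.memstart), hex2dec(bios.memend)];  % hex range as numbers
typeVal  = str2double(bios.type);                           % type as a number
```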

figure;
subplot(3,1,1); plot(MsgLib.Kernel.String, MsgLib.Kernel.val); legend(sprintf('%s,%d', 'MsgLib.Kernel.String', MsgLib.Kernel.val));
subplot(3,1,2); plot(Msg.Kernel.BIOS.SubStr, Msg.Kernel.BIOS.SubVal);
subplot(3,1,3); plot(Msg.Kernel.BIOS.Substr.type, Msg.Kernel.BIOS.SubVal.type);

I want to analyze the full file like this. If you could help me with some sample code, I will write the rest of the code myself.

Thanks

Guillaume, 30 May 2019

OK, so you want to parse each line of the file and split the lines into various components.

Once again, you're also giving an implementation. I'm not convinced that the structure you outline is a good idea, but that's not important right now.

The first thing you have to do, before we can even think about how to implement it, is to define the exact parsing rules for the lines of the file. The rules will start with:

extract all the characters up to the first space and decode that as time
then, extract the characters up to the colon. That's the log source (I assume)

After that I'm not sure. It looks like the rule may vary according to the log source. If the log source is ANYTHING kernel, then the next step is

Extract the number between [] as the log value

Then it gets very murky: after the [xxx] you get different types of messages with different formatting. You will have to establish the rules for how each of these should be decoded.
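A possible first step towards those rules, assuming (from the three sample lines only) that many kernel messages begin with a short tag followed by ': ', is to split off that tag and group the lines by it, so that each group can then get its own decoding rule. This is a heuristic sketch, not a rule established for the file; lines without such a tag keep an empty tag and their full text.

```matlab
% Message parts after the [cputime] prefix, taken from the three sample lines
msgs = {'BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] type 16', ...
        'e820: update [mem 0x00000000-0x00000fff] usable ==> reserved', ...
        'random: get_random_bytes called from start_kernel+0x8d/0x429 with crng_init=0'};

tags   = repmat({''}, size(msgs));   % empty tag when no "tag: " prefix is found
bodies = msgs;                       % full text kept when no tag is found
for k = 1:numel(msgs)
    % Heuristic: a tag is a run of non-space, non-colon characters followed by ": "
    tok = regexp(msgs{k}, '^([^: ]+):\s+(.*)$', 'tokens', 'once');
    if ~isempty(tok)
        tags{k}   = tok{1};
        bodies{k} = tok{2};
    end
end
[uniqueTags, ~, tagIdx] = unique(tags);   % group lines by tag
```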

If the log source is not the kernel, you get a completely different format of message. Again, you need to specify the rules for decoding these.

So, I'm afraid, the task is back onto you. You first need to define rules (there's going to be several due to the complex formatting of the lines) on how to split a line into various components. Only once you've done that can we think about writing the code to do it.

I suggest you continue this bullet point list:

For each line:
    extract the text up to the first space as the logtime
    then extract the text up to the colon as the logsource
    if logsource ends with kernel
        extract the number between the [] as logvalue
        ????
    if logsource is ????
        ????
        ????
    if ???
        ????
        ????
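The steps that are already defined in this list can be sketched in MATLAB for a single line from the sample above. This assumes every line has a space after the time and a colon after the source; the ???? branches are left open, and the variable names simply mirror the list.

```matlab
logline = '2019-05-10T21:41:40.054649+00:00 DEBUG kernel: [ 0.000009] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved';

% Extract the text up to the first space as the logtime
[logtime, rest] = strtok(logline, ' ');
% Then extract the text up to the colon as the logsource
[logsource, rest] = strtok(rest, ':');
logsource = strtrim(logsource);
rest = strtrim(rest(2:end));     % drop the colon itself

logvalue = NaN;                  % stays NaN if the kernel rule does not apply
if ~isempty(regexp(logsource, 'kernel$', 'once'))   % logsource ends with kernel
    tok = regexp(rest, '^\[\s*([^\]]+)\]', 'tokens', 'once');
    if ~isempty(tok)
        logvalue = str2double(tok{1});   % number between the []
    end
    % ???? : further kernel rules still to be defined
end
% ???? : rules for other log sources still to be defined
```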

Guillaume, 1 June 2019

As I wrote:

So, I'm afraid, the task is back onto you. You first need to define rules (there's going to be several due to the complex formatting of the lines) on how to split a line into various components. Only once you've done that can we think about writing the code to do it.

Life is Wonderful, 6 June 2019

Hi Guillaume

Any feedback here?

Thanks
