Lineanchor not working in regexp

Omkar, October 7, 2024
Edited: Rahul, October 9, 2024
Hi,
I'm trying to read the contents of a text file in MATLAB. I use fileread to get the text, but when I use the 'lineanchors' option to match the last character of each line, I don't get the correct output.
str = fileread('output.txt');
expr = '.$';
lastInLine = regexp(str,expr,'match','lineanchors')
Can anyone help? Thanks.
2 Comments
Rahul, October 7, 2024
Hi @Omkar, can you provide the text document 'output.txt' mentioned in the post? That will help in understanding the issue better.
Omkar, October 7, 2024
Sure, I have attached it.


Accepted Answer

Rahul, October 9, 2024
Edited: Rahul, October 9, 2024
Hi Omkar,
I assume you're trying to find the last character of each line in a text document using the "regexp" function in MATLAB R2017a. The text document you provided uses Windows-style '\r\n' line endings rather than Unix-style '\n' newline characters, which you can verify with the following script:
% Read the whole file as a single character vector
fileID = fopen('output.txt', 'r');
assert(fileID ~= -1, 'Could not open output.txt.');
txt = fscanf(fileID, '%c');   % '%c' keeps every character, including \r and \n
fclose(fileID);
% Check which kind of line ending the text contains
if contains(txt, sprintf('\r\n'))
    disp('The file contains Windows-style line endings (\r\n).');
elseif contains(txt, newline)
    disp('The file contains Unix-style line endings (\n).');
else
    disp('No standard line endings found.');
end
The 'lineanchors' option for 'regexp' treats only the Unix-style line feed '\n' as a line ending. Note that 'fileread' does not convert '\r\n' into '\n', so if your text document uses '\r\n', the '$' anchor matches between '\r' and '\n', and '.$' returns the carriage return '\r' instead of the last visible character of the line.
This behavior is noted in the 'regexp' documentation for MATLAB R2018b and later releases.
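To see which character '.$' actually returns, you can look at the character codes of the matches. A minimal sketch, using a hypothetical two-line string `sample` in place of the attached file:

```matlab
% Hypothetical two-line sample with Windows-style (CRLF) line endings
sample = sprintf('abc\r\ndef');

% With 'lineanchors', '$' matches just before each '\n' and at the end of the text
m = regexp(sample, '.$', 'match', 'lineanchors');

% Convert each single-character match to its character code
codes = cellfun(@double, m);
disp(codes)   % 13 is the carriage return '\r'; 102 is 'f'
```

A code of 13 for the first line confirms that the match is the '\r' sitting just before '\n', not the letter 'c'.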
A possible workaround is to replace every '\r\n' with '\n' using the 'strrep' or 'regexprep' function before finding the line-ending characters with 'regexp':
% Read the file as in the question
str = fileread('output.txt');
% Replace Windows newlines (\r\n) with Unix newlines (\n)
text_new = strrep(str, sprintf('\r\n'), newline);
% Match the last character of each line
expression = '.$';
lastInLine = regexp(text_new, expression, 'match', 'lineanchors')
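If you would rather leave the text unchanged, another option (my own sketch, not taken from the documentation above) is to exclude line-break characters from the match and allow an optional '\r' before the anchor using a lookahead. This assumes the text uses only '\n' or '\r\n' line endings; `sample` is a hypothetical test string:

```matlab
% Hypothetical sample with CRLF line endings, including one empty line
sample = sprintf('abc\r\n\r\ndef');

% [^\r\n]  : a character that is not a line-break character
% (?=\r?$) : zero-width lookahead allowing an optional '\r' before the end of line
expression = '[^\r\n](?=\r?$)';
lastInLine = regexp(sample, expression, 'match', 'lineanchors')
```

Empty lines produce no match with this pattern, whereas '.$' would return a line-break character for them.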
To learn more, see the MATLAB documentation for the 'regexp', 'regexprep', and 'strrep' functions.
Hope that helped!

More Answers (1)

Stephen23, October 7, 2024
Edited: Stephen23, October 7, 2024
My guess is that you have not taken the newline characters into account. Note the difference:
tx1 = sprintf('Hello\r\nWorld'); % Windows
tx2 = sprintf('Hello\nWorld'); % *nix, MacOS
ma1 = regexp(tx1,'.$','match','lineanchors');
ma1{:}
ans = ' '   % not a space: this is the carriage return, char(13)
ans = 'd'
ma2 = regexp(tx2,'.$','match','lineanchors');
ma2{:}
ans = 'o'
ans = 'd'
One solution is to replace the Windows newline characters with a simple linefeed:
tx3 = regexprep(tx1,'\r\n','\n');
ma3 = regexp(tx3,'.$','match','lineanchors');
ma3{:}
ans = 'o'
ans = 'd'
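Another way to sidestep the anchor behavior entirely is to split the text into lines first and take the last character of each non-empty line. A minimal sketch, assuming the text came from `fileread` and uses '\n' or '\r\n' line endings (`str` here is a hypothetical stand-in for the file contents):

```matlab
% Hypothetical stand-in for: str = fileread('output.txt');
str = sprintf('Hello\r\nWorld\r\n');

% Split on an optional '\r' followed by '\n', so both line-ending styles work
lines = regexp(str, '\r?\n', 'split');

% Drop empty lines (including the empty piece after a trailing newline)
lines = lines(~cellfun(@isempty, lines));

% Last character of each remaining line, collected into a char vector
lastInLine = cellfun(@(s) s(end), lines)
```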

Categories: Characters and Strings

Release: R2017a