Reading a *.txt document and extracting specific words/phrases

Shima Asaadi on 18 Mar 2016
Commented: Shima Asaadi on 18 Mar 2016
I have a *.txt document and I would like to extract words/phrases whose start and end character numbers in that document I already know.
For example, one word's start and end character numbers are 711 and 724. I tried to extract it using the following MATLAB code:
filetoread = 'document file path';
fid = fopen(filetoread)
x = zeros(1,1);
while 1
    tline = fgetl(fid);
    if ~ischar(tline), break, end
    x = [x, tline];
end
x(1, 711:724)
In this code I try to store the whole document in a single row vector x and then print columns 711 to 724. However, the result does not match the expected word. I think the problem is with whitespace, empty lines, and so on.
(I attached a sample document)
I would appreciate any help,
Many thanks
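
The offset drift described above can be reproduced on a small file. The following is a minimal sketch, not the asker's actual data: the sample text and the temporary file are assumptions made for illustration. It suggests two likely causes: fgetl returns each line without its newline, and seeding x with zeros(1,1) puts one extra character (char(0)) at the front.

```matlab
% Hypothetical sample text standing in for the attached document.
txt = sprintf('first line\n\nsecond line\n');

% Write it to a temporary file so the sketch is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, txt, 'char');
fclose(fid);

% Rebuild the text the same way the loop in the question does.
fid = fopen(fname, 'r');
x = zeros(1,1);                 % numeric seed becomes char(0) once text is appended
while true
    tline = fgetl(fid);         % fgetl strips the newline from each line
    if ~ischar(tline), break, end
    x = [x, tline];
end
fclose(fid);
delete(fname);

% The rebuilt vector gains one character from the seed and loses
% one character for every newline in the file.
fprintf('original: %d chars, rebuilt: %d chars\n', numel(txt), numel(x));
```

Because every newline before a given position is missing from x, the mismatch grows the further into the document the word lies.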

Answers (1)

Azzi Abdelmalek on 18 Mar 2016
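filetoread = 'yourfile.txt';
fid = fopen(filetoread);
k = 1;
v = cell(1,1);
while 1
    tline = fgetl(fid);               % read one line, without its newline
    if ~ischar(tline), break, end     % fgetl returns -1 at end of file
    v{k,1} = tline;
    k = k + 1;
end
fclose(fid);
% Trim leading/trailing whitespace from each line
a = cellfun(@(x) strtrim(x), v, 'UniformOutput', false);
% Drop the empty lines
a(cellfun(@isempty, a)) = [];
% Characters 10 to 20 of each remaining line (each line needs at least 20 characters)
out = cellfun(@(x) x(10:20), a, 'UniformOutput', false)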
  1 Comment
Shima Asaadi on 18 Mar 2016
Thank you very much for the answer.
In this case each paragraph is treated separately, even with empty lines taken into account. For example, the word with start/end character numbers "570,590" in the original document cannot be extracted this way, because the indexing restarts at 1 for each paragraph and runs only to that paragraph's length. How can I modify the code to process the whole document at once?
Thank you for your help
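
One possible way to index the whole document at once is to read it with fileread, which returns the file contents as a single character vector with the newlines kept. The sketch below is not part of the original thread. The sample text, the temporary file, and the offsets 14 and 18 are hypothetical, and it assumes the offsets are 1-based, inclusive, and count each newline as one character.

```matlab
% Hypothetical sample text and offsets, standing in for the real document.
txt = sprintf('Alpha beta.\n\nGamma delta epsilon.\n');
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, txt, 'char');
fclose(fid);

% Read the entire file as one character vector, newlines included.
doc = fileread(fname);
delete(fname);

% Offsets assumed to be 1-based and inclusive, as in the question.
startIdx = 14;
endIdx   = 18;
phrase = doc(startIdx:endIdx);
disp(phrase)
```

If the offsets came from a tool that counts from 0, counts bytes rather than characters, or treats Windows line endings (CR+LF) as one character, the indices would need a matching adjustment before use.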
