How to display a line from a text file that contains a certain set of words

Ayushi Saxena on 30 Dec 2015
Commented: Ayushi Saxena on 4 Jan 2016
Hi. Suppose I have a text file containing, say, "twinkle twinkle little star; how i wonder what you are; up above the world so high; like a diamond in the sky;". I want to check for a particular word, such as "wonder", and if that word is present, then look for a second, supporting word, such as "high". If a line contains both words, it should be displayed. Basically, I am trying to do a kind of extractive summarization using MATLAB. Can anyone help me build something like this?

Answers (2)

Walter Roberson on 31 Dec 2015
Although strfind can do some of this, it matches substrings rather than whole words, so it has trouble with overlapping words and with words embedded in longer words. For example, "if" would be found in "lover's tiff". To avoid this, it is easiest to use regexp() with the word-boundary markers \< and \>:
if regexp(sentence, '(\<star\>.*\<twinkle\>)|(\<twinkle\>.*\<star\>)') ...
To automate this further:
pattern = sprintf('(\\<%s\\>.*\\<%s\\>)|(\\<%s\\>.*\\<%s\\>)', word1, word2, word2, word1);
if regexp(sentence, pattern) ...
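A minimal sketch of the automated pattern in use. The input values and the variable name `hasBoth` are hypothetical, chosen only for illustration. The call to `regexptranslate('escape', ...)` is an added precaution, not part of the answer above: it makes search words that contain regexp metacharacters match literally. The 'once' option makes regexp return a single match, so testing for emptiness is straightforward.

```matlab
% Hypothetical inputs for illustration.
sentence = 'twinkle twinkle little star';
word1 = 'star';
word2 = 'twinkle';

% Escape any regexp metacharacters in the search words (added precaution).
w1 = regexptranslate('escape', word1);
w2 = regexptranslate('escape', word2);

% Match the two whole words in either order.
pattern = sprintf('(\\<%s\\>.*\\<%s\\>)|(\\<%s\\>.*\\<%s\\>)', w1, w2, w2, w1);

% regexp returns [] when there is no match.
hasBoth = ~isempty(regexp(sentence, pattern, 'once'));
if hasBoth
    disp(sentence)
end
```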
Caution: here, the '.*' will match any character, including newline and including punctuation. If your sentence variable is not already broken up into distinct English sentences then this code will match across multiple grammatical sentences.
A pattern that restricts the match to a single grammatical sentence is not easy to write, because sentence boundaries are tricky to detect. Grammatical sentences can end in a period, exclamation mark, or question mark, but none of these necessarily ends the sentence, "... especially if there are quotations in the grammatical sentence!", or if there are parenthetical comments (don't you think those are important?) in some portion. Periods are a nuisance: they can signal the end of the sentence, or they can signal an abbr., or they can signal a decimal point. A period that occurs after a value preceded by a currency unit is sometimes a decimal point that will cost you $10. per hair that you pull out trying to get the code to work. Sometimes apostrophes after whitespace signal quotations, and sometimes 'tis not a quotation at all, and apostrophes after words might be signalling the words' possessiveness.
Your code to reliably break your input into sentences is going to be much longer than your code to find multiple words within the resulting sentences.
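For the sample text in the question, the lines are separated by semicolons, so a naive split may be enough. The sketch below assumes that delimiter and uses illustrative variable names (`txt`, `parts`, `matches`). It makes no attempt at the general sentence-boundary detection described above.

```matlab
% Sample text from the question, with ';' assumed to be the line delimiter.
txt = ['twinkle twinkle little star; how i wonder what you are; ' ...
       'up above the world so high; like a diamond in the sky;'];

% Split on semicolons, trim whitespace, and drop empty pieces.
parts = strtrim(regexp(txt, ';', 'split'));
parts = parts(~cellfun(@isempty, parts));

word1 = 'twinkle';
word2 = 'star';
pattern = sprintf('(\\<%s\\>.*\\<%s\\>)|(\\<%s\\>.*\\<%s\\>)', ...
                  word1, word2, word2, word1);

% Keep only the pieces in which both whole words occur.
isMatch = ~cellfun(@isempty, regexp(parts, pattern, 'once'));
matches = parts(isMatch);

for k = 1:numel(matches)
    disp(matches{k})
end
```

Because each piece is tested separately, the '.*' in the pattern cannot match across two pieces.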
  7 Comments
Ayushi Saxena on 4 Jan 2016
Let me clarify. I need to read a file, basically a PDF file or a Word document, and draw certain conclusions from it. For example, suppose that under a certain heading I have a sentence containing words such as "x", "y", and "z". Applying the condition that any meaningful sentence containing the words "x" and "z" qualifies, I can then conclude 'A'.
Ayushi Saxena on 4 Jan 2016
It is like decision making on the basis of data extracted from the Word or PDF file.
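One possible way to express such a rule is sketched below. It assumes the text has already been extracted from the PDF or Word file into a character vector, which this sketch does not cover. The `rules` table, the second rule ('B'), and the sample sentence are hypothetical.

```matlab
% Hypothetical rules: each row is {required words, conclusion}.
rules = { {'x', 'z'}, 'A'; ...
          {'y'},      'B' };

% Hypothetical sentence, assumed to be already extracted from the document.
sentence = 'a sentence with x and z';

conclusions = {};
for r = 1:size(rules, 1)
    words = rules{r, 1};
    % A rule fires only if every required word occurs as a whole word.
    hit = cellfun(@(w) ~isempty(regexp(sentence, ...
          ['\<' regexptranslate('escape', w) '\>'], 'once')), words);
    if all(hit)
        conclusions{end+1} = rules{r, 2}; %#ok<AGROW>
    end
end
disp(conclusions)
```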



Image Analyst on 30 Dec 2015
Edited: Image Analyst on 30 Dec 2015
Look up "strfind" and "if" in the help. I'm assuming you already know how to import the words from the file. If not, look into fread(), importdata(), textscan(), textread(), etc.
You might also like to use John D'Errico's allwords to split your sentence up into words.
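A rough sketch of the read-then-test approach. It uses fgetl to read one line at a time, which is a choice made here and is not one of the functions listed above. The temporary file and its contents are made up so that the example is self-contained. Note that strfind tests for substrings, not whole words.

```matlab
% Write a small sample file so the sketch is self-contained (made-up content).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'twinkle twinkle little star\nhow i wonder what you are\n');
fclose(fid);

% Read the file one line at a time; fgetl returns -1 at end of file.
fid = fopen(fname, 'r');
lines = {};
tline = fgetl(fid);
while ischar(tline)
    lines{end+1} = tline; %#ok<AGROW>
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% Display each line that contains both words (substring test).
word1 = 'twinkle';
word2 = 'star';
for k = 1:numel(lines)
    if ~isempty(strfind(lines{k}, word1)) && ~isempty(strfind(lines{k}, word2))
        disp(lines{k})
    end
end
```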
  6 Comments
Walter Roberson on 31 Dec 2015
I think you mean
if ~isempty(strfind(sentence, word1)) && ~isempty(strfind(sentence, word2))
    % Then both words occur in the sentence string.
end
Image Analyst on 31 Dec 2015
Right - thanks for correcting.
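To see how the substring test differs from a whole-word test, here is a small comparison that reuses the "lover's tiff" example from the other answer. The variable names are illustrative only.

```matlab
sentence = 'a lover''s tiff';
word = 'if';

% Substring test: true, because 'if' occurs inside 'tiff'.
foundSubstring = ~isempty(strfind(sentence, word));

% Whole-word test: false, because 'if' never appears as a separate word.
foundWord = ~isempty(regexp(sentence, ['\<' word '\>'], 'once'));

fprintf('strfind: %d, regexp whole word: %d\n', foundSubstring, foundWord);
```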
