How to find the most common words in a text with MATLAB

bita hallajian on 28 Oct 2017
Commented: Charmaine Tan on 26 Nov 2018
How can I tag parts of speech (POS) for nouns and verbs in MATLAB? Is this related to regular expressions? I know that regular expressions find a pattern in a text, but I want to find the most common words in a set of texts, tag each one with its POS (that is, whether the word is a noun or a verb), and then swap words by POS to make unfamiliar pairs of words. How can I find the most common words in texts with MATLAB? Is there a solution for that, or should I use other software?

Accepted Answer

Christopher Creutzig on 2 Nov 2017
Edited: Christopher Creutzig on 26 Nov 2018
Finding the most common words is easy with Text Analytics Toolbox:
>> sonnets = extractFileText("sonnets.txt");
>> sonnets = erasePunctuation(sonnets);
>> tokenizedSonnets = tokenizedDocument(lower(sonnets));
>> bag = bagOfWords(tokenizedSonnets);
>> topkwords(bag, 10)
ans =
  10×2 table
     Word     Count
    ______    _____
    "and"      490
    "the"      436
    "to"       409
    "my"       371
    "of"       370
    "i"        344
    "in"       321
    "that"     320
    "thy"      281
    "thou"     234
You probably want to remove very common words such as "and" and "the" (check out removeWords and stopWords). POS tagging is supported in release R2018b and later; see addPartOfSpeechDetails.
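As a rough illustration of both suggestions, here is a minimal sketch that removes stop words before ranking and then counts only the tokens tagged as nouns or verbs. It assumes R2018b or later with Text Analytics Toolbox. The sample sentence and the variable names (details, tagged, and so on) are made up for this example; for a real file, read the text with extractFileText as shown above. Tagging is done on the original text before lowercasing, on the assumption that the tagger works best on unmodified input.

```matlab
% Inline sample text so the sketch runs on its own (hypothetical data).
% For a real file use: txt = extractFileText("sonnets.txt");
txt = "The old cat chased the small mouse. The mouse ran and the cat slept.";

% Tokenize and add part-of-speech tags (R2018b or later)
docs = tokenizedDocument(txt);
docs = addPartOfSpeechDetails(docs);
details = tokenDetails(docs);        % table with Token and PartOfSpeech columns

% Most common words after dropping punctuation and stop words
bag = bagOfWords(erasePunctuation(lower(docs)));
bag = removeWords(bag, stopWords);
topWords = topkwords(bag, 5);

% Keep only the tokens tagged as nouns or verbs
isNounOrVerb = details.PartOfSpeech == "noun" | details.PartOfSpeech == "verb";
sel = details(isNounOrVerb, :);
sel.Token = lower(sel.Token);

% Count how often each (word, tag) combination occurs, most frequent first
[G, word, pos] = findgroups(sel.Token, sel.PartOfSpeech);
count = splitapply(@numel, sel.Token, G);
tagged = sortrows(table(word, pos, count), 'count', 'descend');

disp(topWords)
disp(tagged)
```

The tagged table lists each word with its tag, so the most frequent nouns and verbs can be picked out for swapping. The tags come from a statistical model, so individual words may be labeled differently than you expect.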
2 Comments
bita hallajian on 2 Dec 2017
Edited: bita hallajian on 2 Dec 2017
Thank you very much for your help. I'll try the points you suggested. I created a text document named "sonnets.txt" and loaded it in the Command Window, but I get an error: "Number of columns on line 2 of ASCII file sonnets.txt must be the same as previous lines." Can you give me some advice on how to eliminate this error?
Christopher Creutzig on 2 May 2018
What command(s) did you try to read that file? The error message looks like you tried to read it as a table; try using the commands listed above instead.
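For illustration, a minimal sketch of reading a file as plain text rather than as numeric or tabular data. The error quoted above most likely comes from a command such as load, which expects the same number of numeric columns on every line; that is an assumption, since the original command was not posted. The file name sample_text.txt and its contents are invented for this example and only stand in for your own sonnets.txt.

```matlab
% Write a tiny two-line text file whose lines have different word counts
% (hypothetical sample that stands in for your own text file).
fname = fullfile(tempdir, 'sample_text.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'one two three\nfour five\n');
fclose(fid);

% Commands meant for numeric data, such as load(fname), are not suitable here.
% Read the file as text instead:
str = extractFileText(fname);   % Text Analytics Toolbox, returns a string
raw = fileread(fname);          % base MATLAB alternative, returns a char vector

disp(str)
```

Either result can then be passed to tokenizedDocument as in the answer above.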


More Answers (2)

Sarah Palfreyman on 30 Apr 2018
Edited: Sarah Palfreyman on 30 Apr 2018
2 Comments
IORUNDU GABRIEL on 16 May 2018
What is the earliest MATLAB release that supports Text Analytics Toolbox?
Rik on 16 May 2018
R2017b



Charmaine Tan on 26 Nov 2018
Hi, after finding my topkwords (most frequent words), how can I plot a histogram of these?
2 Comments
Christopher Creutzig on 26 Nov 2018
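% Read the text and split it into lowercase tokens without punctuation
txt = extractFileText('sonnets.txt');
td = tokenizedDocument(lower(txt));
td = erasePunctuation(td);
% Count word frequencies and keep the 20 most frequent words
bow = bagOfWords(td);
top = topkwords(bow,20);
% Bar chart of the counts, with the words as rotated x-axis labels
bar(top.Count)
set(gca,'XTick',1:size(top,1),'XTickLabel',top.Word,'XTickLabelRotation',45)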
[Attached image: Screen Shot 2018-11-26 at 09.23.48.png]
(In general, it's a good idea not to ask a new question as an “answer,” but to open a new question instead. It helps other people searching MATLAB Answers in the future.)
Charmaine Tan on 26 Nov 2018
Noted, I'll do that. Thanks a lot!
