removeInfrequentWords

Remove words with low counts from a bag-of-words model

Syntax

newBag = removeInfrequentWords(bag,count)

newBag = removeInfrequentWords(bag,count,'IgnoreCase',true)

Description

newBag = removeInfrequentWords(bag,count) removes the words that appear at most count times in total from the bag-of-words model bag. By default, the function is case sensitive.

newBag = removeInfrequentWords(bag,count,'IgnoreCase',true) removes the words that appear at most count times in total, ignoring case. If words differ only by case, then their counts are merged into a single count.
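
To illustrate the difference between the two syntaxes, the following is a minimal sketch that assumes Text Analytics Toolbox is installed. The sample sentences and the variable names strictBag and relaxedBag are made up for illustration and do not come from the original page. Each case variant of "apple" appears only once, so the case-sensitive call removes all of them, whereas the case-insensitive call merges their counts and keeps the word.

```matlab
% Hypothetical sample data: "Apple", "apple", and "APPLE" each appear once,
% and "recipe" appears three times.
documents = tokenizedDocument([
    "Apple pie recipe"
    "apple tart recipe"
    "APPLE crumble recipe"]);
bag = bagOfWords(documents);

count = 2;

% Case sensitive (default): every variant of "apple" has a count of 1,
% which is at most count, so all three variants are removed.
strictBag = removeInfrequentWords(bag,count);

% Case insensitive: the three variants are merged into one word with a
% combined count of 3, which exceeds count, so the word is kept.
relaxedBag = removeInfrequentWords(bag,count,'IgnoreCase',true);

disp(strictBag.Vocabulary)
disp(relaxedBag.Vocabulary)
```

Comparing the two Vocabulary properties shows which words survive under each setting. The sketch makes no claim about which letter case the merged word takes in the output.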

Examples

Remove Infrequent Words

Remove the words that appear two times or fewer from a bag-of-words model.

Create a bag-of-words model from an array of tokenized documents.

documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

          Counts: [4x8 double]
      Vocabulary: ["an"    "example"    "of"    "a"    "short"    "sentence"    "second"    "another"]
        NumWords: 8
    NumDocuments: 4

Remove the words that appear two times or fewer from the bag-of-words model.

count = 2;
newBag = removeInfrequentWords(bag,count)

newBag = 
  bagOfWords with properties:

          Counts: [4x3 double]
      Vocabulary: ["example"    "a"    "short"]
        NumWords: 3
    NumDocuments: 4
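
To see why only three words remain, it can help to compute the total count of each word directly. The following is a minimal sketch that assumes Text Analytics Toolbox is installed. The variable names totals and keep are illustrative and are not part of the function's interface. It sums the Counts property over all documents and selects the words whose total exceeds count, which should match the vocabulary of newBag.

```matlab
% Recreate the bag-of-words model from the example.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents);

count = 2;
newBag = removeInfrequentWords(bag,count);

% Total number of occurrences of each vocabulary word across all documents.
totals = full(sum(bag.Counts,1));

% Words that appear more than count times in total are the ones retained.
keep = bag.Vocabulary(totals > count);

disp(table(bag.Vocabulary',totals','VariableNames',["Word" "Total"]))
disp(isequal(keep,newBag.Vocabulary))
```

In this data, "sentence" appears exactly twice, so it is removed along with the words that appear once: the threshold is inclusive.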