textrankScores

Document scoring with the TextRank algorithm

Syntax

scores = textrankScores(documents)

scores = textrankScores(bag)

Description

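scores = textrankScores(documents) scores the importance of documents according to their pairwise similarity values, using the TextRank algorithm. The function computes the similarity scores with the BM25 algorithm and the importance scores with the PageRank algorithm.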
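To make the PageRank step concrete, the following is a minimal sketch of a power iteration over a pairwise similarity matrix, written in base MATLAB. It is not the implementation of textrankScores: the similarity values in S are hypothetical illustrative numbers (not BM25 output), and the damping factor, iteration limit, and tolerance are assumptions chosen for the sketch.

```matlab
% Hypothetical pairwise similarity matrix for four documents.
% These values are illustrative only; they are not produced by BM25.
S = [0   0.9 0.3 0.1
     0.9 0   0.3 0.1
     0.3 0.3 0   0.4
     0.1 0.1 0.4 0  ];

d = 0.85;               % damping factor (assumed; a common PageRank choice)
n = size(S,1);
P = S ./ sum(S,1);      % normalize each column so that it sums to 1
r = ones(n,1)/n;        % start from a uniform score vector

% Power iteration: repeat until the scores stop changing.
for iter = 1:100
    rNew = (1-d)/n + d*(P*r);
    converged = norm(rNew - r,1) < 1e-10;
    r = rNew;
    if converged
        break
    end
end

scoresSketch = r;       % one importance score per document
```

In this sketch a document accumulates score when documents that are themselves highly scored are similar to it, which is the intuition behind ranking documents by pairwise similarity.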


scores = textrankScores(bag) scores the documents encoded by the bag-of-words or bag-of-n-grams model bag.

Examples


Document Importance


Create an array of tokenized documents.

str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str)

documents = 
  4×1 tokenizedDocument:

    9 tokens: the quick brown fox jumped over the lazy dog
    9 tokens: the fast brown fox jumped over the lazy dog
    8 tokens: the lazy dog sat there and did nothing
    6 tokens: the other animals sat there watching

Compute the TextRank scores.

scores = textrankScores(documents);

Visualize the scores in a bar chart.

figure
bar(scores)
xlabel("Document")
ylabel("Score")
title("TextRank Scores")

[Figure: bar chart titled "TextRank Scores", with x-axis "Document" and y-axis "Score".]
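Beyond plotting, the scores can be used to rank the documents. The following sketch repeats the setup from this example and sorts the documents from highest to lowest score; the variable names sortedScores, idx, and rankedDocuments are introduced here for illustration only.

```matlab
% Recreate the tokenized documents from this example.
str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str);

% Compute one TextRank score per document.
scores = textrankScores(documents);

% Sort the scores in descending order and reorder the documents to match.
[sortedScores,idx] = sort(scores,"descend");
rankedDocuments = documents(idx);
```

Here idx(1) is the index of the highest-scoring document, so rankedDocuments(1) is the document that the scores rate as most important.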

Scores Using Bag-of-Words Model


Create a bag-of-words model from the text data in sonnets.csv.

filename = "sonnets.csv";
tbl = readtable(filename,'TextType','string');
textData = tbl.Sonnet;
documents = tokenizedDocument(textData);
bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

        NumWords: 3527
          Counts: [154×3527 double]
      Vocabulary: ["From"    "fairest"    "creatures"    "we"    "desire"    "increase"    ","    "That"    "thereby"    "beauty's"    "rose"    "might"    "never"    "die"    "But"    "as"    "the"    "riper"    "should"    "by"    …    ] (1×3527 string)
    NumDocuments: 154

Compute the TextRank scores.

scores = textrankScores(bag);

Visualize the scores in a bar chart.

figure
bar(scores)
xlabel("Document")
ylabel("Score")
title("TextRank Scores")

[Figure: bar chart titled "TextRank Scores", with x-axis "Document" and y-axis "Score".]
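With 154 documents, a bar chart makes it hard to read off individual sonnets. As a hedged sketch of one way to pull out the highest-scoring ones, the code below selects the top three with maxk. The choice of k = 3 and the names topScores, topIdx, and topSonnets are assumptions made for illustration.

```matlab
% Rebuild the bag-of-words model from this example.
filename = "sonnets.csv";
tbl = readtable(filename,'TextType','string');
textData = tbl.Sonnet;
documents = tokenizedDocument(textData);
bag = bagOfWords(documents);

% Compute one TextRank score per sonnet.
scores = textrankScores(bag);

% Pick the k highest-scoring sonnets (k = 3 is an arbitrary choice).
k = 3;
[topScores,topIdx] = maxk(scores,k);
topSonnets = textData(topIdx);
```

Because scores(i) corresponds to the ith document in the model, topIdx can index directly into textData to recover the original text.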

Input Arguments


`documents` — Input documents
`tokenizedDocument` array | string array | cell array of character vectors

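Input documents, specified as a tokenizedDocument array, a string array of words, or a cell array of character vectors. If documents is not a tokenizedDocument array, then it must be a row vector representing a single document, where each element is a word. To specify multiple documents, use a tokenizedDocument array.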

`bag` — Input model
`bagOfWords` object | `bagOfNgrams` object

Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object. If bag is a bagOfNgrams object, then the function treats each n-gram as a single word.
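As a short illustration of the bagOfNgrams case, the sketch below builds a bag-of-n-grams model from four example sentences and passes it to textrankScores. It relies on the default n-gram length of bagOfNgrams (bigrams); the sentences are reused from the first example on this page.

```matlab
% Four short example documents.
str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str);

% Build a bag-of-n-grams model (bigrams by default).
bag = bagOfNgrams(documents);

% Score the documents; each n-gram is treated as a single word.
scores = textrankScores(bag);
```

The result has one score per document in the model, the same as when a tokenizedDocument array or a bagOfWords object is used as input.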

Output Arguments


`scores` — TextRank scores
vector

TextRank scores, returned as an N-by-1 vector, where scores(i) is the score of the ith input document and N is the number of input documents.

References

[1] Mihalcea, Rada, and Paul Tarau. "TextRank: Bringing Order into Text." In Proceedings of the 2004 conference on empirical methods in natural language processing, pp. 404-411. 2004.

Version History

Introduced in R2020a

See Also

Topics

Sequence-to-Sequence Translation Using Attention
