Modeling and Prediction

Develop predictive models using topic models and word embeddings

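You can use machine learning techniques and models such as latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and word embeddings to find clusters in high-dimensional text data sets and to extract features from them. You can combine features created with Text Analytics Toolbox™ with features from other data sources. You can then use these combined features to build machine learning models that draw on text, numeric, and other types of data.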

Functions


Word and N-Gram Counting

`bagOfWords`	Bag-of-words model
`bagOfNgrams`	Bag-of-n-grams model
`addDocument`	Add documents to bag-of-words or bag-of-n-grams model
`removeDocument`	Remove documents from bag-of-words or bag-of-n-grams model
`removeInfrequentWords`	Remove words with low counts from bag-of-words model
`removeInfrequentNgrams`	Remove infrequently seen n-grams from bag-of-n-grams model
`removeWords`	Remove selected words from documents or bag-of-words model
`removeNgrams`	Remove n-grams from bag-of-n-grams model
`removeEmptyDocuments`	Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model
`topkwords`	Most important words in bag-of-words model or LDA topic
`topkngrams`	Most frequent n-grams
`encode`	Encode documents as matrix of word or n-gram counts
`tfidf`	Term frequency-inverse document frequency (tf-idf) matrix
`join`	Combine multiple bag-of-words or bag-of-n-grams models
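The counting functions above reduce each document to a table of word or n-gram counts, which `tfidf` can then reweight. The following minimal Python sketch shows that computation only. It is not the Text Analytics Toolbox API. It assumes already-tokenized documents and one common weighting, raw count times log(N / df). The `tfidf` function offers other weighting options that are not reproduced here.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(docs, n=1):
    """One Counter of n-gram counts per document (a sparse count matrix)."""
    return [Counter(ngrams(doc, n)) for doc in docs]

def tfidf(counts):
    """Weight raw counts by idf = log(N / df), df = number of documents containing the term."""
    n_docs = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())
    return [{term: tf * math.log(n_docs / df[term]) for term, tf in c.items()} for c in counts]

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "a cat and a dog".split(),
]
unigrams = bag_of_ngrams(docs, 1)
bigrams = bag_of_ngrams(docs, 2)
weights = tfidf(unigrams)
print(unigrams[0][("the",)])           # 2
print(bigrams[0][("sat", "on")])       # 1
print(round(weights[0][("mat",)], 4))  # 1.0986
```

A term that occurs in every document gets idf = log(1) = 0. This is why tf-idf suppresses ubiquitous words, while a raw bag-of-words keeps them.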

Sentiment Analysis

`vaderSentimentScores`	Sentiment scores with VADER algorithm
`ratioSentimentScores`	Sentiment scores with ratio rule
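VADER sums the lexicon valences of a document's words. It then squashes the total into a compound score in [-1, 1] using score / sqrt(score^2 + alpha), with alpha = 15 in the reference VADER implementation. The following Python sketch covers only that core step. The four-word lexicon is hypothetical. The sketch deliberately omits VADER's negation, intensifier, capitalization, and punctuation rules, so it is not a substitute for `vaderSentimentScores`.

```python
import math

# Hypothetical toy lexicon (word -> valence); real VADER ships a large human-rated lexicon.
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}

def normalize(score, alpha=15.0):
    """Map an unbounded valence sum into [-1, 1], as VADER does for its compound score."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

def compound_score(tokens, lexicon=LEXICON):
    """Simplified VADER-style score: sum the valences of known words, then normalize.
    Omits VADER's rule-based adjustments (negation, boosters, caps, punctuation)."""
    total = sum(lexicon.get(t.lower(), 0.0) for t in tokens)
    return normalize(total)

print(compound_score("nothing here".split()))                # 0.0
print(compound_score("the food was great".split()) > 0)       # True
print(compound_score("bad and terrible service".split()) < 0) # True
```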

Transformers

`bert`	Pretrained BERT model (since R2023b)
`bertDocumentClassifier`	BERT document classifier (since R2023b)
`classify`	Classify documents using BERT document classifier (since R2023b)
`bertTokenizer`	WordPiece BERT tokenizer (since R2023b)
`bpeTokenizer`	Byte pair encoding tokenizer (since R2024a)
`encode`	Tokenize and encode text for transformer neural network (since R2023b)
`decode`	Convert token codes to tokens (since R2023b)
`encodeTokens`	Convert tokens to token codes (since R2023b)
`subwordTokenize`	Tokenize text into subwords using BERT tokenizer (since R2023b)
`trainBERTDocumentClassifier`	Train BERT document classifier (since R2023b)
`wordTokenize`	Tokenize text into words using tokenizer (since R2023b)
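`bertTokenizer` is described as a WordPiece tokenizer. The standard WordPiece step splits each word greedily into the longest vocabulary piece, marks continuation pieces with a `##` prefix, and falls back to an unknown token when no piece matches. The following Python sketch applies that algorithm to a hypothetical six-entry vocabulary. It is not the toolbox implementation and skips the basic pre-tokenization (lowercasing, punctuation splitting) that precedes it.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", max_chars=100):
    """Greedy longest-match-first WordPiece split of one word.
    Pieces after the first carry the '##' continuation prefix."""
    if len(word) > max_chars:
        return [unk]
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no vocabulary piece covers this position
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary; real BERT vocabularies have roughly 30,000 entries.
VOCAB = {"un", "##aff", "##able", "play", "##ing", "[UNK]"}
print(wordpiece_tokenize("unaffable", VOCAB))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", VOCAB))    # ['play', '##ing']
print(wordpiece_tokenize("xyz", VOCAB))        # ['[UNK]']
```

Token codes, as produced by `encodeTokens`, are then just each piece's position in the vocabulary.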

Embeddings and Encoding

`documentEmbedding`	Document embedding model to map documents to vectors (since R2024a)
`embed`	Map documents to embedding vectors (since R2024a)
`fastTextWordEmbedding`	Pretrained fastText word embedding
`wordEncoding`	Word encoding model to map words to indices and back
`doc2sequence`	Convert documents to sequences for deep learning
`wordEmbeddingLayer`	Word embedding layer for deep learning neural network
`word2vec`	Map word to embedding vector
`word2ind`	Map word to encoding index
`vec2word`	Map embedding vector to word
`ind2word`	Map encoding index to word
`isVocabularyWord`	Test if word is member of word embedding or encoding
`readWordEmbedding`	Read word embedding from file
`trainWordEmbedding`	Train word embedding
`writeWordEmbedding`	Write word embedding file
`wordEmbedding`	Word embedding model to map words to vectors and back
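A word encoding is a two-way lookup between words and integer indices. A word embedding instead maps words to vectors, and mapping a vector back to a word means finding the nearest vocabulary vector. The following Python sketch shows both ideas. It assumes 1-based indices in order of first appearance, to mirror MATLAB indexing, and uses cosine similarity as the closeness measure. The 2-D vectors are hypothetical. Neither detail is taken from the `wordEncoding` or `vec2word` reference pages.

```python
import math

def build_encoding(docs):
    """Assign each distinct word a 1-based index in order of first appearance."""
    word2ind = {}
    for doc in docs:
        for w in doc:
            if w not in word2ind:
                word2ind[w] = len(word2ind) + 1
    ind2word = {i: w for w, i in word2ind.items()}
    return word2ind, ind2word

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_word(vec, embedding):
    """Return the vocabulary word whose vector is most cosine-similar to vec."""
    return max(embedding, key=lambda w: cosine(vec, embedding[w]))

word2ind, ind2word = build_encoding([["king", "queen"], ["queen", "man", "woman"]])
print(word2ind["man"])  # 3
print(ind2word[2])      # queen

# Hypothetical 2-D vectors; pretrained embeddings such as fastText use hundreds of dimensions.
EMB = {"king": [0.9, 0.8], "queen": [0.9, -0.8], "man": [0.1, 0.9], "woman": [0.1, -0.9]}
print(nearest_word([1.0, -0.7], EMB))  # queen
```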

Document Summarization and Similarity

`extractSummary`	Extract summary from documents
`rakeKeywords`	Extract keywords using RAKE
`textrankKeywords`	Extract keywords using TextRank
`bleuEvaluationScore`	Evaluate translation or summarization with BLEU similarity score
`rougeEvaluationScore`	Evaluate translation or summarization with ROUGE similarity score
`bm25Similarity`	Document similarities with BM25 algorithm
`cosineSimilarity`	Document similarities with cosine similarity
`textrankScores`	Document scoring with TextRank algorithm
`lexrankScores`	Document scoring with LexRank algorithm
`mmrScores`	Document scoring with Maximal Marginal Relevance (MMR) algorithm
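BM25 scores a document against a query by summing, over the query terms, an idf weight times a saturated term frequency that is also normalized by document length. The following Python sketch implements Okapi BM25 on tokenized text. It assumes k1 = 1.2 and b = 0.75, which are commonly cited values, and the non-negative idf variant log(1 + (N - n + 0.5) / (n + 0.5)). The defaults and idf form used by `bm25Similarity` may differ.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per document
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 caps the gain from repeated terms; b scales the penalty for long documents.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "the quick brown fox".split(),
    "the lazy dog".split(),
    "quick quick fox jumps".split(),
]
scores = bm25_scores(["quick", "fox"], docs)
print(sorted(range(len(docs)), key=lambda i: -scores[i]))  # [2, 0, 1]
```

Unlike cosine similarity on raw counts, the second "quick" in the third document adds less than the first, because of the k1 saturation term.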

Topic Modeling and Dimensionality Reduction

`fitlda`	Fit latent Dirichlet allocation (LDA) model
`fitlsa`	Fit LSA model
`resume`	Resume fitting LDA model
`logp`	Document log-probabilities and goodness of fit of LDA model
`predict`	Predict top LDA topics of documents
`transform`	Transform documents into lower-dimensional space
`ldaModel`	Latent Dirichlet allocation (LDA) model
`lsaModel`	Latent semantic analysis (LSA) model
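`logp` reports document log-probabilities together with a goodness-of-fit measure. A common goodness-of-fit measure for topic models is perplexity, defined below in its standard form. This definition is given for orientation and is not taken from the `logp` reference page. Here $D$ is the number of documents, $N_d$ the number of words in document $d$, and $\log p(\mathbf{w}_d)$ the model's log-probability of that document. Lower perplexity means a better fit.

```latex
\mathrm{perplexity} = \exp\!\left( - \frac{\sum_{d=1}^{D} \log p(\mathbf{w}_d)}{\sum_{d=1}^{D} N_d} \right)
```

For example, two 10-word documents with log-probability $-46$ each give $\exp(92/20) = \exp(4.6) \approx 99.5$.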

Named Entity Recognition

`addEntityDetails`	Add entity tags to documents
`trainHMMEntityModel`	Train HMM-based model for named entity recognition (NER) (since R2023a)
`predict`	Predict entities using named entity recognition (NER) model (since R2023a)
`hmmEntityModel`	HMM-based model for named entity recognition (NER) (since R2023a)
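An HMM tagger treats entity tags as hidden states and words as emissions. Prediction with an HMM is usually done with Viterbi decoding, which finds the single most probable tag sequence. The following Python sketch shows Viterbi decoding. The internals of `hmmEntityModel` are not described on this page. The tag set, start, transition, and emission probabilities are all hypothetical. Real NER schemes often use B-/I- prefixed tags, which this sketch omits.

```python
def viterbi(tokens, states, start_p, trans_p, emit_p, unseen_p=1e-6):
    """Most likely tag sequence for tokens under an HMM (Viterbi decoding).
    Plain probabilities are fine for short inputs; use log-probabilities for long ones."""
    # best[t][s] = (probability of the best path ending in state s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(tokens[0], unseen_p), None) for s in states}]
    for t in range(1, len(tokens)):
        row = {}
        for s in states:
            emit = emit_p[s].get(tokens[t], unseen_p)
            row[s] = max((best[t - 1][p][0] * trans_p[p][s] * emit, p) for p in states)
        best.append(row)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(tokens) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]

# Hypothetical model parameters: O = not an entity, PER = person, LOC = location.
STATES = ["O", "PER", "LOC"]
START = {"O": 0.6, "PER": 0.3, "LOC": 0.1}
TRANS = {
    "O": {"O": 0.7, "PER": 0.15, "LOC": 0.15},
    "PER": {"O": 0.6, "PER": 0.3, "LOC": 0.1},
    "LOC": {"O": 0.7, "PER": 0.1, "LOC": 0.2},
}
EMIT = {
    "O": {"visited": 0.3, "in": 0.3, "the": 0.4},
    "PER": {"alice": 0.5, "bob": 0.5},
    "LOC": {"paris": 0.6, "tokyo": 0.4},
}
print(viterbi("alice visited paris".split(), STATES, START, TRANS, EMIT))  # ['PER', 'O', 'LOC']
```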

Visualization

`wordcloud`	Create word cloud chart from text, bag-of-words model, bag-of-n-grams model, or LDA model
`textscatter`	2-D scatter plot of text
`textscatter3`	3-D scatter plot of text

Topics

Classification and Modeling

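Create Simple Preprocessing Function
This example shows how to use the Preprocess Text Data Live Editor task to create a function that cleans and preprocesses text data for analysis.
Create Simple Text Model for Classification
This example shows how to train a simple text classifier on word frequency counts using a bag-of-words model.
Classify Documents Using Document Embeddings
This example shows how to train a document classifier by converting documents to feature vectors using a document embedding.
Analyze Text Data Containing Multiword Phrases
This example shows how to analyze text using n-gram frequency counts.
Analyze Text Data Using Topic Models
This example shows how to analyze text data using a latent Dirichlet allocation (LDA) topic model.
Choose Number of Topics for LDA Model
This example shows how to decide on a suitable number of topics for a latent Dirichlet allocation (LDA) model.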
Compare LDA Solvers
This example shows how to compare latent Dirichlet allocation (LDA) solvers by comparing the goodness of fit and the time taken to fit the model.
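Visualize Document Clusters Using LDA Model
This example shows how to visualize the clustering of documents using a latent Dirichlet allocation (LDA) topic model and a t-SNE plot.
Visualize LDA Topic Correlations
This example shows how to analyze correlations between topics in a latent Dirichlet allocation (LDA) topic model.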
Visualize Correlations Between LDA Topics and Document Labels
This example shows how to fit a Latent Dirichlet Allocation (LDA) topic model and visualize correlations between the LDA topics and document labels.
Train Custom Named Entity Recognition Model
This example shows how to train a custom named entity recognition (NER) model.
Create Co-occurrence Network
This example shows how to create a co-occurrence network using a bag-of-words model.
Information Retrieval with Document Embeddings
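Learn about different types of document embeddings and how to use them for information retrieval. (since R2024b)
Information Retrieval Using Work Order Data
This example shows how to use information retrieval techniques to find solutions for new work orders based on previously taken actions and existing work order descriptions. (since R2023b)
Train BERT Document Classifier
This example shows how to train a BERT neural network for document classification. (since R2023b)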
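In a co-occurrence network, nodes are words and edge weights count how often two words appear together. The following Python sketch assumes document-level co-occurrence: it counts the documents that contain each unordered word pair. It then keeps only pairs above a minimum count to thin the graph. The window choice and the `min_count` threshold are illustrative assumptions, not the method the linked example necessarily uses.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs, min_count=1):
    """Count the documents containing each unordered pair of distinct words.
    Pairs that meet min_count become weighted edges of a co-occurrence network."""
    edges = Counter()
    for doc in docs:
        vocab = sorted(set(doc))  # sorted so each pair has one canonical (a, b) order
        edges.update(combinations(vocab, 2))
    return {pair: n for pair, n in edges.items() if n >= min_count}

docs = [["cat", "dog", "fish"], ["cat", "dog"], ["dog", "bird"]]
print(cooccurrence_edges(docs, min_count=2))  # {('cat', 'dog'): 2}
```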

Sentiment Analysis and Keyword Extraction

Sentiment Analysis in MATLAB
Learn about sentiment analysis techniques. (since R2023b)
Analyze Sentiment in Text
This example shows how to use the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm for sentiment analysis.
Generate Domain Specific Sentiment Lexicon
This example shows how to generate a lexicon for sentiment analysis using 10-K and 10-Q financial reports.
Train a Sentiment Classifier
This example shows how to train a classifier for sentiment analysis using an annotated list of positive and negative sentiment words and a pretrained word embedding.
Extract Keywords from Text Data Using RAKE
This example shows how to extract keywords from text data using Rapid Automatic Keyword Extraction (RAKE).
Extract Keywords from Text Data Using TextRank
This example shows how to extract keywords from text data using TextRank.
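RAKE (Rapid Automatic Keyword Extraction) has three steps. First, it splits text into candidate phrases at punctuation and stop words. Next, it scores each word as degree / frequency, where degree counts the words it co-occurs with inside phrases, itself included. Finally, it scores each phrase as the sum of its word scores. The following Python sketch implements those steps. The short stop list is hypothetical, and real implementations, `rakeKeywords` included, use larger lists and extra options.

```python
import re
from collections import defaultdict

# Hypothetical minimal stop list; RAKE normally uses a much larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "over", "for", "is", "are", "in", "on", "with", "to"}

def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    # 1. Candidate phrases: maximal runs of non-stop words between punctuation marks.
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    # 2. Word frequency and degree (phrase lengths summed over the word's occurrences).
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    # 3. Phrase score = sum of degree/frequency of its words.
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

text = "Compatibility of systems of linear constraints over the set of natural numbers."
print(rake(text)[:2])  # [('linear constraints', 4.0), ('natural numbers', 4.0)]
```

The degree/frequency ratio favors words that occur mostly inside longer phrases, so multiword phrases tend to rank above single words.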

Deep Learning

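Classify Text Data Using Deep Learning
This example shows how to classify text data using a deep learning long short-term memory (LSTM) network.
Classify Text Data Using Convolutional Neural Network
This example shows how to classify text data using a convolutional neural network.
Classify Out-of-Memory Text Data Using Deep Learning
This example shows how to classify out-of-memory text data with a deep learning network using a transformed datastore.
Sequence-to-Sequence Translation Using Attention
This example shows how to convert decimal strings to Roman numerals using a recurrent sequence-to-sequence encoder-decoder model with attention.
Multilabel Text Classification Using Deep Learning
This example shows how to classify text data that has multiple independent labels.
Generate Text Using Deep Learning (Deep Learning Toolbox)
This example shows how to train a deep learning long short-term memory (LSTM) network to generate text.
Pride and Prejudice and MATLAB
This example shows how to train a deep learning LSTM network to generate text using character embeddings.
Word-by-Word Text Generation Using Deep Learning
This example shows how to train a deep learning LSTM network to generate text word by word.
Classify Text Data Using Custom Training Loop
This example shows how to classify text data using a deep learning bidirectional long short-term memory (BiLSTM) network with a custom training loop.
Generate Text Using Autoencoders
This example shows how to generate text data using autoencoders.
Define Text Encoder Model Function
This example shows how to define a text encoder model function.
Define Text Decoder Model Function
This example shows how to define a text decoder model function.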
Language Translation Using Deep Learning
This example shows how to train a German to English language translator using a recurrent sequence-to-sequence encoder-decoder model with attention.
Extract Answers from Documents Using BERT
This example shows how to modify and fine-tune a pretrained BERT model for extractive question answering. (since R2024b)
Out-of-Distribution Detection for BERT Document Classifier
Detect out-of-distribution (OOD) data in a BERT document classifier. (since R2024b)
Out-of-Distribution Detection for LSTM Document Classifier
Detect out-of-distribution (OOD) data in an LSTM document classifier. (since R2024a)
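A simple baseline for out-of-distribution detection flags an input when the classifier's highest softmax probability is low, meaning the model is not confident in any class. The following Python sketch shows that baseline on raw logits. The linked examples may use a different discriminator. The threshold of 0.7 is hypothetical and would normally be calibrated on in-distribution validation data.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_out_of_distribution(logits, threshold):
    """Baseline OOD check: flag the input if its maximum softmax probability is below threshold."""
    return max(softmax(logits)) < threshold

print(is_out_of_distribution([4.0, 0.5, 0.2], threshold=0.7))  # False (confident prediction)
print(is_out_of_distribution([1.0, 0.9, 1.1], threshold=0.7))  # True (nearly uniform)
```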

Language Support

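Language Considerations
Information on using Text Analytics Toolbox features with other languages.
Japanese Language Support
Information on Japanese support in Text Analytics Toolbox.
Analyze Japanese Text Data
This example shows how to import, prepare, and analyze Japanese text data using a topic model.
German Language Support
Information on German support in Text Analytics Toolbox.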
Analyze German Text Data
This example shows how to import, prepare, and analyze German text data using a topic model.