Modeling and Prediction

Develop predictive models using topic models and word embeddings

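To find clusters and extract features from high-dimensional text data sets, you can use machine learning techniques and models such as LSA, LDA, and word embeddings. You can combine features created with Text Analytics Toolbox™ with features from other data sources. With these features, you can build machine learning models that take advantage of textual, numeric, and other types of data.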

Functions


Word and N-Gram Counting

`bagOfWords`	Bag-of-words model
`bagOfNgrams`	Bag-of-n-grams model
`addDocument`	Add documents to bag-of-words or bag-of-n-grams model
`removeDocument`	Remove documents from bag-of-words or bag-of-n-grams model
`removeInfrequentWords`	Remove words with low counts from bag-of-words model
`removeInfrequentNgrams`	Remove infrequently seen n-grams from bag-of-n-grams model
`removeWords`	Remove selected words from documents or bag-of-words model
`removeNgrams`	Remove n-grams from bag-of-n-grams model
`removeEmptyDocuments`	Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model
`topkwords`	Most important words in bag-of-words model or LDA topic
`topkngrams`	Most frequent n-grams
`encode`	Encode documents as matrix of word or n-gram counts
`tfidf`	Term frequency-inverse document frequency (tf-idf) matrix
`join`	Combine multiple bag-of-words or bag-of-n-grams models
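The counting functions above turn tokenized documents into word or n-gram counts, and `tfidf` reweights those counts. As a rough conceptual sketch (plain Python, not the Text Analytics Toolbox API), the code below counts words and contiguous n-grams per document and applies one common tf-idf variant, raw count * log(N / df). The function names are hypothetical. MATLAB's `tfidf` offers several term and document weighting options, so its numbers may differ.

```python
import math
from collections import Counter


def bag_of_words(documents):
    """Count word occurrences per document (each document is a list of tokens)."""
    return [Counter(doc) for doc in documents]


def bag_of_ngrams(documents, n=2):
    """Count contiguous n-grams (stored as tuples) per document."""
    return [Counter(tuple(doc[i:i + n]) for i in range(len(doc) - n + 1))
            for doc in documents]


def tfidf(counts):
    """Weight raw counts by log(N / df), one common tf-idf variant."""
    n_docs = len(counts)
    # Document frequency: iterating a Counter yields each distinct word once.
    df = Counter(word for doc in counts for word in doc)
    return [{w: c * math.log(n_docs / df[w]) for w, c in doc.items()}
            for doc in counts]


docs = [["the", "pump", "leaks"], ["the", "pump", "fails"], ["replace", "the", "seal"]]
bag = bag_of_words(docs)
weights = tfidf(bag)
```

A word that appears in every document, such as "the" here, gets weight 0, while rarer words such as "leaks" are weighted up.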

Sentiment Analysis

`vaderSentimentScores`	Sentiment scores with VADER algorithm (since R2019b)
`ratioSentimentScores`	Sentiment scores with ratio rule (since R2019b)
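The VADER algorithm behind `vaderSentimentScores` sums word valences from a sentiment lexicon and squashes the total into a compound score in [-1, 1]. The minimal sketch below shows only that final step, using the normalization x / sqrt(x^2 + alpha) with alpha = 15 from the reference VADER implementation. The lexicon values are hypothetical, and the negation, intensifier, and capitalization rules of real VADER are omitted. Whether `vaderSentimentScores` matches this in every detail is not claimed here.

```python
import math

# Toy valence lexicon (hypothetical values, NOT the real VADER lexicon).
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "broken": -2.0}


def compound_score(tokens, alpha=15.0):
    """Sum word valences, then squash the total into [-1, 1]."""
    total = sum(LEXICON.get(t.lower(), 0.0) for t in tokens)
    score = total / math.sqrt(total * total + alpha)
    # Clamp for safety; the formula itself already stays inside (-1, 1).
    return max(-1.0, min(1.0, score))


print(compound_score(["The", "pump", "is", "great"]))
print(compound_score(["bad", "and", "broken"]))
```

Because of the square-root normalization, a sum of many positive words approaches 1 without reaching it.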

Transformers

`bert`	Pretrained BERT model (since R2023b)
`bertDocumentClassifier`	BERT document classifier (since R2023b)
`classify`	Classify document using BERT document classifier (since R2023b)
`bertTokenizer`	WordPiece BERT tokenizer (since R2023b)
`bpeTokenizer`	Byte pair encoding tokenizer (since R2024a)
`encode`	Tokenize and encode text for transformer neural network (since R2023b)
`decode`	Convert token codes to tokens (since R2023b)
`encodeTokens`	Convert tokens to token codes (since R2023b)
`subwordTokenize`	Tokenize text into subwords using BERT tokenizer (since R2023b)
`trainBERTDocumentClassifier`	Train BERT document classifier (since R2023b)
`wordTokenize`	Tokenize text into words using tokenizer (since R2023b)
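`bertTokenizer` and `subwordTokenize` rely on WordPiece, which splits each word into subword pieces from a fixed vocabulary. The sketch below shows the greedy longest-match-first procedure commonly described for BERT's WordPiece. Continuation pieces carry a `##` prefix, and a word with no valid split becomes `[UNK]`. The tiny vocabulary is hypothetical, and the text normalization and punctuation splitting that real BERT tokenizers perform first are omitted.

```python
def wordpiece(word, vocab, unk="[UNK]", max_chars=100):
    """Split one word into subword pieces by greedy longest-match-first."""
    if len(word) > max_chars:
        return [unk]
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # mark a continuation of the word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


VOCAB = {"un", "##aff", "##able", "play", "##ing"}
print(wordpiece("unaffable", VOCAB))
print(wordpiece("playing", VOCAB))
```

The token codes that `encode` produces would then be the vocabulary indices of these pieces.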

Embeddings and Encoding

`documentEmbedding`	Document embedding model to map documents to vectors (since R2024a)
`embed`	Map document to embedding vector (since R2024a)
`fastTextWordEmbedding`	Pretrained fastText word embedding
`wordEncoding`	Word encoding model to map words to indices and back
`doc2sequence`	Convert documents to sequences for deep learning
`wordEmbeddingLayer`	Word embedding layer for deep learning neural network
`word2vec`	Map word to embedding vector
`word2ind`	Map word to encoding index
`vec2word`	Map embedding vector to word
`ind2word`	Map encoding index to word
`isVocabularyWord`	Test if word is member of word embedding or encoding
`readWordEmbedding`	Read word embedding from file
`trainWordEmbedding`	Train word embedding
`writeWordEmbedding`	Write word embedding file
`wordEmbedding`	Word embedding model to map words to vectors and back
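A word embedding maps words to vectors (`word2vec`), and mapping a vector back to a word (`vec2word`) amounts to a nearest-neighbor search over the vocabulary. The sketch below illustrates that idea with cosine similarity on a hypothetical 3-dimensional embedding. Real embeddings, such as the pretrained fastText one, have hundreds of dimensions, and MATLAB's `vec2word` has its own distance options, so treat this as an illustration only.

```python
import math

# Hypothetical toy embedding; the values are made up for illustration.
EMBEDDING = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def word_to_vec(word):
    """Look up the embedding vector of a vocabulary word."""
    return EMBEDDING[word]


def vec_to_word(vec):
    """Return the vocabulary word whose vector is most cosine-similar to vec."""
    return max(EMBEDDING, key=lambda w: cosine(EMBEDDING[w], vec))


print(vec_to_word([0.75, 0.65, 0.15]))
```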

Document Summarization and Similarity

`extractSummary`	Extract summary from documents (since R2020a)
`rakeKeywords`	Extract keywords using RAKE (since R2020b)
`textrankKeywords`	Extract keywords using TextRank (since R2020b)
`bleuEvaluationScore`	Evaluate translation or summarization with BLEU similarity score (since R2020a)
`rougeEvaluationScore`	Evaluate translation or summarization with ROUGE similarity score (since R2020a)
`bm25Similarity`	Document similarities with BM25 algorithm (since R2020a)
`cosineSimilarity`	Document similarities with cosine similarity (since R2020a)
`textrankScores`	Document scoring with TextRank algorithm (since R2020a)
`lexrankScores`	Document scoring with LexRank algorithm (since R2020a)
`mmrScores`	Document scoring with Maximal Marginal Relevance (MMR) algorithm (since R2020a)
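`bm25Similarity` ranks documents against a query with the Okapi BM25 formula. The sketch below implements a textbook version. The term-frequency saturation `k1 = 1.2` and the length normalization `b = 0.75` are common defaults, and the IDF is the non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)). These parameter values and the IDF form are assumptions for illustration. MATLAB's `bm25Similarity` may use a different IDF or different defaults.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n = len(documents)
    avgdl = sum(len(d) for d in documents) / n
    df = Counter(w for d in documents for w in set(d))  # document frequency
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query:  # a repeated query term is counted each time
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Longer-than-average documents get a larger denominator.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["pump", "leaks", "oil"], ["replace", "seal"], ["pump", "pump", "noise"]]
print(bm25_scores(["pump"], docs))
```

Unlike raw tf-idf, the `(k1 + 1) / (tf + ...)` factor makes repeated terms give diminishing returns.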

Topic Modeling and Dimensionality Reduction

`fitlda`	Fit latent Dirichlet allocation (LDA) model
`fitlsa`	Fit LSA model
`resume`	Resume fitting LDA model
`logp`	Document log-probabilities and goodness of fit of LDA model
`predict`	Predict top LDA topics of documents
`transform`	Transform documents into lower-dimensional space
`ldaModel`	Latent Dirichlet allocation (LDA) model
`lsaModel`	Latent semantic analysis (LSA) model
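`logp` reports document log-probabilities and goodness of fit for an LDA model. Under LDA, a word's probability in a document is the topic mixture weighted sum of the per-topic word probabilities. Perplexity is the exponential of the negative average log-probability per word, and lower values mean a better fit. The sketch below computes both, assuming the document topic mixtures `theta` are already known. MATLAB's `logp` estimates them from the fitted model, so this is a simplification.

```python
import math


def doc_log_prob(counts, theta, phi):
    """log p(doc) = sum over words of n_w * log(sum_k theta[k] * phi[k][w])."""
    return sum(
        n * math.log(sum(theta[k] * phi[k][w] for k in range(len(theta))))
        for w, n in counts.items()
    )


def perplexity(docs_counts, thetas, phi):
    """exp(-total log-probability / total word count) over a set of documents."""
    total_lp = sum(doc_log_prob(c, t, phi) for c, t in zip(docs_counts, thetas))
    total_words = sum(sum(c.values()) for c in docs_counts)
    return math.exp(-total_lp / total_words)


# Two topics over a two-word vocabulary (hypothetical probabilities).
phi = [{"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}]
thetas = [[0.5, 0.5]]
docs_counts = [{"a": 2, "b": 2}]
print(perplexity(docs_counts, thetas, phi))
```

With every word equally likely between two choices, the perplexity is 2, which matches the "effective number of choices per word" reading of the measure.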

Named Entity Recognition

`addEntityDetails`	Add entity tags to documents
`trainHMMEntityModel`	Train HMM-based model for named entity recognition (NER) (since R2023a)
`predict`	Predict entities using named entity recognition (NER) model (since R2023a)
`hmmEntityModel`	HMM-based model for named entity recognition (NER) (since R2023a)
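An HMM-based NER model treats entity tags as hidden states and tokens as observations. Predicting tags then means finding the most likely state sequence, which the standard Viterbi algorithm does. The sketch below runs Viterbi in log space with hypothetical start, transition, and emission probabilities. Unseen tokens get a small hypothetical fallback probability. Whether `hmmEntityModel` decodes exactly this way is an assumption, not something this page states.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, unk_p=1e-6):
    """Most likely hidden tag sequence for tokens under an HMM (log space)."""
    def emit(s, tok):
        return math.log(emit_p[s].get(tok, unk_p))

    # best[s] = (log-prob of the best path ending in state s, that path)
    best = {s: (math.log(start_p[s]) + emit(s, tokens[0]), [s]) for s in states}
    for tok in tokens[1:]:
        new_best = {}
        for s in states:
            prev = max(states, key=lambda p: best[p][0] + math.log(trans_p[p][s]))
            lp = best[prev][0] + math.log(trans_p[prev][s]) + emit(s, tok)
            new_best[s] = (lp, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical two-tag model: "O" (outside) and "PER" (person).
STATES = ["O", "PER"]
START = {"O": 0.7, "PER": 0.3}
TRANS = {"O": {"O": 0.8, "PER": 0.2}, "PER": {"O": 0.5, "PER": 0.5}}
EMIT = {"O": {"met": 0.5, "yesterday": 0.5}, "PER": {"alice": 0.9, "met": 0.1}}
tags = viterbi(["alice", "met", "yesterday"], STATES, START, TRANS, EMIT)
print(tags)
```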

Visualization

`wordcloud`	Create word cloud chart from text, bag-of-words model, bag-of-n-grams model, or LDA model
`textscatter`	2-D scatter plot of text
`textscatter3`	3-D scatter plot of text

Topics

Classification and Modeling

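Create Simple Preprocessing Function
This example shows how to use the Preprocess Text Data Live Editor task to create a function that cleans and preprocesses text data for analysis.
Create Simple Text Model for Classification
This example shows how to train a simple text classifier on word frequency counts using a bag-of-words model.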
Classify Documents Using Document Embeddings
This example shows how to train a document classifier by converting documents to feature vectors using a document embedding.
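Analyze Text Data Containing Multiword Phrases
This example shows how to analyze text using n-gram frequency counts.
Analyze Text Data Using Topic Models
This example shows how to analyze text data using a latent Dirichlet allocation (LDA) topic model.
Choose Number of Topics for LDA Model
This example shows how to decide on a suitable number of topics for a latent Dirichlet allocation (LDA) model.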
Compare LDA Solvers
This example shows how to compare latent Dirichlet allocation (LDA) solvers by comparing the goodness of fit and the time taken to fit the model.
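Visualize Document Clusters Using LDA Model
This example shows how to visualize the clustering of documents using a latent Dirichlet allocation (LDA) topic model and a t-SNE plot.
Visualize LDA Topic Correlations
This example shows how to analyze correlations between topics in a latent Dirichlet allocation (LDA) topic model.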
Visualize Correlations Between LDA Topics and Document Labels
This example shows how to fit a Latent Dirichlet Allocation (LDA) topic model and visualize correlations between the LDA topics and document labels.
Train Custom Named Entity Recognition Model
This example shows how to train a custom named entity recognition (NER) model.
Create Co-occurrence Network
This example shows how to create a co-occurrence network using a bag-of-words model.
Information Retrieval with Work Orders Data
This example shows how to use information retrieval techniques to find solutions for new work orders based on the descriptions and past actions recorded in earlier work orders. It shows how you can use the text descriptions of past incidents, and the actions taken for them, to suggest possible solutions for new problems. (since R2023b)
Train BERT Document Classifier
This example shows how to train a BERT neural network for document classification. (since R2023b)
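The co-occurrence network example builds a graph from word pairs that appear together. A minimal sketch, assuming one simple definition: two words co-occur when they appear in the same document, and the edge weight is the number of such documents. This definition and the function name are illustrative assumptions, not necessarily what the MATLAB example uses.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Count, for each word pair, the number of documents containing both."""
    pairs = Counter()
    for doc in documents:
        # Sort the distinct words so each pair has one canonical order.
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


docs = [["pump", "leak", "oil"], ["pump", "leak"], ["seal", "oil"]]
edges = cooccurrence(docs)
print(edges.most_common(3))
```

Each `(word_a, word_b)` key with its count can then serve as a weighted edge in any graph library.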

Sentiment Analysis and Keyword Extraction

Sentiment Analysis in MATLAB
Learn about sentiment analysis techniques. (since R2023b)
Analyze Sentiment in Text
This example shows how to use the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm for sentiment analysis.
Generate Domain Specific Sentiment Lexicon
This example shows how to generate a lexicon for sentiment analysis using 10-K and 10-Q financial reports.
Train a Sentiment Classifier
This example shows how to train a classifier for sentiment analysis using an annotated list of positive and negative sentiment words and a pretrained word embedding.
Extract Keywords from Text Data Using RAKE
This example shows how to extract keywords from text data using Rapid Automatic Keyword Extraction (RAKE).
Extract Keywords from Text Data Using TextRank
This example shows how to extract keywords from text data using TextRank.
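RAKE, which `rakeKeywords` implements, splits text into candidate phrases at stop words and punctuation. It scores each word by degree / frequency, where degree is the total length of the candidate phrases containing the word, and scores a phrase as the sum of its word scores. The sketch below follows that commonly described scoring. The stop list and delimiter set are small hypothetical choices. MATLAB's `rakeKeywords` has its own stop list, delimiters, and options, so its output will differ.

```python
import re
from collections import defaultdict

# Tiny hypothetical stop list; RAKE normally uses a full stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "for", "in", "on", "over", "to", "with"}


def rake(text):
    """Rank candidate phrases by RAKE: phrase score = sum of deg(w) / freq(w)."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        phrase = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOP_WORDS:
                if phrase:
                    phrases.append(phrase)  # a stop word ends the phrase
                phrase = []
            else:
                phrase.append(word)
        if phrase:
            phrases.append(phrase)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in the phrase, incl. itself
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


text = "Compatibility of systems of linear constraints over the set of natural numbers."
print(rake(text))
```

Multiword phrases score higher because their words have larger degrees, which is why RAKE tends to favor phrases like "linear constraints" over single words.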

Deep Learning

Classify Text Data Using Deep Learning
This example shows how to classify text data using a deep learning long short-term memory (LSTM) network.
Classify Text Data Using Convolutional Neural Network
This example shows how to classify text data using a convolutional neural network.
Classify Out-of-Memory Text Data Using Deep Learning
This example shows how to classify out-of-memory text data with a deep learning network using a transformed datastore.
Sequence-to-Sequence Translation Using Attention
This example shows how to convert decimal strings to Roman numerals using a recurrent sequence-to-sequence encoder-decoder model with attention.
Multilabel Text Classification Using Deep Learning
This example shows how to classify text data that has multiple independent labels.
Generate Text Using Deep Learning (Deep Learning Toolbox)
This example shows how to train a deep learning long short-term memory (LSTM) network to generate text.
Pride and Prejudice and MATLAB
This example shows how to train a deep learning LSTM network to generate text using character embeddings.
Word-by-Word Text Generation Using Deep Learning
This example shows how to train a deep learning LSTM network to generate text word by word.
Classify Text Data Using Custom Training Loop
This example shows how to classify text data using a deep learning bidirectional long short-term memory (BiLSTM) network with a custom training loop.
Generate Text Using Autoencoders
This example shows how to generate text data using autoencoders.
Define Text Encoder Model Function
This example shows how to define a text encoder model function.
Define Text Decoder Model Function
This example shows how to define a text decoder model function.
Language Translation Using Deep Learning
This example shows how to train a German to English language translator using a recurrent sequence-to-sequence encoder-decoder model with attention.

Language Support

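Language Considerations
Information on using Text Analytics Toolbox features for other languages.
Japanese Language Support
Information on Japanese support in Text Analytics Toolbox.
Analyze Japanese Text Data
This example shows how to import, prepare, and analyze Japanese text data using a topic model.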
German Language Support
Information on German support in Text Analytics Toolbox.
Analyze German Text Data
This example shows how to import, prepare, and analyze German text data using a topic model.