Try Text Analytics in 10 Lines of Code

This example shows how to use text analytics to classify text data using only 10 lines of MATLAB® code. Try the example to see how simple it is to get started with text analytics in MATLAB.

You can create a simple classification model that uses word frequency counts as predictors. This example trains a classification model to predict the event type of weather reports from their text descriptions.

Create Model

The main steps of creating a model are:

  1. Import – import the text data into MATLAB.

  2. Preprocess – preprocess the text for word analysis.

  3. Convert – convert the text to numeric data.

  4. Train – train a classification model.

Import the example text data and labels, tokenize the text, convert it to numeric data using a bag-of-words model, and train a supervised classifier. With linear learners, fitcecoc fits a multiclass model built from linear SVM binary learners.

data = readtable('weatherReports.csv','TextType','String'); % Read data
labels = categorical(data.event_type);                      % Read labels

documents = tokenizedDocument(data.event_narrative);        % Preprocess text

bag = bagOfWords(documents);                                % Count words
XTrain = bag.Counts;                                        % Convert to numeric data

mdl = fitcecoc(XTrain,labels,'Learners','linear');          % Train classifier
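To make the conversion step more concrete, here is a minimal sketch on a two-sentence toy corpus. The strings and the toy-prefixed variable names are illustrative assumptions and are not part of weatherReports.csv. The sketch shows that the bag-of-words model stores one row per document and one column per unique word.

```matlab
% Hypothetical toy corpus, used only to illustrate the bag-of-words step
toyText = ["a large tree is downed"; "a tree is down near the road"];

toyDocuments = tokenizedDocument(toyText);   % Split each string into words
toyBag = bagOfWords(toyDocuments);           % Count words per document

toyVocabulary = toyBag.Vocabulary;           % Unique words (string array)
toyCounts = full(toyBag.Counts);             % Documents-by-words count matrix
```

Each row of toyCounts plays the same role as a row of XTrain above. Counts is stored as a sparse matrix, so full is called here only to make the small matrix easy to inspect.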

Predict Using New Data

The steps for prediction are similar to those for training. To predict using new data, preprocess the text and convert it to numeric data using the same steps used for training. Then, predict the label using the trained model.

Predict the label for the text "A large tree is downed."

str = "A large tree is downed.";                            % Import text
documentsNew = tokenizedDocument(str);                      % Preprocess text
XTest = encode(bag,documentsNew);                           % Convert to numeric
label = predict(mdl,XTest)                                  % Predict label
label = categorical
     Thunderstorm Wind 
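The same prediction steps also apply to several new reports at once. The following is a rough sketch under stated assumptions: to keep it self-contained, it trains on a tiny hypothetical data set (the strings, labels, and variable names are invented for illustration) instead of weatherReports.csv, so the labels it predicts say nothing about the accuracy of the model trained above.

```matlab
% Hypothetical toy training set, standing in for weatherReports.csv
textTrain = ["large tree downed by wind"; "trees and power lines downed"; ...
             "heavy rain flooded the road"; "flood waters covered the street"];
labelsTrain = categorical(["Thunderstorm Wind"; "Thunderstorm Wind"; ...
                           "Flood"; "Flood"]);

bagSmall = bagOfWords(tokenizedDocument(textTrain));               % Count words
mdlSmall = fitcecoc(bagSmall.Counts,labelsTrain,'Learners','linear'); % Train

% Encode several new reports with the training vocabulary, then predict
strNew = ["A large tree is downed."; "Water flooded the road."];
XNew = encode(bagSmall,tokenizedDocument(strNew));  % One row per new report
labelsNew = predict(mdlSmall,XNew);                 % One label per new report
```

Words in the new text that do not appear in the training vocabulary have no column in the encoded matrix, so they do not contribute to the prediction.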

For an example showing a more detailed workflow, see Create Simple Text Model for Classification.

For next steps in text analytics, you can try improving the model accuracy by preprocessing the data, and you can visualize the text data using word clouds. For examples, see Prepare Text Data for Analysis and Visualize Text Data Using Word Clouds.
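As a minimal sketch of these next steps, the code below applies a few common preprocessing functions to two hypothetical sentences and then draws a word cloud. The sample strings and variable names are assumptions made for illustration, and the choice of preprocessing steps is one plausible option, not a recommended recipe.

```matlab
% Hypothetical sample text, used only to illustrate preprocessing
strSample = ["A large tree is downed."; ...
             "Heavy rain flooded the road near the river."];

docsClean = tokenizedDocument(strSample);   % Tokenize the text
docsClean = lower(docsClean);               % Convert to lowercase
docsClean = erasePunctuation(docsClean);    % Remove punctuation
docsClean = removeStopWords(docsClean);     % Remove words such as "the", "is"

bagClean = bagOfWords(docsClean);           % Count the remaining words

figure
wordcloud(bagClean);                        % Visualize word frequencies
```

To use the same preprocessing for classification, apply identical steps to both the training text and any new text before calling encode.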
