Try Text Analytics in 10 Lines of Code
This example shows how to use text analytics to classify text data using only 10 lines of MATLAB® code. Try the example to see how simple it is to get started with text analytics in MATLAB.
You can create a simple classification model that uses word frequency counts as predictors. This example trains a classification model to predict the event type of factory reports using text descriptions.
Create Model
The main steps of creating a model are:
Import – import the text data into MATLAB.
Preprocess – preprocess the text for word analysis.
Convert – convert the text to numeric data.
Train – train a classification model.
Import the example text data and labels, tokenize the text, convert it to numeric data using a bag-of-words model, and train a supervised SVM classifier.
data = readtable('factoryReports.csv','TextType','String'); % Read data
labels = categorical(data.Category); % Read labels
documents = tokenizedDocument(data.Description); % Preprocess text
bag = bagOfWords(documents); % Count words
XTrain = bag.Counts; % Convert to numeric data
mdl = fitcecoc(XTrain,labels,'Learners','linear'); % Train classifier
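To see what the conversion to numeric data produces, the following minimal sketch builds a bag-of-words model from two toy strings. The strings are hypothetical and are not taken from factoryReports.csv; they only illustrate that each row of the count matrix corresponds to a document and each column to a vocabulary word.

```matlab
% Toy strings for illustration only (hypothetical, not from factoryReports.csv).
str = ["coolant leak under sorter"; "sorter jams frequently"];
documents = tokenizedDocument(str);   % Split each string into word tokens
bag = bagOfWords(documents);          % Count how often each word appears

vocabulary = bag.Vocabulary           % The unique words found across both strings
counts = full(bag.Counts)             % One row per document, one column per word

% The word "sorter" appears once in each document, so its column sums to 2.
sorterTotal = sum(counts(:,vocabulary == "sorter"))
```

The Counts property is stored as a sparse matrix, so the sketch calls full only to make the small matrix easy to read.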
Predict Using New Data
The steps for prediction are similar to those for training. To predict using new data, preprocess the text data and convert it to numeric data using the same steps used for training. Then, predict the label using the trained model.
Predict the label for the text "Coolant is pooling underneath sorter."
str = "Coolant is pooling underneath sorter."; % Import text
documentsNew = tokenizedDocument(str); % Preprocess text
XTest = encode(bag,documentsNew); % Convert to numeric
label = predict(mdl,XTest) % Predict label
label = categorical
Leak
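New text is encoded with the existing bag-of-words model, rather than a new one, so that the columns of the encoded matrix line up with the predictors the classifier was trained on. The following sketch, which uses hypothetical toy strings, suggests how this works: the encoded matrix has one column per vocabulary word, and words outside the vocabulary are not counted.

```matlab
% Toy training strings (hypothetical) define the vocabulary.
bag = bagOfWords(tokenizedDocument(["coolant leak under sorter"; "sorter jams frequently"]));

% New text: the first string reuses known words, the second uses none of them.
documentsNew = tokenizedDocument(["sorter leak"; "unknown words only"]);
XNew = encode(bag,documentsNew);      % Count words using the existing vocabulary

numColumns = size(XNew,2)             % Matches bag.NumWords
knownWordCounts = nnz(XNew(1,:))      % Two vocabulary words found in the first string
unknownWordCounts = nnz(XNew(2,:))    % No vocabulary words found in the second string
```

Because tokenizedDocument and encode both accept string arrays, the same pattern can be used to predict labels for several new descriptions in one call.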
For an example showing a more detailed workflow, see Create Simple Text Model for Classification.
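One step a more detailed workflow typically adds is checking the model on data it has not seen. The following is a minimal sketch, assuming a 10% holdout split and a fixed random seed; both choices are illustrative assumptions and are not part of the 10-line example.

```matlab
% Read the data and convert the labels to categorical.
data = readtable('factoryReports.csv','TextType','String');
data.Category = categorical(data.Category);

% Hold out 10% of the reports for testing (split size and seed are assumptions).
rng(0)
cvp = cvpartition(data.Category,'Holdout',0.1);
dataTrain = data(training(cvp),:);
dataTest = data(test(cvp),:);

% Train on the training partition only.
bag = bagOfWords(tokenizedDocument(dataTrain.Description));
mdl = fitcecoc(bag.Counts,dataTrain.Category,'Learners','linear');

% Encode the held-out text with the training vocabulary and predict.
XTest = encode(bag,tokenizedDocument(dataTest.Description));
YPred = predict(mdl,XTest);

% Fraction of held-out reports whose predicted label matches the true label.
accuracy = mean(YPred == dataTest.Category)
```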
For next steps in text analytics, you can try improving the model accuracy by preprocessing the data and visualizing the text data using word clouds. For examples, see Prepare Text Data for Analysis and Visualize Text Data Using Word Clouds.
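As a rough sketch of these next steps, the following code applies a few common preprocessing functions to the factory report descriptions and then displays the resulting bag-of-words model as a word cloud. The particular steps and thresholds (words of two or fewer characters, words appearing two or fewer times) are assumptions chosen for illustration, not tuned settings.

```matlab
% Read and tokenize the report descriptions.
data = readtable('factoryReports.csv','TextType','String');
documents = tokenizedDocument(data.Description);

% Illustrative preprocessing steps (choices and thresholds are assumptions).
documents = lower(documents);                 % Convert to lowercase
documents = removeStopWords(documents);       % Remove words such as "the" and "is"
documents = erasePunctuation(documents);      % Remove punctuation
documents = removeShortWords(documents,2);    % Remove words with 2 or fewer characters

% Count words and drop those that appear 2 times or fewer in total.
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);

% Visualize the remaining words, sized by frequency.
figure
wordcloud(bag);
title("Factory Report Descriptions")
```

If you train a classifier on preprocessed text, apply the same preprocessing steps to new text before encoding it, and keep the labels aligned with the documents if any documents are removed.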
See Also
tokenizedDocument | bagOfWords | encode