
addLemmaDetails

Add lemma forms of tokens to documents

Use addLemmaDetails to add lemma forms of tokens to documents. The function supports English and Japanese text.
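
For Japanese text, the workflow is the same as for English. A minimal sketch follows (the sample sentence and variable names are illustrative, and the output is not shown); it assumes tokenizedDocument detects the language of the text automatically.

str = "猫が木に登った。";                 % illustrative Japanese sample sentence
documents = tokenizedDocument(str);       % language is detected automatically
documents = addLemmaDetails(documents);
tdetails = tokenDetails(documents);       % the Lemma variable contains the lemma forms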

Syntax

updatedDocuments = addLemmaDetails(documents)

Description


updatedDocuments = addLemmaDetails(documents) adds lemma details to documents and updates the token details. To get the lemma details from updatedDocuments, use tokenDetails.

Tip

Use addLemmaDetails before using the lower, upper, and normalizeWords functions, because addLemmaDetails uses information that these functions remove.
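
A minimal sketch of this ordering (the sample text and variable names are illustrative): lemmatize first, then normalize case.

documents = tokenizedDocument("The dogs RAN after the cat.");
documents = addLemmaDetails(documents);   % lemmatize while case information is still intact
documents = lower(documents);             % the previously added lemma details are retained
tdetails = tokenDetails(documents);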

Examples


Create a tokenized document array.

str = [ ...
    "The dogs ran after the cat."
    "I am building a house."];
documents = tokenizedDocument(str);

Add lemma details to the documents using addLemmaDetails. This function lemmatizes the text and adds the lemma form of each token to the table returned by tokenDetails. View the updated token details of the first few tokens.

documents = addLemmaDetails(documents);
tdetails = tokenDetails(documents);
head(tdetails)
ans=8×6 table
     Token     DocumentNumber    LineNumber       Type        Language     Lemma 
    _______    ______________    __________    ___________    ________    _______

    "The"            1               1         letters           en       "the"  
    "dogs"           1               1         letters           en       "dog"  
    "ran"            1               1         letters           en       "run"  
    "after"          1               1         letters           en       "after"
    "the"            1               1         letters           en       "the"  
    "cat"            1               1         letters           en       "cat"  
    "."              1               1         punctuation       en       "."    
    "I"              2               1         letters           en       "i"    

Input Arguments


documents

Input documents, specified as a tokenizedDocument array.

Output Arguments


updatedDocuments

Updated documents, returned as a tokenizedDocument array. To get the token details from updatedDocuments, use tokenDetails.

Introduced in R2018b