newBag = removeNgrams(bag,idx)
removes the n-grams specified by the numeric or logical indices idx, which index the rows of bag.Ngrams. This syntax is the same as newBag = removeNgrams(bag,bag.Ngrams(idx,:)).
Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.
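The steps above can be sketched as follows. This is a minimal sketch that assumes Text Analytics Toolbox is installed and that sonnetsPreprocessed.txt is on the MATLAB path. It then builds a bag-of-n-grams model with the default settings of bagOfNgrams, which counts bigrams.

```matlab
% Assumes Text Analytics Toolbox and that sonnetsPreprocessed.txt is on the path.
filename = "sonnetsPreprocessed.txt";
str = extractFileText(filename);          % read the whole file as one string
textData = split(str,newline);            % one sonnet per line -> one string per sonnet
documents = tokenizedDocument(textData);  % tokenize each sonnet
bag = bagOfNgrams(documents)              % count n-grams (bigrams by default)
```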
bag — Input bag-of-n-grams model bagOfNgrams object
Input bag-of-n-grams model, specified as a bagOfNgrams object.
ngrams — N-grams to remove string array | character vector | cell array of character vectors
N-grams to remove, specified as a string array, character vector, or a cell array of character vectors.
If ngrams is a string array or cell array, then it has size NumNgrams-by-maxN, where NumNgrams is the number of n-grams and maxN is the length of the largest n-gram. If ngrams is a character vector, then it represents a single word (unigram).
The value of ngrams(i,j) is the jth word of the ith n-gram. If the number of words in the ith n-gram is less than maxN, then the remaining entries of the ith row of ngrams are empty.
Example: ["An" ""; "An" "example"; "example" ""]
Data Types: string | char | cell
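As an illustration of the ngrams argument, the following sketch builds a small bigram model from two hypothetical sentences (made up for this example, not taken from the sonnets data). It then removes two bigrams given as a 2-by-2 string array, one n-gram per row. It assumes Text Analytics Toolbox is installed.

```matlab
% Two hypothetical documents used only for illustration.
textData = ["an example of a short sentence"; "a second short sentence"];
documents = tokenizedDocument(textData);
bag = bagOfNgrams(documents);   % default NgramLengths is 2, so bag.Ngrams has two columns

% Each row of ngrams is one bigram; column j holds its jth word.
ngrams = ["a" "short"; "short" "sentence"];
newBag = removeNgrams(bag,ngrams);

% The removed bigrams no longer appear in newBag.Ngrams.
disp(newBag.Ngrams)
```

Because both bigrams occur in the input documents, newBag has two fewer n-grams than bag.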
idx — Indices of n-grams to remove vector of numeric indices | vector of logical indices
Indices of n-grams to remove, specified as a vector of numeric indices or a vector of logical indices. The indices in idx correspond to the rows of bag.Ngrams.
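For example, the following sketch uses a logical index to remove every n-gram that occurs fewer than two times across all documents. It uses a small hypothetical corpus, and the threshold of 2 is arbitrary. Because bag.Counts has one row per document and one column per n-gram, summing over documents gives one total count per n-gram, in the same order as the rows of bag.Ngrams. The sketch also checks that numeric indices and the equivalent call removeNgrams(bag,bag.Ngrams(idx,:)) give the same result. It assumes Text Analytics Toolbox is installed.

```matlab
% Hypothetical toy corpus for illustration only.
textData = ["an example of a short sentence"; "a second short sentence"];
documents = tokenizedDocument(textData);
bag = bagOfNgrams(documents);   % bigram counts by default

% bag.Counts is NumDocuments-by-NumNgrams (sparse); sum over documents to get
% one total count per n-gram (column j corresponds to row j of bag.Ngrams).
totalCounts = full(sum(bag.Counts,1));

% Logical index: true for n-grams that occur fewer than 2 times (arbitrary threshold).
idx = totalCounts < 2;
newBag = removeNgrams(bag,idx);

% Numeric indices select the same n-grams.
newBagNumeric = removeNgrams(bag,find(idx));

% Equivalent call that passes the n-grams themselves.
newBagByName = removeNgrams(bag,bag.Ngrams(idx,:));
```

With this toy input, only the bigram "short sentence" occurs in both documents, so it is the only n-gram left in newBag.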