
join

Combine multiple bag-of-words or bag-of-n-grams models

Syntax

newBag = join(bag)
newBag = join(bag,dim)

Description


newBag = join(bag) combines the elements in the array bag by merging their frequency counts. The function combines the elements along the first dimension whose size does not equal 1.

newBag = join(bag,dim) combines the elements in the array bag along the dimension dim.
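As a sketch of the dim argument, the code below arranges four single-document models in a 2-by-2 array and joins them along each dimension. The sample text and the variable names (rowBags, colBags) are illustrative only. Joining along dimension 2 collapses each row into one model. Calling join without dim uses the first dimension whose size is not 1, which is dimension 1 here.

```matlab
% Illustrative text; any four short documents would do
str = [ ...
    "the quick brown fox"
    "jumps over the lazy dog"
    "a second group of words"
    "with one more short sentence"];
documents = tokenizedDocument(str);

% Arrange one bag-of-words model per document in a 2-by-2 array
bag(1,1) = bagOfWords(documents(1));
bag(1,2) = bagOfWords(documents(2));
bag(2,1) = bagOfWords(documents(3));
bag(2,2) = bagOfWords(documents(4));

% Join along dimension 2: the result has size 1 along that dimension
rowBags = join(bag,2);
size(rowBags)

% No dim given: join uses the first non-singleton dimension (dimension 1)
colBags = join(bag);
size(colBags)
```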

Examples


Create an array of two bag-of-words models from tokenized documents.

str = [ ...
    "an example of a short sentence"
    "a second short sentence"];
documents = tokenizedDocument(str);
bag(1) = bagOfWords(documents(1));
bag(2) = bagOfWords(documents(2))
bag = 
  1x2 bagOfWords array with properties:

    Counts
    Vocabulary
    NumWords
    NumDocuments

Combine the bag-of-words models using join.

bag = join(bag)
bag = 
  bagOfWords with properties:

          Counts: [2x7 double]
      Vocabulary: [1x7 string]
        NumWords: 7
    NumDocuments: 2
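To see how the merged frequency counts line up, you can look up the column of a word that occurs in both documents. This sketch rebuilds the joined model from the example above under new, illustrative variable names (bags, joined) so that it runs on its own. Each row of Counts corresponds to a document and each column to a word in Vocabulary. Because Counts may be stored as a sparse matrix, full converts the selected column for display.

```matlab
str = [ ...
    "an example of a short sentence"
    "a second short sentence"];
documents = tokenizedDocument(str);
bags(1) = bagOfWords(documents(1));
bags(2) = bagOfWords(documents(2));
joined = join(bags);

% Locate the shared word "sentence" in the merged vocabulary
idx = find(joined.Vocabulary == "sentence");

% One count per document, in the order the models were joined
sentenceCounts = full(joined.Counts(:,idx))
```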

If your text data is contained in multiple files in a folder, then you can import the text data and create a bag-of-words model in parallel using parfor. If you have Parallel Computing Toolbox™ installed, then the parfor loop runs in parallel; otherwise, it runs in serial. Use join to combine the resulting array of bag-of-words models into one model.

Create a bag-of-words model from a collection of files. The example sonnets have file names "exampleSonnetN.txt", where N is the number of the sonnet. Get a list of the files and their locations using dir.

fileLocation = fullfile(matlabroot,'examples','textanalytics','exampleSonnet*.txt');
fileInfo = dir(fileLocation)
fileInfo = 5x1 struct array with fields:
    name
    folder
    date
    bytes
    isdir
    datenum

Initialize an empty bag-of-words model, then loop over the files to create an array of bag-of-words models.

bag = bagOfWords;

numFiles = numel(fileInfo);
parfor i = 1:numFiles
    f = fileInfo(i);
    filename = fullfile(f.folder,f.name);
    
    textData = extractFileText(filename);
    document = tokenizedDocument(textData);
    bag(i) = bagOfWords(document);
end
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 12).

Combine the bag-of-words models using join.

bag = join(bag)
bag = 
  bagOfWords with properties:

          Counts: [5x3275 double]
      Vocabulary: [1x3275 string]
        NumWords: 3275
    NumDocuments: 5

Input Arguments


bag

Array of bag-of-words or bag-of-n-grams models, specified as a bagOfWords array or a bagOfNgrams array. If bag is a bagOfNgrams array, then each element to be joined must have the same value for the NgramLengths property.
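The following sketch joins two bag-of-n-grams models that satisfy the NgramLengths requirement. Both models are built with NgramLengths set to 2, so they count bigrams. The variable name ngramBags is illustrative. The joined result is a single bagOfNgrams model that contains both documents.

```matlab
str = [ ...
    "an example of a short sentence"
    "a second short sentence"];
documents = tokenizedDocument(str);

% Both models count bigrams, so their NgramLengths values match
ngramBags(1) = bagOfNgrams(documents(1),'NgramLengths',2);
ngramBags(2) = bagOfNgrams(documents(2),'NgramLengths',2);

% The joined model keeps the input type and the shared NgramLengths value
newBag = join(ngramBags)
```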

dim

Dimension along which to join models, specified as a positive integer. If dim is not specified, then the default is the first dimension whose size does not equal 1.

Output Arguments


newBag

Output model, returned as a bagOfWords object or a bagOfNgrams object. newBag has the same type as the input bag and has a size of 1 along the dimension being joined.

Introduced in R2018a