Difference between using "count" and "bag-of-words" in LSA?

anand samra on 10 Mar 2021
Answered: Tarunbir Gambhir on 15 Mar 2021
I'm just getting into NLP and dimensionality reduction and am working with LSA. According to the documentation, fitLSA accepts either a bag-of-words model or a count matrix, but I'm not sure what difference the choice makes. I fit two models: one using the bagOfWords object, and a second using the Counts property of that same bagOfWords object. I then compared the component weights and found no difference. So, what's the difference between passing a bagOfWords object and passing just a term-frequency matrix?

Answers (1)

Tarunbir Gambhir on 15 Mar 2021
Both ways of calling fitLSA give the same results as long as the underlying data is the same. A bag-of-words model and a count matrix are just two ways of representing the input data so that the algorithm can interpret it. Supporting both gives the user flexibility in how they work with this function.
In this case, the bag-of-words model and the count matrix both hold the word counts for each document, only in different formats. One is a bagOfWords object, whereas the other is a matrix of nonnegative integers.
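
To make this concrete, here is a minimal sketch of the comparison described in the question. It assumes Text Analytics Toolbox is installed; the four toy documents, the choice of 2 components, and the variable names are made up for illustration and are not from the original post.

```matlab
% Toy corpus (illustrative only, not the asker's data).
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "a third example sentence"
    "a final short example"]);

% Bag-of-words model: word counts plus the vocabulary they refer to.
bag = bagOfWords(documents);

numComponents = 2;  % assumed value, chosen only to keep the example small

% Model 1: fit directly from the bagOfWords object.
mdlBag = fitLSA(bag, numComponents);

% Model 2: fit from the raw count matrix (documents in rows, words in columns).
mdlCounts = fitLSA(bag.Counts, numComponents);

% Both models are fit from the same counts, so the component weights
% should agree up to numerical round-off.
weightDiff = max(abs(mdlBag.ComponentWeights - mdlCounts.ComponentWeights));
disp(weightDiff)
```

If the two inputs really carry the same counts, weightDiff should be zero or very close to it, which matches what the question reports.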
