Cosine Similarity using BERT

Question

Nicholas Ang, 30 June 2021

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/868608-cosine-similarity-using-bert

Commented: Nicholas Ang, 30 June 2021

Accepted Answer: Divyam Gupta

I am using BERT to calculate similarities in question answering. I have encoded my question data using

data.Tokens = encode(mdl.Tokenizer,data.Questions), which returns a cell array.

Next, I encoded new text to test its similarity with the already encoded questions in the database: testTokens = encode(mdl.Tokenizer,text)

However, I am unable to call cosineSimilarity(data.Tokens,testTokens); I receive an error that says:

Input must be a matrix, a tokenizedDocument array, a bagOfWords model, a bagOfNgrams model, a string array of words, or a cell array of character vectors.

Do I need to pad or reshape the vectors in my cell arrays?


Answer 1

Divyam Gupta, 30 June 2021

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/868608-cosine-similarity-using-bert#answer_736543

Hi Nicholas, the error occurs because cosineSimilarity does not accept a cell array of numeric vectors. As per the documentation at https://www.mathworks.com/help/textanalytics/ref/cosinesimilarity.html#d123e8335 the cosineSimilarity function takes a matrix to compute the similarity between documents.

Since the encoded vectors for the questions have different lengths, constructing a matrix might be difficult. Instead, you can do a pairwise comparison between data.Tokens and testTokens to compute the similarities. This can be achieved with a nested loop that stores each similarity score as it is computed.
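A minimal sketch of such a nested loop is shown here. It rests on several assumptions that are not part of the original answer: the names dataTokens, testTokens, and scores are illustrative, the token codes are made-up sample values standing in for the output of encode, and the shorter vector in each pair is zero-padded so that both have the same length. The score is computed with the base MATLAB functions dot and norm rather than cosineSimilarity, so the sketch runs without Text Analytics Toolbox.

```matlab
% Hypothetical token-code vectors standing in for the output of
% encode(mdl.Tokenizer, ...); real codes depend on the tokenizer.
dataTokens = {[102 7 15 103], [102 7 15 42 9 103], [102 88 103]};
testTokens = {[102 7 15 103]};

numData = numel(dataTokens);
numTest = numel(testTokens);
scores = zeros(numData, numTest);   % scores(i,j): stored question i vs. test text j

for i = 1:numData
    for j = 1:numTest
        % Work on double-precision column vectors.
        a = double(dataTokens{i}(:));
        b = double(testTokens{j}(:));

        % Assumption: zero-pad the shorter vector so both have equal length.
        n = max(numel(a), numel(b));
        a(end+1:n) = 0;
        b(end+1:n) = 0;

        % Cosine similarity: dot product divided by the product of the norms.
        scores(i,j) = dot(a, b) / (norm(a) * norm(b));
    end
end

% Index of the stored question most similar to the first test text.
[bestScore, bestIdx] = max(scores(:, 1));
```

With real data, dataTokens would be replaced by data.Tokens and testTokens by the encoded test text; bestIdx then points at the row of the stored questions with the highest score.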

Hope this helps.

1 Comment

Nicholas Ang, 30 June 2021

Thank you! This worked!
