How to find the similarity between two text documents

Question

Jothi on 19 Dec 2012

https://jp.mathworks.com/matlabcentral/answers/56961-how-to-find-the-similarity-between-two-text-documents

Commented: info info on 20 Mar 2020

I have two text documents.

For example, the file a.txt contains 'Hai How R U'

and the file b.txt contains 'Hai How are U'.

How can I calculate the cosine similarity or the Euclidean distance between these two documents (text files)?

Thanks in advance.

2 Comments

Jan on 19 Dec 2012

The Euclidean distance requires vectors of the same size. There are different edit distances, but I do not know the cosine distance. Perhaps it is better that you explain the details rather than having us search Wikipedia.
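One common way to get vectors of the same size from two texts is to count words over a shared vocabulary (a bag-of-words model). The following is a minimal sketch in Python rather than MATLAB, offered only as an illustration. The function names are hypothetical, the two strings from the question are used inline instead of being read from a.txt and b.txt, and lowercasing plus whitespace splitting is an assumed tokenization, not something the question specifies.

```python
import math
from collections import Counter


def term_vectors(text_a, text_b):
    """Count words in both texts over one shared, sorted vocabulary."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))
    # Both vectors have len(vocab) entries, so they are directly comparable.
    vec_a = [counts_a[word] for word in vocab]
    vec_b = [counts_b[word] for word in vocab]
    return vocab, vec_a, vec_b


def cosine_similarity(u, v):
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # Return 0.0 for an empty text to avoid dividing by zero.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def euclidean_distance(u, v):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


text_a = "Hai How R U"    # content of a.txt in the question
text_b = "Hai How are U"  # content of b.txt in the question

vocab, vec_a, vec_b = term_vectors(text_a, text_b)
cos_sim = cosine_similarity(vec_a, vec_b)
dist = euclidean_distance(vec_a, vec_b)
print(vocab, vec_a, vec_b, cos_sim, dist)
```

For these two strings the shared vocabulary has five words (are, hai, how, r, u), three of which occur in both texts, so the cosine similarity is 3 / (2 * 2) = 0.75 and the Euclidean distance is sqrt(2).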

info info on 20 Mar 2020

I think the best way to measure the similarity of texts is "shingling".

Shingling is a common technique for representing documents as sets. Given a document, its k-shingles are all the consecutive sequences of length k found within it (sequences of k words in the example below). An example with k = 3 is given below:

## $Original

## [1] "The sky is blue and the sun is bright."

##

## $Shingled

## [1] "the sky is" "sky is blue" "is blue and" "blue and the"

## [5] "and the sun" "the sun is" "sun is bright"

Then we check which of these shingles occur in each of our texts:

##                doc_1 doc_2 doc_3
## the sky is         1     1     1
## sky is blue        1     0     1
## is blue and        1     0     0
## blue and the       1     0     0
## and the sun        1     0     0
## the sun is         1     0     0
## sun is bright      1     0     1
## the sun in         0     1     0
## sun in the         0     1     0
## in the sky         0     1     0
## sky is bright      0     1     0
## we can see         0     0     1
## can see sun        0     0     1
## see sun is         0     0     1
## is bright the      0     0     1
## bright the sky     0     0     1

Then calculate the similarity between the documents from these sets and take the largest value.
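The comment does not say which measure to calculate from the shingle sets. A frequent choice is the Jaccard similarity (size of the intersection divided by size of the union), so the sketch below assumes that. It is written in Python as an illustration only; the function names are hypothetical, and the text of doc_2 is reconstructed from the shingles listed in the table, so it is an assumption as well.

```python
import re


def shingles(text, k=3):
    """Return the set of word-level k-shingles (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


doc_1 = "The sky is blue and the sun is bright."
doc_2 = "The sun in the sky is bright."  # assumed text, inferred from the table

s1 = shingles(doc_1)  # 7 shingles, as in the $Shingled output
s2 = shingles(doc_2)  # 5 shingles
similarity = jaccard(s1, s2)
print(sorted(s1 & s2), similarity)
```

With these inputs the only shared shingle is "the sky is", so the similarity is 1 / 11, or about 0.09. Comparing every pair of documents this way and keeping the largest value identifies the most similar pair.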


Answer 1

Jan on 19 Dec 2012

https://jp.mathworks.com/matlabcentral/answers/56961-how-to-find-the-similarity-between-two-text-documents#answer_68920

Searching the File Exchange (FEX) is a good place to start:
