How to find the similarity between two text documents
I have two text documents.
For example, a.txt contains ' Hai How R U'
and b.txt contains 'Hai How are U'.
How can I calculate the cosine similarity or the Euclidean distance for these two documents (text files)?
Thanks in advance.
2 Comments
  Jan on 19 Dec 2012
				The Euclidean distance requires vectors of the same size. There are various edit distances, but I do not know the cosine distance. Perhaps it would be better if you explained the details, rather than leaving us to search Wikipedia.
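One common way to get vectors of the same size from two texts is to count words over the combined vocabulary of both documents. The following is a minimal sketch of that idea, written in Python rather than MATLAB purely for illustration. The function names `word_counts`, `cosine_similarity`, and `euclidean_distance` are hypothetical, the two strings stand in for the contents of a.txt and b.txt, and lowercasing plus whitespace splitting is an assumed tokenization, not something specified in the question.

```python
import math
from collections import Counter


def word_counts(text):
    """Count lowercase, whitespace-separated words (assumed tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    vocab = set(a) | set(b)  # shared vocabulary gives both vectors the same size
    dot = sum(a[w] * b[w] for w in vocab)  # Counter returns 0 for missing words
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance between two word-count vectors."""
    vocab = set(a) | set(b)
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in vocab))


# Stand-ins for the contents of a.txt and b.txt from the question.
counts_a = word_counts("Hai How R U")
counts_b = word_counts("Hai How are U")

cos_sim = cosine_similarity(counts_a, counts_b)
euc_dist = euclidean_distance(counts_a, counts_b)
print(cos_sim, euc_dist)
```

With this tokenization the two texts share three of their four words, so the cosine similarity is 0.75 and the Euclidean distance is the square root of 2. Other choices (character n-grams, TF-IDF weights) would give different numbers.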
  info info on 20 Mar 2020
				I think the best way to measure text similarity is "shingling".
Shingling is a common technique for representing documents as sets. Given a document, its k-shingles are all the sequences of k consecutive tokens (words, in the example below) found within it. An example with k = 3 is given below:
## $Original
## [1] "The sky is blue and the sun is bright."
## 
## $Shingled
## [1] "the sky is"    "sky is blue"   "is blue and"   "blue and the" 
## [5] "and the sun"   "the sun is"    "sun is bright"
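The shingling step shown above can be sketched as follows. This is an illustrative Python version, not the code that produced the output above; the function name `word_shingles` and the regular expression used to split words are assumptions.

```python
import re


def word_shingles(text, k=3):
    """Return all runs of k consecutive lowercase words in text, in order."""
    # Assumed tokenization: lowercase, keep letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


shingles = word_shingles("The sky is blue and the sun is bright.")
for s in shingles:
    print(s)
```

For the nine-word sentence above and k = 3 this yields seven shingles, from "the sky is" through "sun is bright". Converting the list to a `set` gives the set representation of the document.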
Then we check which shingles occur in each of our texts:
##                doc_1 doc_2 doc_3
## the sky is         1     1     1
## sky is blue        1     0     1
## is blue and        1     0     0
## blue and the       1     0     0
## and the sun        1     0     0
## the sun is         1     0     0
## sun is bright      1     0     1
## the sun in         0     1     0
## sun in the         0     1     0
## in the sky         0     1     0
## sky is bright      0     1     0
## we can see         0     0     1
## can see sun        0     0     1
## see sun is         0     0     1
## is bright the      0     0     1
## bright the sky     0     0     1
Then calculate the similarity between the documents and take the largest value.
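The comment does not say which score to calculate. A frequent choice with shingle sets is the Jaccard similarity (size of the intersection divided by size of the union), and that is what this Python sketch assumes. The shingle sets are copied from the rows marked 1 in the table above; the names `docs`, `jaccard`, `scores`, and `best_pair` are hypothetical.

```python
from itertools import combinations

# Shingle sets read off the occurrence table (rows with a 1 in each column).
docs = {
    "doc_1": {"the sky is", "sky is blue", "is blue and", "blue and the",
              "and the sun", "the sun is", "sun is bright"},
    "doc_2": {"the sky is", "the sun in", "sun in the", "in the sky",
              "sky is bright"},
    "doc_3": {"the sky is", "sky is blue", "sun is bright", "we can see",
              "can see sun", "see sun is", "is bright the", "bright the sky"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Score every pair of documents, then pick the most similar pair.
scores = {(x, y): jaccard(docs[x], docs[y]) for x, y in combinations(docs, 2)}
best_pair = max(scores, key=scores.get)

for pair, score in scores.items():
    print(pair, round(score, 4))
print("most similar:", best_pair)
```

Under this assumption doc_1 and doc_3 are the closest pair: they share 3 of 12 distinct shingles, giving a score of 0.25, while each of the other two pairs shares only "the sky is".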
Answers (1)
  Jan on 19 Dec 2012
        Searching the File Exchange (FEX) is a good place to start:
- http://www.mathworks.com/matlabcentral/fileexchange/32449-edit-distances
- http://www.mathworks.com/matlabcentral/fileexchange/39049-edit-distance-algorithm
- http://www.mathworks.com/matlabcentral/fileexchange/36981-find-nearest-matching-string-from-a-set
- http://www.mathworks.com/matlabcentral/fileexchange/213-editdist-m
- http://www.mathworks.com/matlabcentral/fileexchange/17585-calculation-of-distance-between-strings
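For orientation, the basic idea behind these edit-distance submissions can be sketched as below. This is a generic Levenshtein distance in Python, written for illustration only; it is not the code of any of the File Exchange entries listed, and the function name `levenshtein` is hypothetical.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t (case-sensitive)."""
    # previous[j] holds the distance between the processed prefix of s and t[:j].
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete cs
                current[j - 1] + 1,      # insert ct
                previous[j - 1] + cost,  # substitute cs with ct, or match
            ))
        previous = current
    return previous[-1]


# The two strings from the question, without the leading space.
distance = levenshtein("Hai How R U", "Hai How are U")
print(distance)
```

Because the comparison is case-sensitive, turning "R" into "are" costs one substitution and two insertions. Lowercasing both strings first would reduce that to two insertions.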
0 Comments


