How to cluster similar strings?

Serbring on 26 Jan 2020
Commented: Serbring on 29 Jan 2020
Hi all,
I have long lists of strings that I collected automatically with a crude web-scraping routine. Many of the strings are very similar, and I would like to shorten each list so that it shows only the genuinely different names. Is there any way to cluster the strings together? A sample of one list is below.
Thank you so much.
Best regards.
{'microbiologia agraria' }
{'microbiologia forestale e ambientale' }
{'microbiologia generale' }
{'microbiologia agraria' }
{'microbiologia generale e ambientale' }
{'microbiologia del suolo e del sottosuolo' }
{'nutrition and health: the functional foods'}
{'microbiologia generale e ambientale' }
{'microbial biotechnologies in agroforestry' }
{'microbiologia generale ed ambientale' }
{'microbiologia agraria e forestale' }
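Before any fuzzy matching, exact repeats can be dropped. The sample already contains two of them. Below is a minimal sketch in Python rather than MATLAB. The normalization (lower-casing and collapsing whitespace) and the helper names `normalize` and `unique_preserving_order` are assumptions for illustration.

```python
# Sketch in Python; the thread itself is about MATLAB.
def normalize(s: str) -> str:
    """Lower-case and collapse runs of whitespace (assumed normalization)."""
    return " ".join(s.lower().split())


def unique_preserving_order(strings):
    """Return normalized strings with duplicates removed, in first-seen order."""
    seen = set()
    result = []
    for s in strings:
        key = normalize(s)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


names = [
    "microbiologia agraria",
    "microbiologia forestale e ambientale",
    "microbiologia generale",
    "microbiologia agraria",
    "microbiologia generale e ambientale",
    "microbiologia del suolo e del sottosuolo",
    "nutrition and health: the functional foods",
    "microbiologia generale e ambientale",
    "microbial biotechnologies in agroforestry",
    "microbiologia generale ed ambientale",
    "microbiologia agraria e forestale",
]

unique_names = unique_preserving_order(names)
print(len(names), "->", len(unique_names))  # prints: 11 -> 9
```

This only removes exact repeats. Near-duplicates such as 'microbiologia generale e ambientale' and 'microbiologia generale ed ambientale' still need a string-distance measure.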

Answers (1)

Image Analyst on 26 Jan 2020
  1 comment
Serbring on 29 Jan 2020
Thanks for your reply. I already knew about those distances; the real problem is what to do with the resulting numbers. Let me be more specific so that you understand the basic idea of the algorithm I have developed.
Let's assume I have three strings A, B and C. I computed the pairwise distances between the strings (A-B, A-C and B-C), and then, for each string, I summed its distances to the other two (for A, that is A-B plus A-C). At that point I have no idea how to use these numbers. Any suggestion is appreciated.
Cheers
Michele
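Summing each string's distances gives one number per string. That number says how far the string is from everything else, but not which strings belong together. One common alternative keeps the full pairwise matrix, links any two strings whose distance is at or below a threshold, and takes the connected components. This is single-linkage clustering, a swapped-in technique rather than the algorithm described above.

Below is a minimal sketch in Python rather than MATLAB. It makes three assumptions:

* Levenshtein edit distance is used, normalized by the longer string's length.
* The threshold of 0.3 is hand-picked.
* The helpers `levenshtein`, `distance_matrix` and `threshold_clusters` are hypothetical names.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length (assumption)."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def distance_matrix(strings):
    """Symmetric matrix of pairwise normalized distances, zero on the diagonal."""
    n = len(strings)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = normalized_distance(strings[i], strings[j])
    return d


def threshold_clusters(d, threshold=0.3):
    """Link i and j when d[i][j] <= threshold; return connected components
    as lists of indices (single linkage via union-find)."""
    n = len(d)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if d[i][j] <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


names = [
    "microbiologia agraria",
    "microbiologia forestale e ambientale",
    "microbiologia generale",
    "microbiologia agraria",
    "microbiologia generale e ambientale",
    "microbiologia del suolo e del sottosuolo",
    "nutrition and health: the functional foods",
    "microbiologia generale e ambientale",
    "microbial biotechnologies in agroforestry",
    "microbiologia generale ed ambientale",
    "microbiologia agraria e forestale",
]

d = distance_matrix(names)
row_sums = [sum(row) for row in d]  # the per-string sums described in the comment
clusters = threshold_clusters(d, threshold=0.3)
for members in clusters:
    print([names[i] for i in members])
```

A lower threshold gives more, tighter clusters. Because single linkage chains links together, a very loose threshold can merge unrelated names through intermediate ones. To get the shortened list, keep one representative per cluster. One option is the member with the smallest total distance to the other members of its own cluster (a medoid).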

