Text mining with MATLAB of affiliation strings of a publication database

pietro on 17 Nov 2017

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/367728-text-mining-with-matlab-of-affiliation-strings-of-a-pubblication-database

Closed: John D'Errico on 18 Nov 2017

Hi all,

I want to carry out an authorship analysis by means of complex networks, so I downloaded data from Scopus as a CSV file. Each node (that is, each author) will be identified by the combination of name and affiliation code, where the affiliation can be something like "University of London". This way the result is not biased by different authors who share the same name. Extracting the author name is easy, but the affiliation is not, because the affiliation strings have no standard structure. They appear in many forms, such as "university of XXX…", "XXX university…", "Department of YYY…", or an acronym of the department; the address is not always included, etc. In a few cases the affiliation lacks detail and is simply "university of XXX". This makes it rather challenging to assign the affiliation code to each affiliation string. I partially solved the problem with the following approach:

1- Manually define a word bank for each affiliation (street name, city, acronym of the department, etc.).
2- Split each affiliation string into single-word substrings.
3- Compare each set of substrings with the word bank of each affiliation; the likely affiliation is the one whose word bank has the largest intersection with it.
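To make the three steps concrete, here is a minimal sketch of the word-bank matching, written in Python rather than MATLAB purely for illustration. The affiliation codes, the word-bank contents, and the function names `tokenize` and `assign_code` are all hypothetical placeholders, not the actual data or code used in the analysis.

```python
import re

# Step 1: hypothetical word banks, one manually curated set of
# keywords per affiliation code (all entries are made-up examples).
WORD_BANKS = {
    "AFF_LONDON": {"london", "computing", "uol"},
    "AFF_XXX": {"xxx", "engineering", "dxe"},
}


def tokenize(affiliation):
    """Step 2: split an affiliation string into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", affiliation.lower()))


def assign_code(affiliation, word_banks=WORD_BANKS):
    """Step 3: return the code whose word bank overlaps the string most.

    Returns None when no word bank shares any word with the string.
    On a tie, the word bank listed first wins.
    """
    tokens = tokenize(affiliation)
    best_code, best_overlap = None, 0
    for code, bank in word_banks.items():
        overlap = len(tokens & bank)  # size of the set intersection
        if overlap > best_overlap:
            best_code, best_overlap = code, overlap
    return best_code


print(assign_code("Department of Computing, University of London, UK"))
print(assign_code("XXX University, Department of Engineering"))
print(assign_code("Some Unlisted Institute"))
```

In this sketch the score is the raw size of the intersection, so a string that matches nothing is left unassigned, and ties fall to whichever word bank comes first.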

Unfortunately, this approach doesn't work as well as expected: in many cases the affiliation code is wrongly assigned, and it requires more manual work than I thought. Is there a better method than the one I adopted?

Thank you

Best regards
