Frequency words for each labels
3 ビュー (過去 30 日間)
古いコメントを表示
I have one dataset with two columns: text and data. The data is made up two labels 0 and 1. I would like to calculate the frequency of each word for each labels. I mean, how many time, for example "damage" there is within class 1 and 0? How can I do? Furthermore, I don't understand if I have to, however, use tokens or no. Maybe I can use a cicle for? I don't know it.
Here there is a little image with a similar result. I would like a similar table.
0 件のコメント
採用された回答
Karim
2022 年 7 月 7 日
編集済み: Karim
2022 年 7 月 7 日
Edit to make so that the code works with the latter added example data...
% read the file
data = readtable("dati_classificati.xlsx",'TextType','string');
% split each sentence into words, assuming that spaces are used as delimiter...
cell_text = arrayfun(@(x) data.text(x,:),1:size(data.text,1),'UniformOutput',false)';
cell_text = cellfun(@(x) split(x,' '), cell_text,'UniformOutput',false);
% count the number of words in each sentence
numWords = cellfun(@numel, cell_text);
% expand the labels to match the number of words for each sentence
expandedLabels = repelem( data.label ,numWords);
% gather the words in 1 big string array
expandedWords = vertcat(cell_text{:});
% list a few words to count the frequency...
MyWords = ["strada" "il" "Via" "donne" "della"];
% allocate a table for the results
varTypes = ["string","double","double"]; % data type for each column
varNames = ["Words","Ones","Zeros"]; % variable name for each column
MyResult = table('Size',[numel(MyWords) 3],'VariableTypes',varTypes,'VariableNames',varNames);
MyResult.Words = MyWords(:);
% count the labels for each word
for i = 1:numel(MyWords)
currLabels = expandedLabels( contains(expandedWords,MyResult.Words(i)) );
MyResult.Ones(i) = sum(currLabels==1);
MyResult.Zeros(i) = sum(currLabels==0);
end
% display the results
MyResult
9 件のコメント
Karim
2022 年 7 月 7 日
I modified the original answer accoring to the file you provided, see at the top. Note that i just used the raw text and only included a few words. But normally now you see how the concept works.
その他の回答 (0 件)
参考
カテゴリ
Help Center および File Exchange で Data Type Conversion についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!