Word frequency for each label
I have one dataset with two columns: text and data. The data column contains two labels, 0 and 1. I would like to calculate the frequency of each word for each label. For example, how many times does "damage" appear within class 1 and within class 0? How can I do this? Also, I don't know whether I have to use tokens or not. Maybe I can use a for loop? I'm not sure.
Here is a small image with a similar result. I would like a similar table.

Accepted Answer
Edited so that the code works with the example data that was added later...
% read the file
data = readtable("dati_classificati.xlsx",'TextType','string');
% split each sentence into words, assuming that spaces are used as delimiter...
cell_text = arrayfun(@(x) data.text(x,:),1:size(data.text,1),'UniformOutput',false)';
cell_text = cellfun(@(x) split(x,' '), cell_text,'UniformOutput',false);
% count the number of words in each sentence
numWords = cellfun(@numel, cell_text);
% expand the labels to match the number of words for each sentence
expandedLabels = repelem( data.label ,numWords);
% gather the words in 1 big string array
expandedWords = vertcat(cell_text{:});
% list a few words to count the frequency...
MyWords = ["strada" "il" "Via" "donne" "della"];
% allocate a table for the results
varTypes = ["string","double","double"]; % data type for each column
varNames = ["Words","Ones","Zeros"]; % variable name for each column
MyResult = table('Size',[numel(MyWords) 3],'VariableTypes',varTypes,'VariableNames',varNames);
MyResult.Words = MyWords(:);
% count the labels for each word
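for i = 1:numel(MyWords)
    % note: contains() does substring matching, so "il" also matches longer
    % words that contain "il"; use expandedWords == MyResult.Words(i) for exact matches
    currLabels = expandedLabels( contains(expandedWords,MyResult.Words(i)) );
    MyResult.Ones(i) = sum(currLabels==1);
    MyResult.Zeros(i) = sum(currLabels==0);
end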
% display the results
MyResult
MyResult = 5×3 table
     Words      Ones    Zeros
    ________    ____    _____
    "strada"     48       1
    "il"         34      20
    "Via"        53       0
    "donne"       0       2
    "della"       3      14
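The loop above selects words with contains, which matches substrings. As a minimal sketch of how that differs from an exact comparison, the snippet below uses a small made-up word list (the values are hypothetical, not taken from dati_classificati.xlsx) and counts the labels for "il" both ways.

```matlab
% Hypothetical sample data (NOT from dati_classificati.xlsx)
expandedWords  = ["il"; "illuminazione"; "il"; "strada"; "Via"];
expandedLabels = [1; 0; 0; 1; 1];

% substring match: "il" also hits "illuminazione"
subMask = contains(expandedWords, "il");

% exact match: only the word "il" itself
exactMask = expandedWords == "il";

% label counts with substring matching
onesSub  = sum(expandedLabels(subMask) == 1);
zerosSub = sum(expandedLabels(subMask) == 0);

% label counts with exact matching
onesExact  = sum(expandedLabels(exactMask) == 1);
zerosExact = sum(expandedLabels(exactMask) == 0);
```

With this sample, the substring version counts one extra label-0 occurrence because of "illuminazione". Whether that matters depends on whether you want whole words or word fragments.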
9 Comments
Rachele Franceschini
7 July 2022
I have several phrases, and each one has a label. Shouldn't I calculate the word frequency first?
Karim
7 July 2022
No, this is done automatically by
currLabels = RandomLabel(contains(BigList,MyResult.Words(i)));
This step extracts the labels only for the current word (or phrase, whatever is in the string).
Afterwards, using
sum(currLabels==1)
the code counts the occurrences of that word with the given label (in this case 1).
If you need the total word frequency, you can just add the counts for labels 1 and 0.
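As a small sketch of that last point, the snippet below rebuilds the result table from the numbers shown in the answer's output and adds a Total column. The column name Total is an assumption made for illustration.

```matlab
% Counts copied from the result table shown in the answer
Words = ["strada"; "il"; "Via"; "donne"; "della"];
Ones  = [48; 34; 53; 0; 3];
Zeros = [1; 20; 0; 2; 14];
MyResult = table(Words, Ones, Zeros);

% total frequency = occurrences with label 1 + occurrences with label 0
% ("Total" is a hypothetical column name)
MyResult.Total = MyResult.Ones + MyResult.Zeros;
```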
Rachele Franceschini
7 July 2022
I tried your code, but I have a problem. The script takes each whole phrase and doesn't calculate the word frequency.
Can you pinpoint the problem? You need to show your code and the error, otherwise it is not possible to help.
Rachele Franceschini
7 July 2022
I used your code and attached an image of the result. I also tried adding a preprocessing step to clean the data. What I would like to get is: how many times does the word "ciao" appear within classes 1 and 0, and so on.
% first generate some random data..
MyWords = daticlassificati.text;
% now create a big list from the set of words
numItems = 1000;
BigList = MyWords ( randi(numel(MyWords),1,numItems) )
% create a list with random labels 0 or 1
RandomLabel = daticlassificati.label
uWords = unique(BigList);
% allocate a table for the results
varTypes = ["string","double","double"]; % data type for each column
varNames = ["Words","Ones","Zeros"]; % variable name for each column
MyResult = table('Size',[numel(uWords) 3],'VariableTypes',varTypes,'VariableNames',varNames);
MyResult.Words = uWords(:);
% count the labels for each word
for i = 1:numel(uWords)
    currLabels = RandomLabel(contains(BigList,MyResult.Words(i)));
    MyResult.Ones(i) = sum(currLabels==1);
    MyResult.Zeros(i) = sum(currLabels==0);
end
% display the results
MyResult

Here is my code with the preprocessing step for cleaning the dataset:
% input file excel or text
filename = "dati_classificati.xlsx";
data = readtable(filename,'TextType','string');
% remove the rows of the table with empty reports (classify text data using deep learning)
idx = strlength(data.text) == 0;
data(idx,:) = [];
% extract all rows of the text column
textData = data.text;
% clean data (remove punctuation etc.)
Train_pr = preprocessText(textData);
Train_bag = bagOfWords(Train_pr)
Train_bag = removeInfrequentWords(Train_bag,5);
[Train_bag,idx] = removeEmptyDocuments(Train_bag);
Train_bag
tbl_train = topkwords(Train_bag,2000);
Karim
7 July 2022
What is inside "daticlassificati.text"? Can you also add the file? (Use the paperclip button to attach an external file.)
Why are you still doing these steps? They were only there to generate random data and are not needed.
numItems = 1000;
BigList = MyWords ( randi(numel(MyWords),1,numItems) )
In your case, I expect that you only need to do:
BigList = daticlassificati.text;
Rachele Franceschini
7 July 2022
Here is the Excel file. Thank you for your help!
MyWords = daticlassificati.text;
label = daticlassificati.label;
Karim
7 July 2022
I modified the original answer according to the file you provided; see the top of the answer. Note that I just used the raw text and only included a few words, but it should show how the concept works.
Rachele Franceschini
7 July 2022
Thank you very, very much! I also tried it with the preprocessing step and it works!
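For readers who want to combine light cleaning with per-label counts for every word (rather than a hand-picked list), here is a rough sketch using only base MATLAB functions. The sample sentences, the punctuation list, and the variable names are all assumptions made for illustration. It does not reproduce the poster's preprocessText function or the bagOfWords workflow.

```matlab
% Hypothetical sample data standing in for data.text and data.label
text  = ["Ciao, strada chiusa!"; "ciao a tutti"; "Strada libera"];
label = [1; 0; 1];

% light cleaning: lower-case and drop a few punctuation marks
% (the punctuation list is an assumption; extend it as needed)
clean = lower(text);
clean = regexprep(clean, '[.,;:!?"()]', '');

% split each sentence into words and repeat its label once per word
allWords  = strings(0,1);
allLabels = zeros(0,1);
for k = 1:numel(clean)
    w = split(strtrim(clean(k)));   % whitespace-delimited words (column vector)
    w(w == "") = [];                % drop empties left by repeated spaces
    allWords  = [allWords; w];
    allLabels = [allLabels; repmat(label(k), numel(w), 1)];
end

% count label-1 and label-0 occurrences for every unique word
[uWords, ~, idx] = unique(allWords);
Ones  = accumarray(idx, double(allLabels == 1));
Zeros = accumarray(idx, double(allLabels == 0));
MyResult = table(uWords, Ones, Zeros, 'VariableNames', ["Words","Ones","Zeros"]);
```

Because unique groups exact strings, this counts whole words only, unlike the contains-based loop in the accepted answer.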
More Answers (0)
