Visually representing word frequencies for a large amount of text

moin khan on 23 Mar 2021
Answered: Samayochita on 18 Jun 2025
I am working on text mining and have some text files that each contain millions of words. I want to determine their word frequencies, and I have two problems:
  1. How can I process large data in MATLAB to find the unique words and their occurrence counts for any text document (containing millions of words)?
  2. After finding the unique words and their counts, how can I represent them graphically, for example with a Circos plot or a pie chart, given that there may be thousands of unique words?

Answers (1)

Samayochita on 18 Jun 2025
Hi moin khan,
I understand that while working on large-scale text mining in MATLAB, the goal is to:
  1. Process large text files to find unique words and their frequencies.
  2. Visually represent those word frequencies when there are thousands of unique words.
To efficiently process large text data in MATLAB:
Step 1: Read large files
Use fileread when the whole file fits in memory, or fopen with fgetl to read it line by line.
textData = fileread('largeTextFile.txt'); % Suitable for moderately large files
For very large files, prefer reading line by line so that only one line is in memory at a time:
fid = fopen('largeTextFile.txt','r');
while ~feof(fid)
    line = fgetl(fid);
    % process line
end
fclose(fid);
Step 2: Tokenize text and clean it (optional but preferred)
Break the text into words, convert to lowercase, remove punctuation, etc.
cleanedText = lower(regexprep(textData, '[^\w\s]', '')); % remove punctuation
words = split(cleanedText); % tokenize
words = words(~cellfun('isempty',words)); % remove empty strings
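As a small worked illustration of Step 2, the sketch below runs the same cleaning on a short sample string. The sample text is hypothetical, and the tokenizing uses regexp with the 'match' option in place of split; this is a swapped-in alternative that returns only non-empty tokens, so the extra empty-string filter is not needed.

```matlab
% Hypothetical sample text standing in for the contents of a file.
textData = sprintf('The cat sat. The dog, the bird!\nA cat ran.');

% Lowercase, then strip everything that is not a word character or whitespace.
cleanedText = lower(regexprep(textData, '[^\w\s]', ''));

% Collect runs of non-whitespace characters as a 1-by-N cell array of words.
words = regexp(cleanedText, '\S+', 'match');

disp(words)
```

Note that \w also matches digits and underscores, so tokens such as "word_1" survive the cleaning unchanged.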
Step 3: Count word frequencies
Use the unique and accumarray functions, or the tabulate function (tabulate requires Statistics and Machine Learning Toolbox).
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx, 1);
Or:
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
T = tabulate(words) % cell array with one row per word: word, count, percentage
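For files too large to hold in memory, the line-by-line loop from Step 1 can be combined with the cleaning and counting steps so that counts accumulate as the file is read. The following is a minimal sketch rather than a tuned implementation: it assumes a containers.Map as the running tally, and it first writes a tiny temporary file purely as hypothetical demo data (in practice, set fileName to the real file and skip that part).

```matlab
% Hypothetical demo data: write a tiny file so the sketch runs on its own.
fileName = [tempname '.txt'];
fid = fopen(fileName, 'w');
fprintf(fid, 'Cat dog cat.\nBird, dog! Cat\n');
fclose(fid);

% Running tally: word -> number of occurrences seen so far.
wordCounts = containers.Map('KeyType', 'char', 'ValueType', 'double');

fid = fopen(fileName, 'r');
if fid == -1
    error('Could not open %s', fileName);
end
while true
    line = fgetl(fid);
    if ~ischar(line)          % fgetl returns -1 at end of file
        break;
    end
    % Same cleaning as Step 2, applied to one line at a time.
    line = lower(regexprep(line, '[^\w\s]', ''));
    tokens = regexp(line, '\S+', 'match');
    for k = 1:numel(tokens)
        w = tokens{k};
        if isKey(wordCounts, w)
            wordCounts(w) = wordCounts(w) + 1;
        else
            wordCounts(w) = 1;
        end
    end
end
fclose(fid);
delete(fileName);             % remove the demo file

uniqueWords = keys(wordCounts);             % 1-by-N cell, sorted alphabetically
counts = cell2mat(values(wordCounts));      % 1-by-N double, same order as the keys
```

Only the tally of distinct words stays in memory, which is usually far smaller than the text itself. Updating the map once per word can be slow for millions of words, so treat this as a starting point rather than a benchmarked solution.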
Step 4: Visualize word frequencies with a word cloud
A word cloud chart works well for hundreds or thousands of words, because word size conveys frequency without needing one axis label per word.
wordcloud(uniqueWords, counts);
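When exact counts matter more than an overall impression, one alternative is to sort the words by frequency and plot only the most frequent ones as a bar chart. This is a sketch under stated assumptions: the word list is the small example from Step 3, and topN is a hypothetical cutoff chosen for illustration.

```matlab
% Example word list from Step 3; in practice use the tokens from Step 2.
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx(:), 1);

% Sort by frequency, highest first, and keep at most topN entries.
topN = 2;                                   % hypothetical cutoff for this tiny example
[sortedCounts, order] = sort(counts, 'descend');
nShow = min(topN, numel(sortedCounts));
topWords = uniqueWords(order(1:nShow));
topCounts = sortedCounts(1:nShow);

% Horizontal bar chart with the most frequent word at the top.
barh(topCounts);
set(gca, 'YTick', 1:nShow, 'YTickLabel', topWords, 'YDir', 'reverse');
xlabel('Occurrences');
title(sprintf('Top %d words', nShow));
```

With thousands of unique words, a cutoff keeps the axis labels readable; the remaining words can still be shown in the word cloud.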
Please refer to the MATLAB documentation for fileread, fgetl, unique, accumarray, tabulate, and wordcloud for more information.
Hope this is helpful!
