- Process large text files to find unique words and their frequencies.
- Visually represent those word frequencies when there are thousands of unique words.
Visually representing word frequencies in a large amount of text
I am working on text mining. I have some text files that contain millions of words, and I want to determine their word frequencies. I have two problems:
- How to process large data in MATLAB to find the unique words and their number of occurrences in a text document containing millions of words.
- After finding the unique words and their occurrences, how to represent them with Circos, a pie chart, or any other graphical representation (as there can be thousands of unique words).
Answers (1)
  Samayochita
 June 18, 2025
        Hi moin khan,
I understand that you are working on large-scale text mining in MATLAB. To process large text data efficiently and then visualize the word frequencies, you can follow these steps:
Step 1: Read large files
Use fileread to load a whole file into memory at once, or fopen with fgetl to read it incrementally.
textData = fileread('largeTextFile.txt');  % Suitable for moderately large files
Because fileread holds the entire file in memory, read very large files line by line instead:
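fid = fopen('largeTextFile.txt', 'r');
if fid == -1
    error('Could not open largeTextFile.txt');
end
while ~feof(fid)
    tline = fgetl(fid);              % avoid naming this "line", which shadows a built-in function
    if ~ischar(tline), break; end    % fgetl returns -1 when no more lines are available
    % process tline (clean, tokenize, and update the counts)
end
fclose(fid);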
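The loop above leaves the processing step open. Below is a minimal sketch of a complete streaming counter: it cleans and counts each line as it is read and accumulates the totals in a containers.Map, so the full text is never loaded at once. It assumes ASCII words (runs of a to z and 0 to 9 after lowercasing); the demo file created with tempname is purely illustrative.

```matlab
% Hypothetical demo file; replace filename with the path to your own text file.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'The cat sat.\nThe dog sat, the cat ran!\n');
fclose(fid);

% Running totals: word (char key) -> number of occurrences (double value)
wordCounts = containers.Map('KeyType', 'char', 'ValueType', 'double');

fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
while true
    tline = fgetl(fid);
    if ~ischar(tline)                 % fgetl returns -1 at end of file
        break;
    end
    % Lowercase, then keep runs of ASCII letters/digits (drops punctuation)
    tokens = regexp(lower(tline), '[a-z0-9]+', 'match');
    if isempty(tokens)
        continue;                     % blank or punctuation-only line
    end
    % Count within the line first, so the map is updated once per unique word
    [u, ~, ic] = unique(tokens);
    c = accumarray(ic(:), 1);
    for k = 1:numel(u)
        key = u{k};
        if isKey(wordCounts, key)
            wordCounts(key) = wordCounts(key) + c(k);
        else
            wordCounts(key) = c(k);
        end
    end
end
fclose(fid);
delete(filename);                     % remove the demo file

uniqueWords = keys(wordCounts);       % keys are returned in sorted order
counts = cell2mat(values(wordCounts)); % counts in the same order as the keys
```

For this demo file, uniqueWords is {'cat', 'dog', 'ran', 'sat', 'the'} and counts is [2 1 1 2 3].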
Step 2: Clean and tokenize the text (optional but recommended)
Break the text into words, convert to lowercase, remove punctuation, etc.
cleanedText = lower(regexprep(textData, '[^\w\s]', ''));  % remove punctuation
words = split(cleanedText);  % tokenize
words = words(~cellfun('isempty',words));  % remove empty strings
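As an alternative to split followed by removing empty strings, regexp with the 'match' option returns only the runs of non-whitespace characters, so no empty entries appear. A small sketch on a hypothetical sample string:

```matlab
% Hypothetical sample text spanning two lines
textData = sprintf('Hello, world! Hello again:\nthe world is big.');

% Same cleaning as above: drop punctuation, then lowercase
cleanedText = lower(regexprep(textData, '[^\w\s]', ''));

% Each match is a run of non-whitespace characters, so no empty strings appear
words = regexp(cleanedText, '\S+', 'match');
```

Here words is {'hello', 'world', 'hello', 'again', 'the', 'world', 'is', 'big'}.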
Step 3: Count word frequencies
Use the unique and accumarray functions, or the tabulate function (which requires the Statistics and Machine Learning Toolbox).
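words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};  % small example; use the words from Step 2 in practice
[uniqueWords, ~, idx] = unique(words);  % idx maps each word to its entry in uniqueWords
counts = accumarray(idx, 1);            % number of occurrences of each unique word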
OR
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
T = tabulate(words)  % cell array with columns: word, count, percent
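With thousands of unique words, it usually helps to rank them by frequency before plotting. A sketch building on the unique/accumarray approach above; the cutoff N is a hypothetical value sized for the tiny example:

```matlab
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx(:), 1);             % idx(:) guarantees a column index vector

% Rank by frequency, most frequent first
[sortedCounts, order] = sort(counts, 'descend');
sortedWords = uniqueWords(order);

% Keep only the top N words (hypothetical cutoff; e.g. 50 for real data)
N = min(2, numel(sortedWords));
topWords = sortedWords(1:N);
topCounts = sortedCounts(1:N);
```

Here topWords is {'cat', 'dog'} and topCounts is [3; 2].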
Step 4: Visualize word frequencies using a word cloud
A word cloud suits hundreds or thousands of words, because each word's font size reflects its frequency.
wordcloud(uniqueWords, counts);
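Regarding the pie chart question: a pie chart with thousands of slices is unreadable, so one option is a horizontal bar chart of only the most frequent words. Note also that wordcloud displays at most 100 words by default; its MaxDisplayWords name-value argument raises that limit. A sketch of the bar chart, with hypothetical ranked data standing in for real results:

```matlab
% Hypothetical ranked results (most frequent first), e.g. from a top-N ranking step
topWords = {'cat', 'dog', 'bird'};
topCounts = [3; 2; 1];
N = numel(topWords);

fig = figure('Visible', 'off');             % off-screen; use 'on' for interactive work
barh(flipud(topCounts(:)));                 % flip so the most frequent word is at the top
set(gca, 'YTick', 1:N, 'YTickLabel', fliplr(topWords(:).'));
xlabel('Frequency');
title(sprintf('Top %d words', N));
```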
Please refer to the following documentation links for more information: 
 Hope this is helpful!
See Also
Categories
Find more on Graph and Network Algorithms in Help Center and File Exchange

