code with the same function

The code below gives me the correct count of how many times a letter occurs in a large text file.
I would like another simple piece of code that does the same thing, but gives me the count for every letter in the text (A = number of letters a, B = number of letters b, and so on).
If there is nothing simpler than this, I will accept something more complicated or of the same level of difficulty.
data = fileread('mytextfile.txt');  % read the whole file into one char vector
nnz(data=='A')                      % count of 'A' by direct comparison
nnz(ismember(data,'A'))             % same count via ismember

Answers (2)

Walter Roberson on 3 Apr 2019

1 vote

[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A, accumarray(AA, 1)].')

8 Comments

Walter Roberson on 3 Apr 2019
Gabriel Cunha on 3 Apr 2019
Edited: per isakson on 4 Apr 2019
An error appeared:
Error using horzcat
Dimensions of arrays being concatenated are not consistent.
Rik on 3 Apr 2019
It is a bit easier to resolve the error in his previous answer:
%random test data instead of fileread:
%data=char(randi([64 65+25],1,40));data(data==64)=' ';
data = fileread('mytextfile.txt');
[a, ~, aa] = find(accumarray(reshape(double(data),[],1), 1));
fprintf('%c = %d\n', [a(:).'; aa(:).']);
Walter Roberson on 3 Apr 2019
Edited: Walter Roberson on 4 Apr 2019
fprintf('%c = %d\n', [0+A(:), accumarray(AA, 1)].')
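Combined with the unique call from the answer, the corrected line can be tried on a small string. This is only a sketch: the sample text is made up, and AA(:) is added as a precaution so that accumarray always receives a column of subscripts.

```matlab
% Hypothetical sample text standing in for fileread('mytextfile.txt')
data = 'banana band';

% A  = sorted list of distinct characters
% AA = for each character in data, its position in A
[A, ~, AA] = unique(data);

% Count how often each position in A occurs
counts = accumarray(AA(:), 1);

% Convert the characters to double before concatenating, so the
% counts are not converted to char
fprintf('%c = %d\n', [double(A(:)), counts].');
```

For this sample the distinct characters are space, a, b, d, n, with counts 1, 4, 2, 1, 3.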
Rik on 4 Apr 2019
Curiously, this doesn't seem to work for documents as large as a Bible translation (which seems to be the goal). I have attached a public domain translation for testing. Notice the difference between the two methods for common lower-case letters. The accumarray method seems to cap out at 65535.
data=fileread('WEB.txt');
clc
[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A(:), accumarray(AA, 1)].')
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
Walter Roberson on 4 Apr 2019
double(char_list).'
Otherwise the char data type has priority over numeric in determining the data type of the concatenation.
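The effect can be reproduced in isolation. In this small sketch the counts are made-up numbers, one of them deliberately above 65535:

```matlab
% Hypothetical characters and counts; 70000 does not fit in a 16-bit char
codes  = ['a'; 'e'];
counts = [3; 70000];

% char and double side by side: the result is char, so the count saturates
mixed = [codes, counts];
class(mixed)           % char
double(mixed(2, 2))    % 65535

% Converting the characters to double first keeps the real count
fixed = [double(codes), counts];
class(fixed)           % double
fixed(2, 2)            % 70000
```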
Rik on 4 Apr 2019
Despite its name, char_list is already a double. I didn't notice your last edit with 0+A(:), so that is why that method was capped in my test (chars are limited to 16 bits).
Walter Roberson on 4 Apr 2019
I added the 0+ after you (correctly) pointed out the 65535 cap.
Rik on 3 Apr 2019
Edited: Rik on 4 Apr 2019

1 vote

There are two easy options: a loop and a histogram:
%for loop method:
data = fileread('mytextfile.txt');
letters='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
counts=zeros(1,numel(letters));
for n=1:numel(letters)
counts(n)=nnz(data==letters(n));
end
%histogram method:
data = fileread('mytextfile.txt');
counts=histc(data,65:(65+25));
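If upper-case and lower-case letters should be counted together, a variant along these lines might work. It is a sketch under assumptions: the sample text is made up, the case-insensitive reading of the question is a guess, and histcounts (R2014b and later) is swapped in for histc.

```matlab
% Hypothetical sample text standing in for fileread('mytextfile.txt')
data = 'Hello World, hello MATLAB';

letters = 'A':'Z';

% One bin per letter: edges sit half a step around each character code
edges = [double(letters) - 0.5, double('Z') + 0.5];

% upper() folds a-z onto A-Z; everything else falls outside the bins
counts = histcounts(double(upper(data)), edges);

% Print only the letters that actually occur
for n = 1:numel(letters)
    if counts(n) > 0
        fprintf('%c = %d\n', letters(n), counts(n));
    end
end
```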

4 Comments

Gabriel Cunha on 4 Apr 2019
Sorry for a question that must seem silly to you, but I am a beginner in MATLAB: what are the 65 and the 25 in the histogram method?
Rik on 4 Apr 2019
Those are the ASCII value of 'A' (65) and the number of letters in the alphabet minus 1 (25). But you should probably be using something like this:
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
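The relation between letters and the numbers 65 and 25 can be checked directly at the command line. A short sketch (the variable names are only for illustration):

```matlab
codeA = double('A');           % 65, the character code of 'A'
codeZ = double('Z');           % 90, which is 65 + 25
span  = char(65:(65+25));      % 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
same  = isequal(span, 'A':'Z') % true: the colon operator also works on chars
```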
Gabriel Cunha on 4 Apr 2019
Your code is really incredible, but I also wanted something as simple as the code in my question, which counts one letter at a time. I will certainly study your code, as well as the code from the others who answered, in order to learn more about MATLAB.
Rik on 4 Apr 2019
The edited for-loop method should be a bit easier to understand.

This question is closed.

Asked: 3 Apr 2019

Closed: 20 Aug 2021
