code with the same function
This question is closed. Reopen it to edit or answer.
The code below correctly counts how many times a given letter occurs in a large text file.
I would like another simple piece of code that does the same thing but reports the count for every letter in the text (A = number of letters a, B = number of letters b, and so on).
If nothing simpler exists, I will accept a solution that is more complicated or of the same difficulty.
fileread('mytextfile.txt')
data = fileread('mytextfile.txt');
nnz(data=='A')
nnz(ismember(data,'A'))
Answers (2)
Walter Roberson
3 Apr 2019
[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A, accumarray(AA, 1)].')
8 comments
Walter Roberson
3 Apr 2019
Note that I had already answered you on this matter at https://www.mathworks.com/matlabcentral/answers/453555-help-me-please-please?s_tid=prof_contriblnk#answer_368356
Gabriel Cunha
3 Apr 2019
Edited: per isakson
4 Apr 2019
Rik
3 Apr 2019
It is a bit easier to resolve the error in his previous answer:
%random test data instead of fileread:
%data=char(randi([64 65+25],1,40));data(data==64)=' ';
data = fileread('mytextfile.txt');
[a, ~, aa] = find(accumarray(reshape(double(data),[],1), 1));
fprintf('%c = %d\n', [a(:).'; aa(:).']);
Walter Roberson
3 Apr 2019
Edited: Walter Roberson
4 Apr 2019
fprintf('%c = %d\n', [0+A(:), accumarray(AA, 1)].')
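The corrected call can be tried on a short string before running it on a whole file. This is only a sketch: `sample` is a made-up stand-in for the text returned by fileread, and `counts` and `pairs` are illustrative names that do not appear in the thread.

```matlab
% Hypothetical sample text standing in for fileread('mytextfile.txt')
sample = 'hello world';

% A: sorted unique characters; AA: index into A for each character of sample
[A, ~, AA] = unique(sample);

% Count how often each index occurs; AA(:) forces a column for accumarray
counts = accumarray(AA(:), 1);

% Convert the characters to double before concatenating so the result
% stays numeric (one row per character: [character code, count])
pairs = [double(A(:)), counts];

% fprintf consumes the values column by column, hence the transpose
fprintf('%c = %d\n', pairs.');
```

Writing `double(A(:))` has the same effect as `0+A(:)`; it just states the conversion explicitly.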
Rik
4 Apr 2019
Curiously, this doesn't seem to work for documents as large as a Bible translation (which seems to be the goal). I have attached a public domain translation for testing. Notice the difference between the two methods for lower case common letters. The accumarray seems to cap out at 65535.
data=fileread('WEB.txt');
clc
[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A(:), accumarray(AA, 1)].')
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
Walter Roberson
4 Apr 2019
double(char_list).'
Otherwise the char data type has priority over numeric in determining the data type of the concatenation.
Rik
4 Apr 2019
Despite its name, char_list is already a double. I didn't notice your last edit with 0+A(:), so that is why that method is capped (as chars are capped to 16 bits).
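The cap can be reproduced without a large file. The sketch below uses made-up counts (`letters`, `counts`, `mixed` and `fixed` are illustrative names, not taken from the thread) to show that concatenating char with double gives a char array whose values saturate at 65535, while converting to double first keeps the counts intact. MATLAB may also print an out-of-range warning for the first concatenation.

```matlab
% Hypothetical counts; the second one is deliberately above 65535
letters = 'ab';
counts  = [12; 70000];

% char takes priority over double in a concatenation, so the counts are
% converted to char and anything above 65535 saturates
mixed = [letters(:), counts];
disp(class(mixed))           % char
disp(double(mixed(2, 2)))    % 65535

% Converting the characters to double first keeps the result numeric
fixed = [double(letters(:)), counts];
disp(fixed(2, 2))            % 70000
```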
Walter Roberson
4 Apr 2019
I added the 0+ after you (correctly) pointed out the 65535 cap.
There are two easy options: a loop and a histogram:
%for loop method:
data = fileread('mytextfile.txt');
letters='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
counts=zeros(1,numel(letters));
for n=1:numel(letters)
counts(n)=nnz(data==letters(n));
end
%histogram method:
data = fileread('mytextfile.txt');
counts=histc(data,65:(65+25));
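A possible variant of the histogram method uses histcounts, which MathWorks recommends over histc in releases from R2014b onward. This is a sketch under two assumptions that are not part of the answer above: the sample string stands in for the file contents, and upper and lower case are counted together. The names `codes` and `edges` are illustrative.

```matlab
% Hypothetical sample text standing in for fileread('mytextfile.txt')
data = 'Abba cab z';

% Assumption: upper and lower case should be counted together
codes = double(upper(data));

% One bin per letter: edges 64.5, 65.5, ..., 90.5 bracket the codes 65..90
edges = (double('A'):double('Z') + 1) - 0.5;
counts = histcounts(codes, edges);

% Print one line per letter, e.g. "A = 3"; characters outside A..Z
% (such as spaces) fall outside the edges and are not counted
letters = 'A':'Z';
fprintf('%c = %d\n', [double(letters); counts]);
```

Half-integer edges are used so that each integer character code falls unambiguously inside exactly one bin.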
4 comments
Gabriel Cunha
4 Apr 2019
Rik
4 Apr 2019
Those are the ASCII value of 'A' (65) and the number of letters in the alphabet minus 1 (25). But you should probably be using something like this:
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
Gabriel Cunha
4 Apr 2019
Rik
4 Apr 2019
The edited for-loop method should be a bit easier to understand.