HISTCOUNTS a true replacement of HISTC?

29 ビュー (過去 30 日間)
Bruno Luong
Bruno Luong 2018 年 10 月 10 日
編集済み: Bruno Luong 2018 年 10 月 11 日
Since few latest releases we get the code warning "histc is not recommended. Use histcounts instead".
However how to replace HISTC with specified DIM argument, for example what is the command to get C1 and C2 in this example?
I hope you won't tell me I need a for-loop or some sort disguised loop.
>> A=reshape(linspace(0,1,20),[4 5])
A =
0 0.2105 0.4211 0.6316 0.8421
0.0526 0.2632 0.4737 0.6842 0.8947
0.1053 0.3158 0.5263 0.7368 0.9474
0.1579 0.3684 0.5789 0.7895 1.0000
>> b=linspace(0,1,3)
b =
0 0.5000 1.0000
>> c1=histc(A,b,1)
c1 =
4 4 2 0 0
0 0 2 4 3
0 0 0 0 1
>> c2=histc(A,b,2)
c2 =
3 2 0
3 2 0
2 3 0
2 2 1
>>

採用された回答

Bruno Luong
Bruno Luong 2018 年 10 月 10 日
編集済み: Bruno Luong 2018 年 10 月 11 日
Third solution, may be the best in term of speed/memory
% replacement of [n,i] = histc(x,edges,dim);
[~,~,i] = histcounts(x, [edges,Inf]);
s = uint32(size(x));
nbins = uint32(length(edges));
p1 = uint32(prod(s(1:dim-1)));
p2 = uint32(prod(s(dim+1:end)));
if p1 == 1 % happens for dim==1
k = reshape(uint32(i),[s(dim),p2]);
l = nbins*(0:p2-1);
ilin = l + k;
else
j = reshape(1:p1,[p1,1,1]);
k = reshape(uint32(i-1)*p1,[p1,s(dim),p2]);
l = reshape((p1*nbins)*(0:p2-1),[1,1,p2]);
ilin = (j + l) + k;
end
ilin = ilin(:);
s(dim) = nbins;
n = accumarray(ilin(i~=0),1,[prod(s),1]);
n = reshape(n,s);
EDIT: slight improve code by (1) avoid reduce calculation for dim=1, (2) indexing cast to UINT32
  2 件のコメント
Bruno Luong
Bruno Luong 2018 年 10 月 10 日
編集済み: Bruno Luong 2018 年 10 月 10 日
Sorry, I must Accept my own answer but I think it's fair. Promise, I'll accept another solution if it's better.
Bruno Luong
Bruno Luong 2018 年 10 月 10 日
編集済み: Bruno Luong 2018 年 10 月 11 日
Timing of the above algo for arrays of 100x100x100
Working along dimension #1
Time HISTC = 15.549 [ms]
Time HISTCOUNTS = 19.481 [ms]
Penalty factor = 1.3
Working along dimension #2
Time HISTC = 20.104 [ms]
Time HISTCOUNTS = 16.454 [ms]
Penalty factor = 0.8
Working along dimension #3
Time HISTC = 21.446 [ms]
Time HISTCOUNTS = 18.485 [ms]
Penalty factor = 0.9
The test code (in function) is as following:
x = reshape(linspace(0,1,1e6),[100 100 100]);
edges = linspace(0,1,100);
t = zeros(2,ndims(x));
for dim=1:ndims(x)
tic
[nn,~] = histc(x,edges,dim);
t(1,dim) = toc;
tic
[~,~,i] = histcounts(x, [edges,Inf]);
s = uint32(size(x));
nbins = uint32(length(edges));
p1 = uint32(prod(s(1:dim-1)));
p2 = uint32(prod(s(dim+1:end)));
if p1 == 1
k = reshape(uint32(i),[s(dim),p2]);
l = reshape(nbins*(0:p2-1),1,[]);
ilin = l + k;
else
j = reshape(1:p1,[],1,1);
k = reshape(uint32(i-1)*p1,[p1,s(dim),p2]);
l = reshape((p1*nbins)*(0:p2-1),1,1,[]);
ilin = (j + l) + k;
end
ilin = ilin(:);
s(dim) = nbins;
n = accumarray(ilin(i~=0),1,[prod(s),1]);
n = reshape(n,s);
t(2,dim) = toc;
end
t = t*1e3;
for dim=1:ndims(x)
fprintf('Working along dimension #%d\n', dim);
fprintf('\tTime HISTC = %1.3f [ms]\n', t(1,dim));
fprintf('\tTime HISTCOUNTS = %1.3f [ms]\n', t(2,dim));
fprintf('\tPenalty factor = %1.1f\n', t(2,dim)/t(1,dim));
end

サインインしてコメントする。

その他の回答 (3 件)

Bruno Luong
Bruno Luong 2018 年 10 月 10 日
編集済み: Bruno Luong 2018 年 10 月 10 日
OK, here is the answer of my own question.
The replacement of
[n,i] = histc(x, edges, dim);
is
[~,~,i] = histcounts(x, edges);
s = size(x);
nd = length(s);
idx = arrayfun(@(n) 1:n, s, 'unif', 0);
[idx{:}] = ndgrid(idx{:});
idx{dim} = i;
s(dim) = length(edges);
idx = cat(nd+1,idx{:});
idx = reshape(idx,[],nd);
n = accumarray(idx(i~=0,:),1,s);
That's a lot of code. If an authority of TMW can shed a light if there is a shorter, faster, cleaner solution, I'm in.
NOTE: there is still a subtle difference caused by data right at the outer edge
  1 件のコメント
Stephen23
Stephen23 2018 年 10 月 10 日
Nice. This definitely wins the Answer of the Day award.
Perhaps the next step would be to make an enhancement request.

サインインしてコメントする。


OCDER
OCDER 2018 年 10 月 10 日
Interesting finding that histcounts doesn't have the Dim option. You should ask TMW to add that feature - they must have overlooked that.
"I hope you won't tell me I need a for-loop or some sort disguised loop."
But, there really is no need to avoid a simpler for-loop solution. Also, your answer has a "disguised loop" inside the arrayfun - arrayfun uses an internal for loop.
A =[ 0 0.2105 0.4211 0.6316 0.8421;
0.0526 0.2632 0.4737 0.6842 0.8947;
0.1053 0.3158 0.5263 0.7368 0.9474;
0.1579 0.3684 0.5789 0.7895 1.0000];
Bin = linspace(0, 1, 3);
C1 = histc(A, Bin, 1);
C2 = myHistC(A, Bin, 1);
D1 = histc(A, Bin, 2);
D2 = myHistC(A, Bin, 2);
assert(isequal(C1, C2) && isequal(D1, D2), 'Not the same thing');
function C = myHistC(A, Bin, Dim)
if nargin < 3; Dim = 1; end
if Dim == 1
C = zeros(numel(Bin), size(A, 2));
for j = 1:size(A, 2)
C(:, j) = histcounts(A(:, j), [Bin(:); Inf]);
end
else
C = zeros(size(A, 1), numel(Bin));
for j = 1:size(A, 1)
C(j, :) = histcounts(A(j, :), [Bin(:); Inf]);
end
end
end
  5 件のコメント
OCDER
OCDER 2018 年 10 月 10 日
I see. But if the suggestion is to use histc, I'm wondering why histcounts doesn't implement this directly using a name-value input like
histcounts(..., 'dim', N) ?
Overall, I find it confusing that they acknowledged the shortcoming of histcounts, offered a workaround code in the example but do not implement this within histcounts, and then, made histc generate warning messages to use histcounts instead, which has no replacement for histc's column/row-wise binning. It's an interesting decision.
From the website:
Use a for-loop to calculate bin counts over each column.
A = randn(100,10);
nbins = 10;
N = zeros(nbins, size(A,2));
for k = 1:size(A,2)
N(:,k) = histcounts(A(:,k),nbins);
end
Bruno Luong
Bruno Luong 2018 年 10 月 10 日
編集済み: Bruno Luong 2018 年 10 月 10 日
+1 OCDER
And the example in the document merely deals with a workaround for 2D case. It would be a nightmare for someone who is not familiar with MATLAB and find a decend alternative for nd-array case.

サインインしてコメントする。


Bruno Luong
Bruno Luong 2018 年 10 月 10 日
編集済み: Bruno Luong 2018 年 10 月 10 日
Another solution of replacement, still long, but consumes less memory than my first solution at the cost of array permutations:
% replacement of [n,i] = histc(x,edges,dim);
s = size(x);
nd = length(s);
p = 1:nd;
p([1 dim]) = p([dim 1]);
q(1:nd) = p;
xp = permute(x, p);
sp = s(p);
[~,~,i] = histcounts(xp, [edges, Inf]);
m = prod(sp(2:end));
sp1 = sp(1);
nbins = length(edges);
j = repmat(nbins*(0:m-1),[sp1,1]);
keep = i~=0;
linidx = i(keep) + j(keep);
sp(1) = nbins;
n = accumarray(linidx(:),1,[prod(sp) 1]);
n = permute(reshape(n,sp),q);
i = permute(i,q);

カテゴリ

Help Center および File ExchangeHypothesis Tests についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by