Accessing frequencies of arbitrarily selected histogram bins

Milos Krsmanovic on 9 Feb 2022
Edited: Milos Krsmanovic on 9 Feb 2022
I would like to access the frequencies of arbitrarily selected bins of my histogram. For example, I might want to sum only the frequencies of bins 3 to 7, or only those of bins whose frequency exceeds a certain value, and so on.
I've been reading the histogram and histcounts docs, but I'm struggling to apply them to my problem. An old thread on Ask the Community discussing something similar suggests bypassing the histogram completely and working on the raw data. Still, I'm wondering whether there is a way to do it using histograms, as I already have a bunch of them.
Thank you.

Accepted Answer

Steven Lord on 9 Feb 2022
Let's take a look at a sample histogram. I'm using rng default to create the sample data so the histogram created when you run this code exactly matches the one I create by running the code here in Answers.
rng default
x = randn(1, 100);
h = histogram(x)
h =
  Histogram with properties:

             Data: [0.5377 1.8339 -2.2588 0.8622 0.3188 -1.3077 -0.4336 0.3426 3.5784 2.7694 -1.3499 3.0349 0.7254 -0.0631 0.7147 -0.2050 -0.1241 1.4897 1.4090 1.4172 0.6715 -1.2075 0.7172 1.6302 0.4889 1.0347 0.7269 -0.3034 0.2939 -0.7873 0.8884 … ]
           Values: [2 17 28 32 16 3 2]
          NumBins: 7
         BinEdges: [-3 -2 -1 0 1 2 3 4]
         BinWidth: 1
        BinLimits: [-3 4]
    Normalization: 'count'
        FaceColor: 'auto'
        EdgeColor: [0 0 0]
yline(32, 'r:')
The Values property is of particular interest. It holds the actual counts (in this case, since the Normalization property is set to 'count'). How many bins contain more than 5 values?
nnz(h.Values > 5)
ans = 4
Which bin is the highest and what are the edges of that bin?
[maxHeight, maxLocation] = max(h.Values)
maxHeight = 32
maxLocation = 4
% Values(n) counts data falling between BinEdges(n) and BinEdges(n+1)
edgesOfMaxBin = h.BinEdges(maxLocation + [0, 1])
edgesOfMaxBin = 1×2
0 1
You can confirm by looking at the picture that this is correct. The red dotted line is at y = 32 and the top of the bin representing [0, 1) exactly touches that dotted line.
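The same indexing extends to listing every bin above a threshold together with its edges. This is a minimal sketch, assuming the counts and edges printed above are typed in by hand so it runs without the random data; the names counts, edges, threshold, and selected are illustrative and not part of the original answer.

```matlab
% Counts and edges copied from the histogram output above (typed in by hand
% as an assumption, so the sketch does not depend on the random data).
counts = [2 17 28 32 16 3 2];
edges  = -3:4;
threshold = 5;                       % hypothetical cutoff

idx = find(counts > threshold);      % indices of the bins above the cutoff
leftEdges  = edges(idx);             % bin n spans edges(n) to edges(n+1)
rightEdges = edges(idx + 1);

% Collect the selection in a table for easy inspection.
selected = table(idx(:), leftEdges(:), rightEdges(:), counts(idx).', ...
    'VariableNames', {'Bin', 'LeftEdge', 'RightEdge', 'Count'})
```

With a live histogram, h.Values and h.BinEdges would take the place of counts and edges.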
If you don't want the picture you can do these same types of operations on the outputs of histcounts. Some of the properties of the histogram object aren't returned by histcounts and so if you want them you'd need to compute them.
[theValues, theEdges] = histcounts(x)
theValues = 1×7
2 17 28 32 16 3 2
theEdges = 1×8
-3 -2 -1 0 1 2 3 4
theBinWidth = diff(theEdges) % Not returned by histcounts, but easy to compute
theBinWidth = 1×7
1 1 1 1 1 1 1
theNumBins = numel(theValues) % Ditto
theNumBins = 7
Instead of h.Values use theValues and instead of h.BinEdges use theEdges.
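Applied to the original question, the histcounts outputs can be indexed directly. The following is only a sketch, assuming the values and edges shown above are entered as literals; the data range [-1, 2] and the names sumBins3to7, inRange, and sumInRange are arbitrary choices for illustration.

```matlab
theValues = [2 17 28 32 16 3 2];   % copied from the histcounts output above
theEdges  = -3:4;

% Sum of the counts in bins 3 through 7 (a contiguous index range).
sumBins3to7 = sum(theValues(3:7))

% Sum of the counts in bins lying entirely inside a data range,
% here [-1, 2] (an arbitrary illustrative range).
inRange = theEdges(1:end-1) >= -1 & theEdges(2:end) <= 2;
sumInRange = sum(theValues(inRange))
```

For these values, bins 3 to 7 sum to 81 and the bins inside [-1, 2] (bins 3, 4, and 5) sum to 76.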

More Answers (2)

Paul on 9 Feb 2022
The bin values are stored as a property of the histogram object, which is returned by histogram.
rng(101)
x = rand(20,1);
h = histogram(x);
h.Values
ans = 1×4
7 5 7 1
Or they can be obtained from the histogram figure if it's already created:
h1 = get(gca,'Children');
h1.Values
ans = 1×4
7 5 7 1
  1 Comment
Steven Lord on 9 Feb 2022
Instead of getting all the axes Children, which may give you more than you expected, I'd use findobj or findall to specifically find the histogram handle.
rng default
x = randn(1, 100);
h = histogram(x);
yline(32, 'r:')
C = get(gca, 'Children')
C =
  2×1 graphics array:

  ConstantLine
  Histogram
h2 = findobj(gca, 'Type', 'Histogram')
h2 =
  Histogram with properties:

             Data: [0.5377 1.8339 -2.2588 0.8622 0.3188 -1.3077 -0.4336 0.3426 3.5784 2.7694 -1.3499 3.0349 0.7254 -0.0631 0.7147 -0.2050 -0.1241 1.4897 1.4090 1.4172 0.6715 -1.2075 0.7172 1.6302 0.4889 1.0347 0.7269 -0.3034 0.2939 -0.7873 0.8884 … ]
           Values: [2 17 28 32 16 3 2]
          NumBins: 7
         BinEdges: [-3 -2 -1 0 1 2 3 4]
         BinWidth: 1
        BinLimits: [-3 4]
    Normalization: 'count'
        FaceColor: 'auto'
        EdgeColor: [0 0 0]
h == C % [false; true]
ans = 2×1 logical array
0 1
h == h2 % true
ans = logical
1
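Since the question mentions having several histograms already, findobj can also collect all of them from an axes so the same selection is applied to each. This is a rough sketch under assumed data: the two small data sets, the hidden figure, and the names hs, minCount, and totals are made up for illustration.

```matlab
fig = figure('Visible', 'off');            % hidden figure, for illustration only
ax  = axes(fig);
histogram(ax, [1 2 2 3 3 3], 0.5:1:3.5);   % bin counts [1 2 3]
hold(ax, 'on')
histogram(ax, [2 3 3], 0.5:1:3.5);         % bin counts [0 1 2]

% Find only the histogram objects, ignoring any other children of the axes.
hs = findobj(ax, 'Type', 'Histogram');

minCount = 2;                              % hypothetical threshold
% For each histogram, sum the counts of the bins holding at least minCount values.
totals = arrayfun(@(h) sum(h.Values(h.Values >= minCount)), hs)
close(fig)
```

The totals come back in the order findobj lists the objects, which need not match creation order, so it is safer to pair each total with its handle than to rely on position.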



Milos Krsmanovic on 9 Feb 2022
Thank you both for replying @Steven Lord and @Paul.
Ultimately I can select only one post as an answer, but using your hints I was able to put together the code I needed. Some examples are below, in case someone else needs this in the future.
1. Summing up frequencies according to a greater-than/less-than criterion:
k = 0;
for i = 1:numel(histcounts(x))   % numel, not nnz: nnz ignores empty bins and would end the loop early
    if h.Values(i) >= 50
        k = k + h.Values(i);
    end
end
The same thing using an array:
j = [];
for i = 1:numel(histcounts(x))
    if h.Values(i) >= 50
        j = [j, h.Values(i)];   % collect the qualifying counts
    end
end
j = sum(j);   % sum once, after the loop
2. Selecting arbitrarily chosen bins, say bins 1, 5, 7, and 12:
m = [];
histx = histcounts(x);
for i = [1 5 7 12]
    m = [m, histx(i)];
end
Members of m can now be used as needed, say summed up as in the second example above: m = sum(m);.
Thanks again, I do appreciate it.
  2 Comments
Steven Lord on 9 Feb 2022
If you've called histogram, you don't need to also call histcounts. Your first example is more easily done with logical indexing:
rng default
x = randn(1, 100);
h = histogram(x);
sum(h.Values(h.Values > 5))
ans = 93
Your second can use linear indexing.
nbins = numel(h.Values)
nbins = 7
valuesOfOddIndexBins = h.Values(1:2:nbins)
valuesOfOddIndexBins = 1×4
2 28 16 2
Milos Krsmanovic on 9 Feb 2022
Edited: Milos Krsmanovic on 9 Feb 2022
You live and you learn. Those are so neat. Thank you!
This also works just as expected: valuesOfArbitraryBins = h.Values([1 5 7 12]), that is, I do not have to follow a sequence.
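One caveat: indexing this way errors if any index exceeds the number of bins, as 12 would for the 7-bin example histogram used earlier in this thread. A small sketch of one way to guard against that, assuming the example counts are typed in as a literal; the names wanted, valid, and total are illustrative.

```matlab
counts = [2 17 28 32 16 3 2];      % 7-bin example counts from this thread
wanted = [1 5 7 12];               % requested bin indices, in any order

% Keep only the indices that actually exist for this histogram.
valid = wanted(wanted >= 1 & wanted <= numel(counts));

valuesOfArbitraryBins = counts(valid)
total = sum(valuesOfArbitraryBins)
```

Here index 12 is dropped, leaving bins 1, 5, and 7 with counts 2, 16, and 2. Silently dropping indices is a design choice; raising an error may be preferable if a missing bin signals a mistake.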

Release

R2019b
