The normalization of histcounts

Question

Sim, 4 Aug 2023

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/2004812-the-normalization-of-histcounts

Edited: Sim, 7 Aug 2023

Attachment: a.mat

I would like to get the probability density function (PDF) from an array of data A (contained in the attached "a.mat" file).

If I understood correctly, if I use the normalization option called "probability", I would get the "relative frequency histogram".

Instead, if I use the normalization option called "pdf", I would get an "empirical estimate of the Probability Density Function".

However, when I check the sum of the probabilities, I get "1" if I use the "probability" option, but I do not get "1" if I use the "pdf" option:

load('a.mat', 'A')
num_bins = 70;
B = histcounts(A,num_bins,'Normalization','probability'); 
sum(B)
ans = 1
C = histcounts(A,num_bins,'Normalization','pdf'); 
sum(C)
ans = 3.0030e-04

Shouldn't "sum(B)" give the sum of the relative frequencies, and "sum(C)" the sum of the blocks' areas representing percentages?

What did I do wrong?


Answer 1

Steven Lord, 4 Aug 2023

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/2004812-the-normalization-of-histcounts#answer_1283477

For probability, each element in the output is the number of elements in the input that fall into that bin divided by the total number of elements in the input. So if you sum the elements in the output, what you get is the total number of elements in the input that fall into any of the bins divided by the total number. That's why its row in the table in the description of the 'Normalization' name-value argument says "The sum of the bin values is less than or equal to 1." It can be less than 1 if the 'BinLimits' or 'BinEdges' that you specified exclude one or more of the points in the input from being assigned into any of the bins, for example.
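A small sketch of the point about excluded data, assuming a toy vector x and hand-picked bin edges [0 1 2 3 4] (both illustrative, not from the question): one of the six samples lies outside the edges, so the 'probability' values match the raw counts divided by numel(x) and sum to 5/6 rather than 1.

```matlab
% Toy data (illustrative assumption): six samples, one of which (9) lies
% outside the bin edges chosen below.
x = [0.5 1.5 1.7 2.5 3.5 9];
edges = [0 1 2 3 4];

N = histcounts(x, edges);                                 % raw counts per bin
P = histcounts(x, edges, 'Normalization', 'probability');

% 'probability' divides each count by the total number of input elements,
% including the sample that fell outside every bin.
manualP = N ./ numel(x);

disp(max(abs(P - manualP)))   % zero or round-off level
disp(sum(P))                  % less than 1 because one sample was not binned
```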

For pdf, each element in the output is the number of elements in the input that fall into that bin divided by the product of the width of the bin and the total number of elements in the input. If each of your bins were 1 unit wide, the 'pdf' and the 'probability' would be the same. If each of your bins were 0.1 units wide, each element in the output normalized by 'pdf' would be ten times as large as the corresponding element in the output normalized by 'probability' and if I summed the output of 'pdf' normalization I'd expect to get a result of 10.
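The 'pdf' formula can be checked by recomputing it from the raw counts. This is a sketch assuming a fixed random seed and a bin width of 0.1 (illustrative choices); it takes the widths from diff(edges), so the same check would also work if the widths differed.

```matlab
rng(0);                                        % fixed seed so runs are repeatable
x = randn(1, 1e4);

[N, edges] = histcounts(x, 'BinWidth', 0.1);   % raw counts and the edges used
pdfBuiltin = histcounts(x, edges, 'Normalization', 'pdf');

w = diff(edges);                               % width of each bin (0.1 here)
pdfManual = N ./ (numel(x) .* w);              % count / (total * width)

disp(max(abs(pdfBuiltin - pdfManual)))         % round-off level
disp(sum(pdfBuiltin))                          % close to 1/0.1 = 10
disp(sum(pdfBuiltin .* w))                     % close to 1: total histogram area
```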

x = randn(1, 1e5);
prob_BW1 = histcounts(x, 'BinWidth', 1, 'Normalization', 'probability');
pdf_BW1 = histcounts(x, 'BinWidth', 1, 'Normalization', 'pdf');
prob_BWtenth = histcounts(x, 'BinWidth', 0.1, 'Normalization', 'probability');
pdf_BWtenth = histcounts(x, 'BinWidth', 0.1, 'Normalization', 'pdf');
format longg
shouldBeSame = [prob_BW1.', pdf_BW1.']
shouldBeSame = 10×2
                     2e-05                     2e-05
                   0.00143                   0.00143
                    0.0215                    0.0215
                   0.13714                   0.13714
                   0.34063                   0.34063
                   0.34103                   0.34103
                   0.13565                   0.13565
                   0.02121                   0.02121
                   0.00136                   0.00136
                     3e-05                     3e-05
BWtenth_results = [prob_BWtenth; pdf_BWtenth; pdf_BWtenth./prob_BWtenth].'
BWtenth_results = 83×3
                     2e-05      0.000200000000000001                        10
                     3e-05      0.000299999999999998          9.99999999999995
                     5e-05      0.000500000000000002                        10
                     3e-05                    0.0003          9.99999999999999
                     6e-05                    0.0006          9.99999999999999
                     6e-05                    0.0006          9.99999999999999
                     8e-05      0.000799999999999999          9.99999999999999
                    0.0002       0.00200000000000001                        10
                   0.00021                    0.0021          9.99999999999999
                    0.0003                     0.003          9.99999999999999

All the elements in the third column of BWtenth_results are either 10 (or close to it) or NaN (when no data in x fell into that particular bin, which gives 0/0).

And as I said above, the sum of the probabilities is 1 but the sum of the PDF values is 10 because the bin width was 1/10.

[sum(prob_BWtenth), sum(pdf_BWtenth)]
ans = 1×2
     1    10

All those calculations I did assumed that the bin width was the same for each bin. If your bins had different widths (because you selected a non-uniformly spaced set of BinEdges), then the equivalent of the third column of BWtenth_results for that set of bins would be 1 divided by each bin's width, so it would vary from bin to bin.
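To illustrate the non-uniform case, here is a sketch with hand-picked, unevenly spaced edges (an illustrative assumption, not from the thread). The ratio of 'pdf' to 'probability' comes out as 1/width for each bin, while the histogram area (each pdf value times its bin width, summed) is still 1.

```matlab
% Illustrative non-uniform edges (assumption). The outer edges are wide
% enough that practically every standard normal sample lands in a bin.
rng(1);
x = randn(1, 1e4);
edges = [-10 -2 -1 -0.5 0 0.5 1 2 10];   % widths: 8, 1, 0.5, 0.5, 0.5, 0.5, 1, 8

prob = histcounts(x, edges, 'Normalization', 'probability');
pdfv = histcounts(x, edges, 'Normalization', 'pdf');

ratio = pdfv ./ prob;            % would be NaN (0/0) for an empty bin
expected = 1 ./ diff(edges);     % 1/width for each bin

disp([ratio; expected])          % the two rows agree for non-empty bins
disp(sum(pdfv .* diff(edges)))   % histogram area, still about 1
```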


Sim, 7 Aug 2023

Edited: Sim, 7 Aug 2023

Many thanks @Steven Lord for your detailed answer!!


Answer 2

the cyclist, 4 Aug 2023

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/2004812-the-normalization-of-histcounts#answer_1283472

Edited: the cyclist, 4 Aug 2023

PDF is the probability density, not the probability. To get the probability for a given bin, you need to multiply by the bin width.

Your sum of C does not take that into account. MATLAB's "probability" normalization (your B calculation) is doing that for you.
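Applied to the question's call, this means weighting each value of C by its bin width before summing. Since a.mat is not reproduced here, the sketch uses a synthetic stand-in for A (an assumption); with the real file, load('a.mat', 'A') replaces the two data-generating lines. Assuming, as the question's sum(B) = 1 indicates, that every element of A landed in one of the 70 equal-width bins, sum(C) equals 1/w, so the reported 3.0030e-04 suggests bins roughly 3330 units of A wide.

```matlab
% Synthetic stand-in for the question's data (assumption: the real A is in
% a.mat, not shown). With the file available, use load('a.mat', 'A') instead.
rng(2);
A = 1e4 * rand(1, 5000);

num_bins = 70;
[C, edges] = histcounts(A, num_bins, 'Normalization', 'pdf');

w = diff(edges);        % bin widths (equal when a bin count is requested)
binProb = C .* w;       % density times width = probability of each bin

disp(sum(C))            % not 1 in general; about 1/w(1) here
disp(sum(binProb))      % about 1, matching sum(B) from the 'probability' call
```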


Sim, 7 Aug 2023

Many thanks @the cyclist for your great answer!! I would accept both answers, but I guess I need to accept only one of them......
