Use Kernel Density Estimation to get the probability of a new observation

12 ビュー (過去 30 日間)
Tatiana Lopez
Tatiana Lopez 2015 年 4 月 1 日
コメント済み: Tatiana Lopez 2015 年 4 月 1 日
Hi all, I would like to use KDE to fit a 1d variable and then getting the probability of a new observation given the fitted model using pdf( kde, new observation ). I used fitdist with the 'Kernel' option and then plotted the corresponding pdf. My question is, shouln't the pdf be normalized? How can I get the probability of a new observation then?
Thank you
x = [15.276,15.277,15.279,15.28,15.281,15.186,15.187,15.188,15.19,15.191,15.193,15.194,15.195,15.197,15.198,15.199,15.201,15.202,15.204,15.205,15.206,15.208,15.209,15.211,15.212,15.213,15.215,15.216,15.217,15.219,15.22,15.222,15.223,15.224,15.226,15.227,15.229,15.23,15.231,15.233,15.234,15.236,15.237,15.238,15.24,15.241,15.243,15.244,15.245,15.247,15.248,15.249,15.251,15.252,15.254,15.255,15.283,15.284,15.286,15.287,15.288,15.29,15.291,15.172,15.173,15.174,15.176,15.177,15.179,15.18,15.181,15.183,15.184,15.293,15.294,15.167,15.169,15.17,15.295,15.297,15.298,15.299,15.301,15.302,15.304,15.305,15.306,15.308,15.309,15.311,15.312,15.313,15.315,15.316,15.145,15.147,15.148,15.149,15.151,15.152,15.154,15.155,15.156,15.158,15.159,15.161,15.162,15.163,15.165,15.166,15.293,15.294,15.167,15.317,15.319,15.32,15.322,15.323,15.324,15.326,15.327,15.329,15.33,15.331,15.333,15.334,15.336,15.337,15.338,15.34,15.341,15.343,15.344,15.345,15.347,15.348,15.349,15.351,15.352,15.354,15.355,15.356,15.358,15.359,15.361,15.362,15.363,15.365,15.366,15.368]';
t = max( min(x) - 0.1, 0 ) : 0.01 : min( max(x) + 0.1, 24);
kde = fitdist(x,'Kernel','Kernel','normal','Support','positive')
pdf_kde = pdf( kde, t );
plot(t, pdf_kde, 'g-'), hold on;

採用された回答

Brendan Hamm
Brendan Hamm 2015 年 4 月 1 日
The pdf integrates to be 1, so I am not sure why you think it needs to be normalized? Furthermore, this gives you a continuous distribution and therefore P(x|distribution) = 0 for all x, as points have no mass. the pdf is NOT telling you a probability, but rather just the probability density function evaluation at this point. Probability is the area under the curve between two values (the limits of the integral). You could ask, "what is the probability a new observation is less than or equal to 15.2?" and the answer would be found from the cdf:
cdf(kde,15.2) % P(x <= 15.2)
ans =
0.2727
or, "What is the probability the sample will be less than 15.4 but greater than 15.2?":
cdf(kde,15.4) - cdf(kde,15.2)% P(15.2 < x < 15.4) = P(x < 15.4) - P(x < 15.2);
ans =
0.7105
I hope this helps clear this up.
  1 件のコメント
Tatiana Lopez
Tatiana Lopez 2015 年 4 月 1 日
Thank you very much Brendan for your super fast answer. Now it's very clear!

サインインしてコメントする。

その他の回答 (0 件)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by