Calculating the Gaussian distribution parameters
I'm trying to write a small script to try out the EM algorithm. I have 2 sets of 1-dimensional points that belong to 2 different Gaussians, but I don't know which point belongs to which data set, and the EM algorithm estimates the Gaussian parameters (mean, variance) for both.
For that I first create a small data set
data1 = normrnd(-6,3,[200 1]);
data2 = normrnd(6,1,[200 1]);
data = [data1;data2];
Then, to have something to compare the EM algorithm's output against, I first calculate the Gaussian distribution parameters directly. However, the result I get is slightly different depending on whether I use the MATLAB function fitdist or code the math myself (left is manual math, right is fitdist):
Why is that?
The math I did was for mu and sigma:
The manual math is coded as:
distGauss1.mu = mean(data1);
distGauss1.sigma = mean((data1-distGauss1.mu).^2);
distGauss2.mu = mean(data2);
distGauss2.sigma = mean((data2-distGauss1.mu).^2);
dpb on 21 Jun 2022
Let's try your formula with numbers...
>> data1 = normrnd(-6,3,[200 1]);
OK, that returns what we would expect, pretty close to the input parameters of the RNG...
Now what does your calculation give...
Woops!!! You forgot two things. The first is the square root: your expression computes the variance, not the standard deviation.
That's much closer, but still not quite identical to the answer std returned. That's the second thing: you used mean, which divides by n, whereas the usual sample estimate of the std (based on the unbiased variance estimator) divides by n-1.
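To put numbers on that second point, here are the two estimators written out. The symbols (x_i for the samples, \bar{x} for the sample mean, \hat{\sigma}_n and s for the two estimates) are my own notation, not taken from the question:

```latex
\[
\hat{\sigma}_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2},
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2},
\qquad
s = \hat{\sigma}_n\sqrt{\frac{n}{n-1}} .
\]
```

With n = 200 the factor is sqrt(200/199), roughly 1.0025, so the two estimates differ by only about a quarter of a percent.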
So, as the left-hand plot shows, your distribution is much fatter than it should be: 3X the width, since the input sigma was 3 and the variance (about 9) was used as if it were sigma. The result is much closer for the other one because sqrt(1) --> 1, so the difference just doesn't show up numerically.
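Putting both fixes together, a minimal sketch of the corrected calculation might look like the following. It uses randn rather than normrnd only so that it runs without the Statistics Toolbox, and the variable names (varML1, sigML1, sigUB1, sigUB2) are made up for this example. It also subtracts data2's own mean for the second component; the line in the question subtracts distGauss1.mu there, which looks like a copy-paste slip.

```matlab
% Sketch: hand-coded Gaussian parameter estimates vs. what std returns.
% Assumption: randn replaces normrnd so no toolbox is needed.
rng(0);                               % fixed seed so the run is repeatable
data1 = -6 + 3*randn(200,1);          % samples from N(-6, 3^2)
data2 =  6 + 1*randn(200,1);          % samples from N( 6, 1^2)
n = numel(data1);

mu1 = mean(data1);
mu2 = mean(data2);

varML1 = mean((data1 - mu1).^2);      % what the question computed: a variance
sigML1 = sqrt(varML1);                % fix 1: take the square root (divides by n)
sigUB1 = sqrt(sum((data1 - mu1).^2)/(n - 1));   % fix 2: divide by n-1, as std does

% Second component: subtract mu2, not mu1
sigUB2 = sqrt(sum((data2 - mu2).^2)/(numel(data2) - 1));

fprintf('variance used as sigma in the question: %.4f\n', varML1);
fprintf('sigma, divide by n                    : %.4f\n', sigML1);
fprintf('sigma, divide by n-1                  : %.4f\n', sigUB1);
fprintf('std(data1)                            : %.4f\n', std(data1));
fprintf('sigma for data2, divide by n-1        : %.4f\n', sigUB2);
```

If the divide-by-n value is what you want for comparison with EM (the maximum-likelihood update divides by n), std(data1,1) returns it directly.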