Does fitdist work for fitting a distribution to truncated data?
It looks like the mean of the fitted distribution and the mean of the truncated data are visually different (see the blue histogram and the red line in the plot produced by the code below).
Here is my example:
% create a set of "truncated data"
pd = makedist('Normal','mu',3);
t = truncate(pd,3,inf);
data = random(t,10000,1);
% fit the normal distribution to the "truncated data"
pd_fit = fitdist(data,'normal');
xgrid = linspace(0,100,1000)';
mypdf = pdf(pd_fit,xgrid);
% plot the "truncated histogram" and the fitting distribution
hold on
histogram(data,100,'Normalization','pdf')
line(xgrid,mypdf,'Linewidth',2,'color','red')
hold off
xlim([0 10])
Accepted Answer
Angelo Yeo
19 June 2023
Of course it works for truncated data. You can see that the mean of the data and the "mu" of "pd_fit" are the same.
% create a set of "truncated data"
pd = makedist('Normal','mu',3);
t = truncate(pd,3,inf);
data = random(t,10000,1);
% fit the normal distribution to the "truncated data"
pd_fit = fitdist(data,'normal');
xgrid = linspace(0,100,1000)';
mypdf = pdf(pd_fit,xgrid);
% plot the "truncated histogram" and the fitting distribution
hold on
histogram(data,100,'Normalization','pdf')
line(xgrid,mypdf,'Linewidth',2,'color','red')
hold off
xlim([0 10])
%%
mean(data)
pd_fit.mu
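For reference, the value that both expressions return can be predicted from the standard formula for the mean of a left-truncated normal distribution. This is only a sketch, assuming the default sigma = 1 of makedist('Normal','mu',3) and the truncation point a = 3 used in the code above; the symbols alpha, phi, and Phi (standard normal pdf and cdf) are introduced here for illustration.

```latex
\mathbb{E}[X \mid X > a] = \mu + \sigma\,\frac{\varphi(\alpha)}{1 - \Phi(\alpha)},
\qquad \alpha = \frac{a - \mu}{\sigma}

% With \mu = 3, \sigma = 1, a = 3: \alpha = 0, \varphi(0) = 1/\sqrt{2\pi}, \Phi(0) = 1/2
\mathbb{E}[X \mid X > 3] = 3 + \frac{1/\sqrt{2\pi}}{1/2} = 3 + \sqrt{\frac{2}{\pi}} \approx 3.798
```

Under these assumptions, mean(data) and pd_fit.mu should both come out close to 3.8 rather than 3, because a normal fit sets mu to the sample mean of the observed (truncated) values.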
2 Comments
Angelo Yeo
20 June 2023
It is not possible to recover the blue curve in the picture below from the truncated data alone.
From what feature of the data could the computer infer the parameters of that curve?
More Answers (1)
Ayush Kashyap
19 June 2023
Indeed, `fitdist` can be used to fit a statistical distribution to truncated data.
It is valid to fit a distribution to truncated data if we believe that the underlying data follow a specific distribution but only values above or below certain bounds are observed.
In your example, you generate random data from a normal distribution truncated on the left at 3 and then fit a normal distribution to it. This is a valid starting point if you believe that the underlying distribution of your data is normal and that the data are truncated on the lower end at 3, but note that a plain normal fit does not itself account for the truncation.
The difference between the mean of the fitted distribution and the mean of the truncated data that you are observing in the plot could be due to several reasons such as:
- One possibility is that the truncation of the data has shifted the sample mean, so the truncated sample gives a biased estimate of the mean of the underlying distribution.
- Another possibility is that the normal distribution may not be a good fit for your data, which could lead to differences in the mean and other parameters of the distribution.
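The second possibility can be checked directly. The following is a minimal sketch, assuming the same simulated data as in the question and a fixed random seed chosen only for reproducibility; it runs a one-sample Kolmogorov-Smirnov test of the data against the fitted normal distribution. Because the parameters are estimated from the same data, the reported p-value is only approximate.

```matlab
% Sketch: test whether an untruncated normal describes left-truncated data.
rng(0);                                  % fixed seed (illustrative choice)
pd   = makedist('Normal','mu',3);        % sigma defaults to 1
t    = truncate(pd,3,inf);               % keep only values >= 3
data = random(t,10000,1);

% Plain normal fit, as in the question
pd_fit = fitdist(data,'Normal');

% One-sample Kolmogorov-Smirnov test against the fitted distribution.
% h = 1 means the hypothesis "data come from pd_fit" is rejected at the 5% level.
[h,p,ksstat] = kstest(data,'CDF',pd_fit);
fprintf('reject normal fit: %d (p = %.3g, KS statistic = %.3f)\n', h, p, ksstat);
```

The fitted normal assigns noticeable probability below 3, where the sample has no observations at all, so a rejection is the expected outcome here.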
To address this issue:
- You could consider using alternative distributions that account for truncation or have heavier tails.
- You could also compare the fit of the normal distribution to alternative distributions using goodness-of-fit tests or other evaluation metrics.
- Additionally, you could consider using methods that are specifically designed for fitting distributions to truncated data, such as the method of moments or maximum likelihood estimation with modified likelihoods that account for truncation.
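The last suggestion can be sketched with mle and a custom density. This example assumes that the truncation point (3) is known in advance; the names xTrunc, pdfTrunc, phat, and pdNaive are illustrative, and the seed is fixed only for reproducibility. The custom density divides the normal pdf by the probability mass above the truncation point, so the likelihood accounts for the missing lower tail.

```matlab
% Sketch: maximum likelihood fit of a left-truncated normal distribution.
rng(0);                                  % fixed seed (illustrative choice)
xTrunc = 3;                              % truncation point, assumed known
pd   = makedist('Normal','mu',3);        % sigma defaults to 1
t    = truncate(pd,xTrunc,inf);
data = random(t,10000,1);

% Density of a normal distribution restricted to x >= xTrunc:
% the normal pdf renormalized by the probability of exceeding xTrunc.
pdfTrunc = @(x,mu,sigma) normpdf(x,mu,sigma) ./ (1 - normcdf(xTrunc,mu,sigma));

% Start from the naive estimates and keep sigma positive.
start = [mean(data), std(data)];
phat  = mle(data,'pdf',pdfTrunc,'Start',start,'LowerBound',[-Inf 0]);

% Plain normal fit for comparison (ignores the truncation)
pdNaive = fitdist(data,'Normal');

fprintf('truncation-aware fit: mu = %.3f, sigma = %.3f\n', phat(1), phat(2));
fprintf('plain normal fit:     mu = %.3f, sigma = %.3f\n', pdNaive.mu, pdNaive.sigma);
```

With this setup the truncation-aware estimates are expected to land near the generating values (mu = 3, sigma = 1), while the plain fit reports the mean and spread of the observed values only. Exact numbers depend on the seed and sample size, and the estimates become less stable as a larger share of the distribution is cut off.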
Reference Link: Documentation about fitdist