How to fit lognormal distribution to a dataset which contains some zero values?

Question

Payel on 10 Aug 2023

Direct link to this question

https://jp.mathworks.com/matlabcentral/answers/2007242-how-to-fit-lognormal-distribution-to-a-dataset-which-contains-some-zero-values

Edited: dpb on 13 Aug 2023

4 Comments

Walter Roberson on 11 Aug 2023

Exact zeros for rainfall values are common, so the overall dataset cannot be lognormal. To get any further with a lognormal distribution you would have to start doing calculations based upon absolute humidity or relative humidity measured multiple times over the day so that you could calculate "available water".

Image Analyst on 12 Aug 2023

Please upload a screenshot of your distribution plotted.

Answer 1

Walter Roberson on 10 Aug 2023

Direct link to this answer

https://jp.mathworks.com/matlabcentral/answers/2007242-how-to-fit-lognormal-distribution-to-a-dataset-which-contains-some-zero-values#answer_1286092

Edited: Walter Roberson on 10 Aug 2023

Don't do that?

There are a small number of possibilities in that situation:

That the log-normal distribution is simply the wrong model for the system, and you should be choosing a different model instead
That the zeros are placeholders for errors in the data. In such a case those measurements should be removed before trying to fit the data
That the zeros are round-off for small measurements, perhaps due to limited precision of sensors. You will not be able to learn anything useful from those measurements, so you should remove them before trying to fit the data
That the zeros are caused by noise in the system. In such a case, a log-normal model is not going to apply, but you might be able to obtain an approximation by removing the zeros (and negatives) before trying to fit the data
That the zeros are correct points, representing locations where the parameters are negative infinity. I would imagine that there are several papers to be written about the physics of such a system, which would probably have deep connections to Bose-Einstein Condensates and to Planck Distances...
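
For the case where the zeros are placeholders for bad readings, the "remove them, then fit" step might look like the following minimal sketch. The data vector is synthetic, and the parameter values 0.5 and 1.0 are assumptions chosen only for illustration. The estimates use base MATLAB on the log of the positive values.

```matlab
% Hypothetical data: lognormal values plus zeros standing in for bad readings
rng(3);                                  % reproducible demo
x = [exp(0.5 + 1.0*randn(1000,1)); zeros(50,1)];

xPos     = x(x > 0);                     % keep strictly positive values only
fracZero = mean(x == 0);                 % fraction of the data that was dropped

% For uncensored data, the lognormal parameters are the mean and standard
% deviation of log(x)
muHat    = mean(log(xPos));
sigmaHat = std(log(xPos));
```

Reporting fracZero alongside the fit keeps it visible how much of the record the fitted model ignores.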

1 Comment

John D'Errico on 10 Aug 2023

Edited: John D'Errico on 10 Aug 2023

Be careful.

If the zeros are just low values that were "rounded" off to zero, then simply removing them will be a problem. Essentially you are biasing the estimate, since they SHOULD have been really small values. You are now estimating the parameters of a censored sample.

If that is the case, then you probably need to use maximum likelihood estimation (MLE) for a left-censored sample.

A comparable example might be to estimate the distribution parameters of a normal distribution, but where all of the negative numbers were simply discarded. For example:

n = 100000;
x = randn(n,1);    % sample from a standard normal, N(0,1)
x(x<0) = [];       % discard every negative value
mean(x)
ans = 0.8001
var(x)
ans = 0.3656

As you should expect, any attempt to estimate the normal parameters (which here should be (0,1)) will fail, unless you treat this properly as a censored sample.
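
As a quick check on those numbers: a standard normal with its negative half discarded is a half-normal distribution, whose mean is sqrt(2/pi) and whose variance is 1 - 2/pi. The snippet below only evaluates those two closed-form values for comparison with the sample estimates; the variable names are illustrative.

```matlab
% Closed-form moments of a standard normal restricted to x >= 0
mTheory = sqrt(2/pi);    % about 0.7979, versus the sample mean 0.8001
vTheory = 1 - 2/pi;      % about 0.3634, versus the sample variance 0.3656
```

So the sample is behaving exactly as a half-normal should, and the naive estimates are far from (0,1).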

The point being, you want to understand where the zeros are coming from, and deal with them properly.
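
If the zeros do turn out to be small values rounded down to zero, a left-censored lognormal fit could be sketched roughly as follows. This is only an illustration under stated assumptions: the detection limit d is assumed to be known, the data are synthetic, and the likelihood is written out by hand and minimized with fminsearch rather than through any toolbox fitting routine. Each censored point contributes the probability of falling below d, and sigma is optimized on the log scale to keep it positive.

```matlab
% Synthetic demo: values below an assumed detection limit d are recorded as 0
rng(0);
muTrue = 1; sigmaTrue = 0.8;             % assumed "true" parameters for the demo
d = 1.5;                                 % assumed (known) detection limit
x = exp(muTrue + sigmaTrue*randn(5000,1));
x(x < d) = 0;                            % left-censor the sample

obs = x(x > 0);                          % fully observed values
nc  = sum(x == 0);                       % number of censored values

% Standard normal CDF written with erfc, so only base MATLAB is needed
Phi = @(z) 0.5*erfc(-z/sqrt(2));

% Negative log-likelihood with p = [mu, log(sigma)]:
% observed points contribute the lognormal log-density,
% censored points contribute log P(X < d)
nll = @(p) -( sum( -log(obs) - p(2) - 0.5*log(2*pi) ...
                   - (log(obs) - p(1)).^2 ./ (2*exp(2*p(2))) ) ...
              + nc * log(Phi((log(d) - p(1)) / exp(p(2)))) );

p0   = [mean(log(obs)), log(std(log(obs)))];   % naive start from observed values
pHat = fminsearch(nll, p0);

muHat    = pHat(1);
sigmaHat = exp(pHat(2));
muNaive  = mean(log(obs));               % estimate if the zeros are just dropped
```

Comparing muNaive with muHat shows the bias described above: dropping the zeros shifts the estimated location upward, while the censored likelihood accounts for the mass below d.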

Answer 2

dpb on 11 Aug 2023

Direct link to this answer

https://jp.mathworks.com/matlabcentral/answers/2007242-how-to-fit-lognormal-distribution-to-a-dataset-which-contains-some-zero-values#answer_1286692

Edited: dpb on 13 Aug 2023

One analysis technique for daily rainfall modeling divides the problem into two parts: a "wet-day" model that predicts rainfall amounts for the days on which rainfall occurs, and an independent Markov or stochastic renewal model that predicts the occurrence of the zero-rainfall days.
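
The occurrence part could be sketched, for example, as a first-order two-state Markov chain whose transition probabilities are estimated from the wet/dry sequence. The chain order, the synthetic series and the probabilities 0.3 and 0.6 below are assumptions for illustration, not values from any real record.

```matlab
% Synthetic wet/dry sequence from an assumed first-order Markov chain
rng(1);
nDays   = 20000;
p01True = 0.3;                    % assumed P(wet today | dry yesterday)
p11True = 0.6;                    % assumed P(wet today | wet yesterday)
wet = false(nDays,1);
u   = rand(nDays,1);
for k = 2:nDays
    if wet(k-1)
        wet(k) = u(k) < p11True;
    else
        wet(k) = u(k) < p01True;
    end
end

% With real data, the sequence would instead be: wet = rainfall > 0;
prev = wet(1:end-1);              % state on day k-1
next = wet(2:end);                % state on day k

% Transition probabilities estimated as relative frequencies
p01 = sum(~prev & next) / sum(~prev);   % dry -> wet
p11 = sum( prev & next) / sum( prev);   % wet -> wet
```

The zero-rainfall days are then described by p01 and p11 alone, and the amount distribution only has to describe the strictly positive values.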

The Pearson Type-3 and the two-parameter gamma distributions have done a reasonable job of modeling wet-day rainfall amounts at a point location. There is extensive literature in the subject field...
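
A wet-day amount fit with the two-parameter gamma distribution might look like the sketch below. It assumes the Statistics and Machine Learning Toolbox is available (for gamrnd and gamfit). The rainfall series is synthetic, and the shape 0.8, scale 10 and wet-day fraction 0.4 are made-up values for the demo.

```matlab
% Synthetic daily series: zeros on dry days, gamma-distributed amounts on wet days
rng(2);
aTrue = 0.8; bTrue = 10;               % assumed shape and scale for the demo
nDays = 5000;
rain  = zeros(nDays,1);
isWet = rand(nDays,1) < 0.4;           % assumed wet-day fraction
rain(isWet) = gamrnd(aTrue, bTrue, nnz(isWet), 1);

% Fit the gamma distribution to the wet-day amounts only
wetAmounts = rain(rain > 0);
phat = gamfit(wetAmounts);             % maximum likelihood estimates [shape, scale]

shapeHat = phat(1);
scaleHat = phat(2);
```

Because the zeros are handled by the occurrence model, they never enter this fit, which avoids the original problem of fitting a strictly positive distribution to data containing zeros.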