How to fit lognormal distribution to a dataset which contains some zero values?

Payel on 10 Aug 2023
Edited: dpb on 13 Aug 2023
How to fit lognormal distribution to a dataset which contains some zero values?
  4 Comments
Walter Roberson on 11 Aug 2023
Exact zeros are common in rainfall data, so the dataset as a whole cannot be lognormal. To get any further with a lognormal distribution, you would have to start doing calculations based on absolute or relative humidity measured multiple times over the day, so that you could calculate "available water".
Image Analyst on 12 Aug 2023
Please upload a screenshot of your distribution plotted.


Answers (2)

Walter Roberson on 10 Aug 2023
Edited: Walter Roberson on 10 Aug 2023
Don't do that?
There are a small number of possibilities in that situation:
  1. That the log-normal distribution is simply the wrong model for the system, and you should be choosing a different model instead.
  2. That the zeros are placeholders for errors in the data. In that case, those measurements should be removed before trying to fit the data.
  3. That the zeros are round-off of small measurements, perhaps due to limited sensor precision. You will not be able to learn anything useful from those measurements, so you should remove them before trying to fit the data.
  4. That the zeros are caused by noise in the system. In that case, a log-normal model is not going to apply, but you might be able to obtain an approximation by removing the zeros (and negatives) before trying to fit the data.
  5. That the zeros are correct points, representing locations where the parameters are negative infinity. I would imagine that there are several papers to be written about the physics of such a system, which would probably have deep connections to Bose-Einstein condensates and to Planck distances...
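For the cases in the list where the zeros really are placeholders for bad readings, a minimal sketch of "remove them, then fit" might look like the following. The data here are simulated and purely hypothetical, and the variable names (`fracZero`, `muHat`, `sigmaHat`) are made up for illustration. It uses the closed-form lognormal maximum-likelihood estimates computed on `log(x)` so that only base MATLAB is needed; with the Statistics and Machine Learning Toolbox, `lognfit` or `fitdist(x,'Lognormal')` would be the usual route.

```matlab
% Hypothetical data: lognormal values with some entries replaced by exact zeros
rng(1);                                  % fixed seed so the sketch is repeatable
x = exp(0.5 + 0.8*randn(5000,1));        % lognormal with mu = 0.5, sigma = 0.8
x(rand(size(x)) < 0.1) = 0;              % about 10% of readings become zeros at random

isZero   = (x == 0);
fracZero = mean(isZero);                 % fraction of the data that will be discarded

% Fit only the strictly positive values (closed-form lognormal MLE on log(x))
logx     = log(x(~isZero));
muHat    = mean(logx);
sigmaHat = sqrt(mean((logx - muHat).^2));

fprintf('zeros: %.1f%%, mu = %.3f, sigma = %.3f\n', 100*fracZero, muHat, sigmaHat);
```

Because the zeros in this simulation are unrelated to the size of the underlying value, dropping them should leave the estimates close to the true parameters. That would not hold if the zeros were small values rounded down.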
  1 Comment
John D'Errico on 10 Aug 2023
Edited: John D'Errico on 10 Aug 2023
Be careful.
If the zeros are just low values that were "rounded" off to zero, then simply removing them will be a problem. Essentially you are biasing the estimate, since they SHOULD have been really small values. You are now estimating the parameters of a censored sample.
If that is the case, then you probably need to use MLE for a left censored sample.
A comparable example might be to estimate the distribution parameters of a normal distribution, but where all of the negative numbers were simply discarded. For example:
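n = 100000;
x = randn(n,1);     % standard normal sample, true parameters (0,1)
x(x<0) = [];        % discard every negative value
mean(x)             % naive estimate of the mean
ans = 0.8001
var(x)              % naive estimate of the variance
ans = 0.3656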
As you should expect, any attempt to estimate the normal parameters (which here should be (0,1)) will fail, unless you treat this properly as a censored sample.
The point being, you want to understand where the zeros are coming from, and deal with them properly.
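As a rough illustration of the left-censored approach, here is a minimal sketch under stated assumptions: the zeros are taken to be values that fell below a known detection limit `d` (a hypothetical number here), the data are simulated, and the likelihood is written by hand and minimized with base MATLAB's `fminsearch`. The names `Phi`, `negLogLik`, `naive` and `censored` are illustrative only. Each observed value contributes the normal density of its logarithm, and each zero contributes the probability of falling below `d`.

```matlab
rng(2);                                 % fixed seed so the sketch is repeatable
d = 0.5;                                % assumed detection limit (hypothetical)
x = exp(randn(20000,1));                % true lognormal: mu = 0, sigma = 1
x(x < d) = 0;                           % small values are recorded as zero

y     = log(x(x > 0));                  % logs of the observed values
nCens = sum(x == 0);                    % number of left-censored readings

% Standard normal CDF built from erfc, so no toolbox is required
Phi = @(z) 0.5*erfc(-z/sqrt(2));

% Negative log-likelihood in p = [mu, log(sigma)]; additive constants dropped
negLogLik = @(p) numel(y)*p(2) + sum((y - p(1)).^2)/(2*exp(2*p(2))) ...
                 - nCens*log(Phi((log(d) - p(1))/exp(p(2))));

p0 = [mean(y), log(std(y))];            % start from the naive fit on the positives
p  = fminsearch(negLogLik, p0);

naive    = [mean(y), std(y)];           % ignores the zeros entirely, so it is biased
censored = [p(1), exp(p(2))];           % should land close to the true [0, 1]
```

Optimizing over `log(sigma)` rather than `sigma` keeps the scale parameter positive without needing a constrained solver. In this simulation the naive estimate of `mu` comes out well above zero, while the censored fit recovers values near the true parameters.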



dpb on 11 Aug 2023
Edited: dpb on 13 Aug 2023
One analysis technique for daily rainfall modeling divides the problem into two parts: a "wet-day" model that predicts rainfall amounts for the days on which rain occurs, and an independent Markov or stochastic renewal model that predicts the occurrence of the zero-rainfall days.
The Pearson Type-3 or the two-parameter gamma distributions have been able to do a reasonable job of modeling point-location rainfall for wet-day amount predictions. There's extensive literature in the subject field...
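A minimal sketch of that two-part idea, with assumptions stated up front: the `rain` vector is a short made-up record in mm, the occurrence part is a first-order two-state Markov chain estimated by counting transitions, and the wet-day amounts get a two-parameter gamma fitted by the method of moments. The method of moments is used here only because it needs no toolbox; it is not a maximum-likelihood fit (`gamfit` in the Statistics and Machine Learning Toolbox would give one). All variable names are illustrative.

```matlab
% Hypothetical daily rainfall record (mm); zeros are dry days
rain = [0 0 3.2 1.1 0 0 0 7.5 2.0 0.4 0 0 12.3 0 0 5.1 0.8 0 0 0];

% Part 1: occurrence model, first-order Markov chain on wet/dry states
wet     = rain > 0;
prevWet = wet(1:end-1);                  % state on day t
nextWet = wet(2:end);                    % state on day t+1
pWetGivenDry = mean(nextWet(~prevWet));  % P(wet tomorrow | dry today)
pWetGivenWet = mean(nextWet(prevWet));   % P(wet tomorrow | wet today)

% Part 2: wet-day amount model, gamma fitted by the method of moments
amounts  = rain(wet);
m        = mean(amounts);
v        = var(amounts);
shapeHat = m^2 / v;                      % gamma shape parameter
scaleHat = v / m;                        % gamma scale parameter (mm)

fprintf('P(wet|dry) = %.3f, P(wet|wet) = %.3f\n', pWetGivenDry, pWetGivenWet);
fprintf('gamma shape = %.3f, scale = %.3f mm\n', shapeHat, scaleHat);
```

The zeros never enter the amount distribution at all. They are handled entirely by the occurrence model, which is what lets a strictly positive distribution describe the wet-day amounts.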

Release

R2022a
