How can I know which distribution is appropriate to fit to the generated histogram, and how can I do that?

Hi all,
Using the code below, I create a histogram of soil pore diameters. My question is how I can recognize which distribution fits it well, and how I can handle that in MATLAB.
clc
clear
close all
load("Diameter.mat");
load("Number_Pores.mat")
Diameter=flip(Diameter)';
N=flip(N)';
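% Use the diameter vector as bin edges and convert it to mm
% (the factor 1e-6 presumably assumes Diameter is stored in nm: 1 nm = 1e-6 mm)
bin_edge = Diameter*1e-6;
% Pore counts for each bin; 'BinEdges' must have exactly one more element than 'BinCounts'
frequencies = N;
% Create the histogram from bin edges and counts, normalized as a probability density
histogram('BinEdges', bin_edge, 'BinCounts', frequencies,'Normalization','pdf');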
% Customize the plot (optional)
title('MIP histogram');
xlabel('Pore Diameter (mm)');
ylabel('Probability density (1/mm)');  % 'pdf' = count / (total count * bin width)
grid on;
set(gca,"XScale","log")
Star Strider on 3 Oct 2023
Since pore sizes cannot be negative, the candidates are limited to continuous distributions with positive support. Fitting a lognormal distribution would likely be the place to start; then see whether other logical choices work better. (I do not know what processes govern pore sizes; however, if known, they could guide the choice of the correct distribution.)
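A minimal sketch of that starting point, assuming the data are only available as bin edges plus counts (as in the question). For a lognormal, the maximum-likelihood estimates are simply the count-weighted mean and standard deviation of ln(D), so they can be computed from the bins directly if every pore is placed at its bin's geometric center (this ignores the spread within each bin). The bin edges, the "true" parameters and the 4.26e15 total below are hypothetical stand-ins that only make the example runnable; replace them with bin_edge and frequencies from the question.

```matlab
% Sketch: fit a lognormal to binned pore-size data (bin edges + counts).
% All values below are HYPOTHETICAL stand-ins; replace with bin_edge and frequencies.
bin_edge   = logspace(-5, 0, 41);                        % 41 edges -> 40 bins (mm)
centers    = sqrt(bin_edge(1:end-1) .* bin_edge(2:end)); % geometric bin centers (mm)
true_mu    = log(1e-3);                                  % stand-in parameters used to
true_sigma = 1.2;                                        % generate the stand-in counts
counts     = diff(logncdf(bin_edge, true_mu, true_sigma)) * 4.26e15;  % counts per bin

% Weighted MLE: every pore is treated as sitting at its bin's geometric center
w         = counts / sum(counts);                        % bin probabilities
mu_hat    = sum(w .* log(centers));                      % weighted mean of ln(D)
sigma_hat = sqrt(sum(w .* (log(centers) - mu_hat).^2));  % weighted std of ln(D) (MLE form)

pd = makedist('Lognormal', 'mu', mu_hat, 'sigma', sigma_hat);  % fitted distribution object
fprintf('mu = %.4f, sigma = %.4f\n', mu_hat, sigma_hat);
```

The resulting pd can be overlaid on the histogram with pdf(pd, x) or used to draw model pores with random(pd, n, 1).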
Behrooz Daneshian on 3 Oct 2023
My problem is that the height of the fitted lognormal does not match that of the real pdf. I can generate 1e6 pores with my network model, but the real pdf is based on 4.26e15 pores.
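A sketch of why the total number of pores should not affect the heights, using hypothetical stand-in values throughout: once the measured counts and the model sample are both normalized as a pdf over the same bin edges (count divided by total count times bin width), the totals cancel, so 4.26e15 measured pores and 1e6 simulated pores give the same bar heights up to sampling noise. A height mismatch therefore usually means the two curves are not on the same scale, e.g. one plotted as raw counts and the other as a pdf, or one as density per mm and the other as density per unit of ln(D).

```matlab
% Sketch: total pore count cancels once both sides are normalized as a pdf.
% Bin edges, distribution and totals are HYPOTHETICAL stand-ins.
rng(0);                                                   % reproducible draws
bin_edge = logspace(-5, 0, 41);                           % 40 bins (mm)
width    = diff(bin_edge);                                % bin widths (mm)
pd = makedist('Lognormal', 'mu', log(1e-3), 'sigma', 1.2);

% "Measured" side: expected counts per bin for 4.26e15 pores
meas_counts = diff(cdf(pd, bin_edge)) * 4.26e15;
meas_pdf    = meas_counts ./ (sum(meas_counts) * width);  % what 'Normalization','pdf' plots

% Model side: 1e6 simulated pores binned on the same edges
D_model   = random(pd, 1e6, 1);
model_pdf = histcounts(D_model, bin_edge, 'Normalization', 'pdf');

% Overlay measured histogram, model histogram and the analytical pdf
centers = sqrt(bin_edge(1:end-1) .* bin_edge(2:end));
figure;
histogram('BinEdges', bin_edge, 'BinCounts', meas_counts, 'Normalization', 'pdf');
hold on;
plot(centers, model_pdf, 'o');
plot(centers, pdf(pd, centers), 'r-', 'LineWidth', 1.5);
set(gca, 'XScale', 'log');
xlabel('Pore Diameter (mm)'); ylabel('Probability density (1/mm)');
legend('Measured (4.26e15 pores)', 'Model (1e6 pores)', 'Lognormal pdf');
```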


Answers (3)

the cyclist on 3 Oct 2023
Fitting to the histogram of data, instead of to the raw data, is typically a bad modeling practice, because you introduce error during the binning process.
You could use the ksdensity function to generate a non-parametric curve that fits your data. (Be sure to use the option that restricts the support to positive values, i.e. 'Support','positive'.)
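A sketch of that idea applied to the binned data, assuming only bin edges and counts are available: each bin's geometric center is passed to ksdensity as a data point with its count as the weight, and 'Support','positive' keeps the estimate on D > 0. This is still a smoothed histogram rather than a fit to raw data, so if per-pore measurements exist, pass those instead. The stand-in counts are hypothetical.

```matlab
% Sketch: non-parametric density with positive support from binned counts.
% bin_edge and counts are HYPOTHETICAL stand-ins; replace with bin_edge and N.
bin_edge = logspace(-5, 0, 41);                              % 40 bins (mm)
centers  = sqrt(bin_edge(1:end-1) .* bin_edge(2:end));       % geometric bin centers
counts   = diff(logncdf(bin_edge, log(1e-3), 1.2)) * 4.26e15; % stand-in pore counts

xi = logspace(-5, 0, 200).';                                 % evaluation points (mm)
f  = ksdensity(centers(:), xi, 'Support', 'positive', 'Weights', counts(:));

% Overlay the kernel estimate on the measured histogram
figure;
histogram('BinEdges', bin_edge, 'BinCounts', counts, 'Normalization', 'pdf');
hold on;
plot(xi, f, 'LineWidth', 1.5);
set(gca, 'XScale', 'log');
xlabel('Pore Diameter (mm)'); ylabel('Probability density (1/mm)');
```

The default bandwidth is used here; it may need tuning via the 'Bandwidth' option for coarse bins.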
It might also be fine to just re-sample the data you have, to generate new data. (That will never generate unseen pore sizes, but maybe that error is not important.)
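A sketch of re-sampling the measured distribution directly, assuming the data are bin edges plus counts: each model pore picks a bin with probability proportional to its count (randsample with weights), then gets a diameter drawn uniformly in ln(D) inside that bin. The within-bin log-uniform step is an assumption of this sketch, not something the measurement provides, and, as noted above, no diameter outside the measured range can appear. Bin edges and counts are hypothetical stand-ins.

```matlab
% Sketch: draw model pore diameters from the measured (binned) distribution.
% bin_edge and counts are HYPOTHETICAL stand-ins; replace with bin_edge and N.
rng(1);                                                         % reproducible draws
bin_edge = logspace(-5, 0, 41).';                               % column of 41 edges (mm)
counts   = diff(logncdf(bin_edge, log(1e-3), 1.2)) * 4.26e15;   % stand-in counts per bin
n_model  = 1e6;                                                 % pores in the network model

% 1) Choose a bin for each model pore with probability proportional to its count
bin_idx = randsample(numel(counts), n_model, true, counts);
bin_idx = bin_idx(:);                                           % ensure a column vector

% 2) Place each pore uniformly in ln(D) inside its chosen bin (assumption)
lo = log(bin_edge(bin_idx));
hi = log(bin_edge(bin_idx + 1));
D_model = exp(lo + (hi - lo) .* rand(n_model, 1));              % model diameters (mm)
```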
I frankly don't understand the utility of the method you describe, which consists of:
  • Fitting
  • Generating data from the fit
  • Seeing if the generated data also fits well. (It must, to within sampling error.)
Maybe there is something I'm not seeing.
Behrooz Daneshian on 3 Oct 2023
The pore sizes within the soil naturally follow some distribution (as you can see from the pore size distribution obtained from a test). Given that, I want the diameters of the pores that I generate in my model (a Pore Network Model) to have the same distribution as those in the soil. In other words, the distribution of pore diameters in my model must match the one in reality.
It is assumed that the pores within the soil are cylindrical with a diameter of D.



Image Analyst on 3 Oct 2023
I agree with @Star Strider and @the cyclist -- if you can't use the actual distribution and must use a formula, you should use one that has theoretical justification. Like they said, there is a theoretical basis for using a log-normal distribution. I've seen it for countless measurements. It almost doesn't matter what I'm measuring about particles (area, perimeter, circularity, or whatever); they all seem to have a log-normal distribution. If you want a reference for the theory, see the bible on particle size measurement by Terence Allen of DuPont: "Particle Size Measurement".
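For reference, the log-normal density in terms of μ and σ (the mean and standard deviation of ln D, the same parametrization MATLAB's makedist('Lognormal', ...) uses), together with the property that makes it convenient on a log axis: ln D is normally distributed, and the density per unit of ln D equals the density per unit of D multiplied by D.

```latex
f_D(d) = \frac{1}{d\,\sigma\sqrt{2\pi}}
         \exp\!\left(-\frac{(\ln d - \mu)^2}{2\sigma^2}\right), \qquad d > 0

\ln D \sim \mathcal{N}(\mu, \sigma^2), \qquad
f_{\ln D}(y) = d\, f_D(d)\Big|_{d = e^{y}}
```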

Star Strider on 4 Oct 2023
I would use the histfit function, then, if the fit appears to be acceptable, use the fitdist function to estimate the parameters. As I mentioned previously, start with the lognormal distribution, and search for similar distributions with positive support if the lognormal distribution does not provide an accurate fit.
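A sketch of that workflow, assuming a vector of individual pore diameters is available (here a hypothetical lognormal stand-in sample; it could also be pores re-sampled from the measured bins). histfit is applied to ln(D) with a normal fit, which is equivalent to a lognormal fit of D and easier to judge than linear bins of heavily skewed data. fitdist then fits a few positive-support candidates; choosing Gamma and Weibull as the alternatives is an assumption of this sketch. Because all three have two parameters, comparing negative log-likelihoods gives the same ranking as AIC.

```matlab
% Sketch: visual check with histfit, then compare positive-support candidates.
% D is a HYPOTHETICAL stand-in sample of pore diameters (mm); replace with real data.
rng(2);                                           % reproducible stand-in sample
D = lognrnd(log(1e-3), 1.2, 1e5, 1);              % 1e5 pore diameters (mm)

% Visual check: lognormal D means ln(D) is normal, so fit a normal to ln(D)
figure;
histfit(log(D), 50, 'normal');
xlabel('ln(Pore Diameter / mm)');

% Maximum-likelihood fits of candidate distributions with positive support
candidates = {'Lognormal', 'Gamma', 'Weibull'};
nll  = zeros(1, numel(candidates));
fits = cell(1, numel(candidates));
for k = 1:numel(candidates)
    fits{k} = fitdist(D, candidates{k});
    nll(k)  = negloglik(fits{k});                 % lower is better
end
[~, best] = min(nll);                             % all candidates have 2 parameters
for k = 1:numel(candidates)
    fprintf('%-10s  negloglik = %.1f\n', candidates{k}, nll(k));
end
fprintf('Best candidate: %s\n', candidates{best});
pd_best = fits{best};
```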
