# Generating PDF and Overplotting from Subset of Data Using Gaussian Mixture Model

John on 1 Dec 2022
Commented: John on 1 Dec 2022
Assume a data set contains n overlapping measurement populations (here n = 2). I want to isolate one of them, the one that makes up the greater proportion of the total, and fit a probability density function to it. To do that I applied fitgmdist to the data with n = 2 and kept the component with the higher ComponentProportion as the subset whose fit I want. My plan was then to call makedist with the mu and sigma of the chosen component (TDist) and evaluate the density with pdf so that I could overplot it on the histogram of the data.
clc; clear;
DataSize = 10000;
linLength = 1000;
Data1 = normrnd(-2,1,[1,DataSize]);
Data2 = normrnd(3,2,[1,DataSize]);
Data = [Data1,Data2]';
Dist = fitgmdist(Data,2); %Create fits for both distributions
DistNdex = find(Dist.ComponentProportion==max(Dist.ComponentProportion)); %Find distribution with greater contribution
TDist.Mu = Dist.Mu(DistNdex); %Average
% Output when run in the browser:
% Unrecognized method, property, or field 'Mu' for class 'gmdistribution'.
% Error in indexing (line 22)
% [varargout{1:nargout}] = builtin('subsref',this,s);
TDist.Sigma = sqrt(Dist.Sigma(:,:,DistNdex)); %Std. dev. (Dist.Sigma stores the variance for 1-D data)
PD = makedist('Normal','mu',TDist.Mu,'sigma',TDist.Sigma); %Create normal distribution using mu/sigma
xPD = linspace(TDist.Mu - 3*TDist.Sigma,TDist.Mu + 3*TDist.Sigma,linLength); %Create linspace that spans 3 sigma
pdfValues = pdf('Normal',xPD,TDist.Mu,TDist.Sigma); %Normal density via the generic pdf function
NormPdfValues = normpdf(xPD,TDist.Mu,TDist.Sigma); %Same density via normpdf (identical values to pdfValues)
%Plotting
figure
histogram(Data)
hold on
plot(xPD,pdfValues,'r','LineWidth',5)
plot(xPD,NormPdfValues,'g','LineWidth',5)
hold off
My issue is that the peak of the fitted curve is tiny compared with the histogram (whether or not it is normalized). Why is this, and how do I specify what the maximum y-value should be for the associated component distribution? I suspect I'm missing a piece of statistics knowledge rather than having a MATLAB issue.
PS - I know this code gives an error when run in the browser. I'm not sure why it doesn't recognize the property "Mu", but it runs on my local machine without issues.
##### 1 Comment
Chris on 1 Dec 2022
mu should be lower-case. Perhaps that is release-specific...
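
To check which property names a fitted model exposes in a given release, one can simply list them. The snippet below is a minimal sketch that assumes the Statistics and Machine Learning Toolbox is available; the seed, sample sizes, and variable names (gm, sigmas) are illustrative and not taken from the question. It also shows that Sigma holds covariances, which for 1-D data are variances, so a square root is needed to get standard deviations.

```matlab
rng(0);                                   % illustrative seed, only for reproducibility
Data = [normrnd(-2,1,[5000,1]); normrnd(3,2,[5000,1])];
gm = fitgmdist(Data,2);                   % two-component mixture fitted to 1-D data

disp(properties(gm))                      % property names as spelled in this release
disp(size(gm.mu))                         % k-by-d means, here 2-by-1
disp(size(gm.Sigma))                      % d-by-d-by-k covariances, here 1-by-1-by-2

sigmas = sqrt(squeeze(gm.Sigma));         % per-component standard deviations
```

The order of the components is not fixed, so sigmas may come out as roughly [1; 2] or [2; 1] depending on the fit.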


### Accepted Answer

Chris on 1 Dec 2022

The area under each of these PDFs is 1, whereas the histogram plots raw bin counts, so the curve looks tiny next to the bars.
One way to work around that (though probably not the most correct way) is to rescale the pdf so that its maximum matches the tallest histogram bin.
DataSize = 10000;
linLength = 1000;
Data1 = normrnd(-2,1,[1,DataSize]);
Data2 = normrnd(3,2,[1,DataSize]);
Data = [Data1,Data2]';
Dist = fitgmdist(Data,2); %Create fits for both distributions
[~, DistNdex] = max(Dist.ComponentProportion); %Index of the component with the greater contribution
TDist.Mu = Dist.mu(DistNdex); %Mean
TDist.Sigma = sqrt(Dist.Sigma(:,:,DistNdex)); %Std. dev. (Dist.Sigma stores the variance for 1-D data)
PD = makedist('Normal','mu',TDist.Mu,'sigma',TDist.Sigma); %Create normal distribution using mu/sigma
xPD = linspace(TDist.Mu - 3*TDist.Sigma,TDist.Mu + 3*TDist.Sigma,linLength); %Create linspace that spans 3 sigma
pdfValues = pdf(PD,xPD); %Evaluate the normal density (area 1) over the defined linspace
% NormPdfValues = normpdf(xPD,TDist.Mu,TDist.Sigma); %Gives the same values as pdfValues
%Plotting
figure
h = histogram(Data);
hold on
mult = max(h.BinCounts)/max(pdfValues);
plot(xPD, mult*pdfValues,'k--','LineWidth',2)
hold off

##### 2 Comments
John on 1 Dec 2022
Never mind, the scaling does not matter and is left up to the user: none of these curves are normalized, and they can be normalized later on. I was initially confused because the fit that histfit() overplots is also not normalized, so I assumed there was some consistent statistical method I was missing. I guess whoever programmed the scaling into the function just decided it based on their own experience?

MATLAB's output: (Ignore the fact that it's applying a fit to the total data. This is just exemplifying the scaling.)

Close enough! One can at least see the fit now.
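
On the histfit question: as far as I understand it, the scaling there is not arbitrary. The fitted density is multiplied by the area of the histogram, that is, the number of points times the bin width, which puts the curve in the same units as the bin counts. The sketch below applies the same idea to one mixture component and additionally weights it by that component's proportion. It is a sketch under those assumptions rather than a reproduction of histfit's code, and the seed and the names p, k, and scale are illustrative.

```matlab
rng(0);                                        % illustrative seed, only for reproducibility
Data = [normrnd(-2,1,[10000,1]); normrnd(3,2,[10000,1])];
gm = fitgmdist(Data,2);

[p, k] = max(gm.ComponentProportion);          % weight and index of the larger component
mu = gm.mu(k);
sigma = sqrt(gm.Sigma(:,:,k));                 % Sigma stores the variance for 1-D data

figure
h = histogram(Data);                           % default normalization: raw counts
hold on
x = linspace(mu - 3*sigma, mu + 3*sigma, 1000);
scale = numel(Data) * h.BinWidth * p;          % expected counts per unit of density
plot(x, scale * normpdf(x, mu, sigma), 'k--', 'LineWidth', 2)
hold off
```

An alternative that avoids computing the scale factor is to call histogram(Data,'Normalization','pdf') and plot p*normpdf(x,mu,sigma), since both the bars and the curve are then densities.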


R2020b
