# Calculating mean squared error or maybe MISE

Neuropragmatist on 25 Jul 2019
Commented: Neuropragmatist on 8 Aug 2019
Hi all,
I'm interested in comparing different bivariate histograms to an underlying 2D probability density function.
Additional info that you can skip for time:
My aim is to find the bin size and smoothing that make the histogram best represent the known density function. In my field this is a common problem without a clear solution. There are many ways to estimate an optimal bin size, but I can't find any that also take smoothing into account. Furthermore, the histogram I want to compare is actually calculated as the ratio of 2 histograms generated with the same parameters but over very different underlying distributions, and I have not found any method for optimising parameters in that situation either. My ultimate aim is to generate histograms using a variety of approaches and smoothing settings to find the 'best' one, or at least the best for different scenarios.
My first approach was to generate the histogram and then correlate the result with the PDF sampled at the same points (i.e. the histogram bin centers). Having read the literature a bit more, I think I want to use the mean squared error (MSE) instead, but I'm not sure if this is a) appropriate or b) meaningful. Also, the Wikipedia page for MSE lists two equations and I'm not sure which one applies in this situation. I'm also worried that I should be calculating the mean integrated squared error (MISE) instead, but I don't know how to do that for a discrete histogram vs a continuous PDF, both of which are 2D. I have Matlab 2018b and all the toolboxes.
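For reference, the usual definitions of the integrated squared error (ISE) and the MISE, together with a Riemann-sum approximation for a 2D histogram, can be written as below. The symbols are assumptions introduced here for illustration: $f$ is the true density, $\hat f$ the histogram treated as a density estimate, $n_{ij}$ the count in bin $(i,j)$, $N$ the total number of points, and $h_x$, $h_y$ the bin widths.

$$
\mathrm{ISE}(\hat f) = \iint \bigl(\hat f(x,y) - f(x,y)\bigr)^2 \, dx \, dy,
\qquad
\mathrm{MISE}(\hat f) = \mathbb{E}\bigl[\mathrm{ISE}(\hat f)\bigr]
$$

$$
\hat f_{ij} = \frac{n_{ij}}{N \, h_x h_y},
\qquad
\mathrm{ISE}(\hat f) \approx \sum_{i,j} \bigl(\hat f_{ij} - f(x_i, y_j)\bigr)^2 \, h_x h_y
$$

Under this approximation, an MSE computed on per-bin probabilities (density times bin area) over $M$ bins is roughly $\mathrm{ISE} \cdot h_x h_y / M$, so it carries a factor that changes with the bin size. The sum evaluates $f$ only at bin centers, so it ignores how $f$ varies within a bin.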
Here is the code I have so far:
% generate distribution of points, make histogram of these and get actual PDF underlying this
mu = [100 100];
sigma = [60 50;50 80];
num = 100;
pos1 = mvnrnd(mu,sigma,num); % the points
% in this example we will just have one distribution, but in the real data there are multiple such distributions all summed together
% which makes fitting a continuous function to the real data nearly impossible
bcx = 0:5:200;
bcy = 0:5:200;
[x,y] = meshgrid(bcx,bcy); % the grid over which to generate histogram or evaluate PDF
bcents = [x(:) y(:)];
map1 = mvnpdf(bcents,mu,sigma); % the PDF
map1 = reshape(map1,size(x));
map2 = hist3(pos1,'Ctrs',{bcx(:) bcy(:)})'; % the histogram, transposed so rows index y and columns index x like map1
% plot all three
figure
subplot(1,3,1)
plot(pos1(:,1),pos1(:,2),'ko')
axis([0 200 0 200])
axis square xy
title('Points')
subplot(1,3,2)
imagesc(map1)
axis square xy
title('PDF')
subplot(1,3,3)
imagesc(map2)
axis square xy
title('Histogram')
% calculate MSE
map_pdf = map1 .* 25; % scale so sum is approximately unity (multiply density by the 5x5 bin area to approximate a Riemann sum)
map_hist = map2 ./ sum(map2(:)); % scale so sum is unity (i.e. probability per bin)
mse = mean((map_pdf(:) - map_hist(:)).^2) % mean of squared differences over all bins
cor = corr(map_pdf(:),map_hist(:),'rows','pairwise')
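
One possible way to put the two maps on the same scale is to convert the histogram to a density (counts divided by total count and bin area) and approximate the integrated squared error with a Riemann sum; averaging that over repeated samples then gives a rough MISE estimate. The following is a minimal sketch under those assumptions, not a definitive method. The names `binw`, `binArea`, `nrep`, `fhat`, `ise` and `mise` are introduced here for illustration, and the number of repeats is arbitrary.

```matlab
% Sketch: approximate the ISE per sample and the MISE as its average over repeats.
rng(0);                          % fixed seed, for repeatability only
mu    = [100 100];
sigma = [60 50; 50 80];
num   = 100;
binw  = 5;                       % bin width, same in x and y (assumption)
bcx   = 0:binw:200;
bcy   = 0:binw:200;
[x, y] = meshgrid(bcx, bcy);
f = reshape(mvnpdf([x(:) y(:)], mu, sigma), size(x)); % true density at bin centers
binArea = binw^2;
nrep = 50;                       % number of repeated samples (arbitrary choice)
ise  = zeros(nrep, 1);
for k = 1:nrep
    pts    = mvnrnd(mu, sigma, num);
    counts = hist3(pts, 'Ctrs', {bcx(:) bcy(:)})';  % transpose to match the meshgrid layout
    fhat   = counts ./ (sum(counts(:)) * binArea);  % histogram as a density estimate
    ise(k) = sum((fhat(:) - f(:)).^2) * binArea;    % Riemann-sum approximation of the ISE
end
mise = mean(ise)                 % average ISE over repeats, a rough MISE estimate
```

Because `fhat` and `f` are both densities here, the result should be comparable across bin sizes, as long as `f` is sampled finely enough for the Riemann sum to be reasonable.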


### Answers (1)

Ganesh Regoti on 8 Aug 2019

#### 1 Comment

Neuropragmatist on 8 Aug 2019
I don't think that's really relevant, I already have a PDF generated by mvnpdf and I have a histogram generated by histcounts2, the question is about how to compare the two distributions.


R2018b