How can I scale CDF normal distribution values to match actual data? Calculating R^2?
Hi everyone, how can I calculate R^2 between the actual data and the fitted normal distribution? The problem is that my normal-fit CDF values are on a scale of 0 to 1, and I would like to scale them so that they match the scale of the actual data (0 to 2310), because in the third-to-last step I must find the difference between the actual and the normal-predicted data.
Table = readtable("practice3.xlsx");
actual_values = Table.values;
actual_values = sort(actual_values)
hold on
cdfplot(actual_values); % Plot the empirical CDF
normalfit = fitdist(actual_values,'Normal'); % fit the normal distribution to the data
cdf_normal = cdf('Normal', actual_values, normalfit.mu, normalfit.sigma); % generate CDF values for each of the fitted distributions
plot(actual_values,cdf_normal) % plot the normal distribution
hold off
grid on
predicted_values = cdf_normal %HERE IS THE PROBLEM: cdf_normal ranges from 0 to 1, how can I scale cdf_normal to match the scale of the actual data, which has a max of 2310?
% Compute R^2, which is 1 - (sum of squared residuals/total sum of squares)
SSR = sum(predicted_values - actual_values).^2;
TSS = sum(((actual_values - mean(actual_values)).^2));
Rsquared = 1 - SSR/TSS % Results in incorrect R value (R should be less than 1)
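One way to get predictions on the scale of the data, rather than on the 0 to 1 scale, is to map a probability for each sorted observation back through the inverse normal CDF and compare the resulting quantiles with the data. The following is only a minimal sketch of that idea, not the method used in this thread. The sample vector is made up (practice3.xlsx is not available), the plotting positions (i - 0.5)/n are one assumed convention among several, and erfcinv stands in for icdf('Normal', ...) so the block runs without the Statistics Toolbox.

```matlab
% Hypothetical sample standing in for Table.values
actual_values = sort([120; 340; 560; 780; 990; 1210; 1450; 1680; 1900; 2310]);
n = numel(actual_values);

% Normal parameters from the sample mean and sample standard deviation
% (assumed to match normalfit.mu and normalfit.sigma from fitdist)
mu = mean(actual_values);
sigma = std(actual_values);

% One probability per sorted observation (assumed plotting positions)
p = ((1:n)' - 0.5) / n;

% Inverse normal CDF via erfcinv: x = mu - sigma*sqrt(2)*erfcinv(2p)
predicted_values = mu - sigma * sqrt(2) * erfcinv(2 * p);

% R^2 on the data scale; the square is applied element-wise before summing
SSR = sum((actual_values - predicted_values).^2);
TSS = sum((actual_values - mean(actual_values)).^2);
Rsquared = 1 - SSR / TSS;
```

Here predicted_values and actual_values share the same units, so their difference is meaningful without any rescaling of the CDF.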
Answers (1)
Oguz Kaan Hancioglu
15 Feb 2023
I think there is a problem in your calculation. It compares the x values of the actual data with the F(x) values of the fitted distribution, and these are on different scales.
cdfplot(actual_values); % Plot the empirical CDF
cdfplot plots the empirical CDF using your x-axis values. If you keep the handle returned by cdfplot, you can access the F(x) values of your data. Change this line to:
[h,stats] = cdfplot(actual_values); % Plot the empirical CDF
% don't close the cdfplot to use its handle
Fx = h.YData;
You can then use this Fx vector in your calculation.
% Compute R^2, which is 1 - (sum of squared residuals/total sum of squares)
SSR = sum(predicted_values - Fx).^2; % as written, this squares the summed residual; sum((predicted_values - Fx).^2) gives the sum of squares
TSS = sum(((Fx - mean(Fx)).^2));
Rsquared = 1 - SSR/TSS % R^2 on the probability scale (0 to 1)
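The comparison on the probability scale can also be sketched without reading values back from the plot, by computing the empirical CDF directly. This is a minimal illustration under stated assumptions: the sample vector is made up, the data are assumed to have no tied values (so F(x_i) = i/n), and erfc stands in for cdf('Normal', ...) so the block runs without the Statistics Toolbox.

```matlab
% Hypothetical sample standing in for Table.values
actual_values = sort([120; 340; 560; 780; 990; 1210; 1450; 1680; 1900; 2310]);
n = numel(actual_values);

% Empirical CDF at each sorted observation, assuming distinct values
Fx = (1:n)' / n;

% Fitted normal CDF at the same x values, written with erfc
mu = mean(actual_values);
sigma = std(actual_values);
predicted_values = 0.5 * erfc(-(actual_values - mu) / (sigma * sqrt(2)));

% Both vectors are n-by-1 and lie between 0 and 1, so they can be subtracted
residuals = predicted_values - Fx;

% Sum of squared residuals: square each residual, then sum.
% sum(residuals).^2 would instead square the net residual, letting
% positive and negative errors cancel.
SSR = sum(residuals.^2);
TSS = sum((Fx - mean(Fx)).^2);
Rsquared = 1 - SSR / TSS;
```

With the element-wise square, SSR cannot be negative, so Rsquared cannot exceed 1.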
2 Comments
Oguz Kaan Hancioglu
15 Feb 2023
That is caused by the cdfplot function. When you pass actual_values to cdfplot, it does not keep the vector as is: it generates its own XData from it. If you examine h.XData, you will see that cdfplot writes each element twice and adds -Inf and +Inf at the ends of your actual_values.
You can get your values by manual indexing.
Fxx = Fx(2:2:20);
The vectors are now the same length, and Fxx corresponds to actual_values. Now you can calculate R^2 as follows.
Fxx = Fx(2:2:20);
% Compute R^2, which is 1 - (sum of squared residuals/total sum of squares)
SSR = sum(predicted_values - Fxx).^2; % as written, this squares the summed residual; sum((predicted_values - Fxx).^2) gives the sum of squares
TSS = sum(((Fxx - mean(Fxx)).^2));
Rsquared = 1 - SSR/TSS % R^2 on the probability scale (0 to 1)
I calculated 0.9450. It worked. However, I have no idea why cdfplot uses the same element twice.
Best regards
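The doubled x values and the -Inf/+Inf padding described in this comment are what a staircase plot needs: each x appears once at the bottom and once at the top of its step. The sketch below builds vectors with that layout by hand so the indexing can be checked without hard-coding the length. It only mirrors the layout described here and is not taken from cdfplot's source. The sample vector is made up and assumed to have distinct values, and xStairs, yStairs, F_before and F_after are illustrative names.

```matlab
% Hypothetical sample with distinct values, standing in for actual_values
actual_values = sort([120; 340; 560; 780; 990; 1210; 1450; 1680; 1900; 2310]);
n = numel(actual_values);
F = (1:n)' / n;                             % empirical CDF after each jump

% Staircase layout: every x twice, padded with -Inf and +Inf
idx = reshape(repmat(1:n, 2, 1), [], 1);    % 1 1 2 2 ... n n
xStairs = [-Inf; actual_values(idx); Inf];
yStairs = [0; 0; F(idx)];

% One value per observation, written so it adapts to any n
F_before = yStairs(2:2:end-2);   % height just before each jump: (i-1)/n
F_after  = yStairs(3:2:end-1);   % height just after each jump:  i/n
```

Under this assumed layout, 2:2:20 on ten points selects the height just before each jump, while 3:2:21 selects the height just after it, so the choice of index is a convention worth stating. The YData of a line object is typically a row vector, so it may need transposing (for example Fx(:)) before it is subtracted from a column vector such as predicted_values.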