The calculated R-squared is not equal to the square of the correlation coefficient from the MATLAB function corr

Given model predictions and true values, R^2 (the coefficient of determination) can be readily calculated using the standard formula:
Rsq = 1 - sum((ytrue - ypred).^2)/sum((ytrue - mean(ytrue)).^2)
Alternatively, R^2 can be obtained by squaring the correlation coefficient, using built-in functions such as corr or corrcoef:
Rsq = (corr(ytrue,ypred))^2
However, I find that the latter value is slightly larger than the former. Why does the built-in function give a higher value?
3 Comments
Yuzhen Lu on 23 Apr 2020
I attach my data files so you can double-check.
dpb on 24 Apr 2020
Although they're not the same calculation


Answers (2)

Ameer Hamza on 23 Apr 2020
You are trying to find the coefficient of determination (R-squared), whereas corr(), as shown in its documentation (https://www.mathworks.com/help/releases/R2020a/stats/corr.html#d120e195813), calculates Pearson's linear correlation coefficient. I am not sure whether any MATLAB built-in function computes R-squared directly; however, I found this submission on FEX: https://www.mathworks.com/matlabcentral/fileexchange/34492-r-square-the-coefficient-of-determination (internally, it implements the same formula you are using right now).

John D'Errico on 24 Apr 2020
Edited: John D'Errico on 24 Apr 2020
What I do not see is the actual model you used. Did you use a linear model? Was there a constant term in the model? The problem is that the claims you make about R^2 and the correlation coefficient are valid only for specific models.
For example, here is a quadratic polynomial fit, which does include a constant term:
>> x = rand(10,1);
>> y = rand(10,1);
>> p2 = polyfit(x,y,2);
>> pred = polyval(p2,x);
>> Rsq = 1 - sum((y - pred).^2)/sum((y - mean(y)).^2)
Rsq =
0.140274350649466
>> corr(y,pred).^2
ans =
0.140274350649466
So, the square of the correlation coefficient is the same as the value your formula computes. It matches down to the last digit, which is my expectation.
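A short derivation may help show why the two numbers agree here. It is only a sketch, and it assumes an ordinary least-squares fit whose model includes a constant term. The symbols e_i, SST, SSR and SSE are notation introduced for this note, not taken from the thread.

```latex
\begin{align*}
e_i &= y_i - \hat{y}_i, \qquad \sum_i e_i = 0, \qquad \sum_i e_i \hat{y}_i = 0
  && \text{(least squares with a constant term)} \\
\mathrm{SST} &= \sum_i (y_i - \bar{y})^2
  = \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\mathrm{SSR}}
  + \underbrace{\sum_i e_i^2}_{\mathrm{SSE}} \\
R^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} = \frac{\mathrm{SSR}}{\mathrm{SST}} \\
r &= \frac{\sum_i (y_i - \bar{y})(\hat{y}_i - \bar{y})}{\sqrt{\mathrm{SST}\cdot\mathrm{SSR}}}
  = \frac{\mathrm{SSR}}{\sqrt{\mathrm{SST}\cdot\mathrm{SSR}}}
  \quad\Longrightarrow\quad r^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = R^2
\end{align*}
```

The second and fourth lines rely on the residuals summing to zero, which also makes the mean of the predictions equal to the mean of y. That property is guaranteed when the constant term is part of the fitted model.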
However, now try the same thing, but using a model that has no constant term in it. In this case, I'll use a cubic polynomial fit, but one that has no constant term. We can do that using backslash, though I could have done the fit using any number of tools.
>> mdl = [x,x.^2,x.^3]\y
mdl =
0.552026949387604
3.2235169295382
-3.50451900695301
>> pred = [x,x.^2,x.^3]*mdl;
>> Rsq = 1 - sum((y - pred).^2)/sum((y - mean(y)).^2)
Rsq =
0.195980323024559
>> corr(y,pred).^2
ans =
0.200698709640219
What went wrong? The error is in assuming that the two formulas compute the same thing for models in which no constant term is estimated.
There are adjusted R^2 computations that can be more accurate in these cases, but even so, there is no reason to expect the two formulas to give the same result once the model lacks a constant term.
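To make the role of the constant term concrete, here is a minimal sketch that fits the same cubic with and without a column of ones. The data values and all variable names (rsq, A0, A1 and so on) are made up for illustration and are not the asker's data. It uses corrcoef rather than corr so that it does not depend on the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical data, chosen only for illustration.
x = (0.05:0.1:0.95)';
y = [0.81 0.91 0.13 0.91 0.63 0.10 0.28 0.55 0.96 0.96]';

% The "standard formula" for R^2 from the question.
rsq = @(y, pred) 1 - sum((y - pred).^2) / sum((y - mean(y)).^2);

% Cubic model WITHOUT a constant term (as in the backslash example).
A0       = [x, x.^2, x.^3];
pred0    = A0 * (A0 \ y);
C0       = corrcoef(y, pred0);
Rsq0     = rsq(y, pred0);
r2_0     = C0(1,2)^2;
resMean0 = mean(y - pred0);   % residuals need not average to zero here

% Same cubic WITH a constant term (extra column of ones).
A1       = [ones(size(x)), A0];
pred1    = A1 * (A1 \ y);
C1       = corrcoef(y, pred1);
Rsq1     = rsq(y, pred1);
r2_1     = C1(1,2)^2;
resMean1 = mean(y - pred1);   % zero up to rounding error

fprintf('no constant:   Rsq = %.12f, r^2 = %.12f, mean residual = %.3g\n', ...
        Rsq0, r2_0, resMean0);
fprintf('with constant: Rsq = %.12f, r^2 = %.12f, mean residual = %.3g\n', ...
        Rsq1, r2_1, resMean1);
```

With the constant column the two values should agree to rounding error. Without it, the squared correlation can only be equal to or larger than the formula value, because the correlation is unchanged by shifting and rescaling the predictions, while the formula penalizes any offset left in them. That is consistent with the slightly larger value reported in the question.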
