Calculate the distance between a linear regression line and the data points

ZimtBolten on 6 Aug 2020
Edited: John D'Errico on 6 Aug 2020
Hello,
I would like to calculate the distance between a linear regression line and different data points.
This is my figure:
And this is my code for the linear regression:
x = age_a;
y = par_a;
m = age_b;
n = par_b;
s = polyfit(x,y,1);
f = polyval(s,x);
plot(x,y,'ob',x,f,'-',m,n,'og')
Is that correct so far?
Two questions/problems:
  1. How can I calculate the distance between the linear regression line and different data points?
  2. The regression line's x-interval should be wide enough to calculate the distances for both data groups (a and b), not just group a. (There are some green dots to the right and left of the line.)
Thank you for the help!!!

Accepted Answer

Alan Stevens on 6 Aug 2020
If you have the x-values of all the data points of interest you can calculate the corresponding y-values of your line simply using, say
y = polyval(s,x);
where the x now represents the value of a green or blue point as required.
Then calculate
abs(y - ytrue)
for each point and sum the result over all the points. (ytrue is the actual y value of the point in question; y is the linear curve value).
Alternatively, if you want a least-squares distance, you could take the square root of the sum of squared differences over all the points,
sqrt(sum((y - ytrue).^2))
instead of summing the absolute values.
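The steps above can be sketched for both groups from the question as follows. This is only a minimal sketch: the numeric values are hypothetical stand-ins for age_a, par_a, age_b and par_b (the original data was not posted), and xfit, dist_a and dist_b are names introduced here for illustration. The line is fitted to group a only, as in the question, and then drawn across the x-range of both groups.

```matlab
% Hypothetical stand-in data; replace with your own age_a, par_a, age_b, par_b.
age_a = [20 25 30 35 40 45]';
par_a = [1.1 1.4 1.6 2.1 2.3 2.4]';
age_b = [15 28 50]';
par_b = [1.5 1.2 2.0]';

% Fit the line to group a only, as in the question.
s = polyfit(age_a, par_a, 1);

% Vertical distance of every point (both groups) from the fitted line.
dist_a = abs(polyval(s, age_a) - par_a);
dist_b = abs(polyval(s, age_b) - par_b);

% Evaluate the line over the x-range of BOTH groups, not just group a.
xfit = [min([age_a; age_b]), max([age_a; age_b])];
plot(age_a, par_a, 'ob', age_b, par_b, 'og', xfit, polyval(s, xfit), '-')
```

Because polyval can evaluate the fitted line at any x, the green points to the left and right of group a need no special handling.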
  5 Comments
ZimtBolten on 6 Aug 2020
Thank you!
John D'Errico on 6 Aug 2020
Edited: John D'Errico on 6 Aug 2020
So, are you actually looking to compute a VERTICAL distance, as opposed to an orthogonal distance? Your original question seemed to imply the true distance to the line.
The orthogonal distance is the length of the perpendicular projection to the line. That is not what Alan has computed.


More Answers (1)

John D'Errico on 6 Aug 2020
Edited: John D'Errico on 6 Aug 2020
I am confident that what you are asking for is the orthogonal distance, not the vertical difference (frequently known as the residual error). Assuming you really want the true distance to the line, that is, the perpendicular distance rather than the residual in y only, you should understand that the orthogonal distance is NOT what is minimized by a linear regression.
A linear regression minimizes the error in y only, thus it effectively computes the predicted value of y, at that value of x. That is what Alan has shown how to do. You have not attached your data, so I cannot give an example based on that data.
So let me show you the difference, as well as show how to compute what you want to see.
>> x = randn(20,1);
>> y = 1*x + 3 + randn(size(x));
% fit a polynomial, using polyfit. This is NOT an orthogonal regression to this data.
>> P1 = polyfit(x,y,1)
P1 =
0.8478 2.9813
Due to the noise in the data, the slope and intercept found are not exactly 1 and 3, but they are reasonably close.
Being too lazy to do the algebra right now, you may want to read here. This is not difficult. I'd probably compute the square of the distance, then find the minimum, differentiating and then solving. If I am feeling slightly less lazy at the end, I'll probably do it then.
>> D = abs(P1(1)*x - y + P1(2))./sqrt(P1(1)^2 + 1);
We can also compute the vertical differences from the line, as:
>> resids = y - (P1(1)*x + P1(2));
>> norm(D)
ans =
3.7368
>> norm(resids)
ans =
4.899
Here D and resids are the VECTORS of orthogonal distances and vertical residuals, respectively. All computations were done in a vectorized manner, without loops. Note that the expectation is that the orthogonal distances will be somewhat less than the vertical distances. This was the case. The actual difference depends on the slope of the line we found.
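To make the dependence on the slope concrete: for a line y = a*x + c, each orthogonal distance is the absolute vertical residual divided by sqrt(a^2 + 1). In the output above, 4.899/3.7368 is about 1.311, which matches sqrt(0.8478^2 + 1). The sketch below checks this relation on freshly generated data; the seed and the names shrink, maxdiff and ratio are assumptions made for illustration, so the numbers will not reproduce the ones printed above.

```matlab
rng(0);                          % arbitrary fixed seed, for repeatability only
x = randn(20,1);
y = 1*x + 3 + randn(size(x));
P1 = polyfit(x,y,1);

resids = y - polyval(P1,x);                        % vertical differences
D = abs(P1(1)*x - y + P1(2))./sqrt(P1(1)^2 + 1);   % orthogonal distances

% Factor by which every vertical distance shrinks; depends on the slope only.
shrink = sqrt(P1(1)^2 + 1);

maxdiff = max(abs(D - abs(resids)/shrink));   % should be at rounding level
ratio = norm(resids)/norm(D);                 % should equal shrink
```

A flat line (slope near 0) gives a factor near 1, so the two kinds of distance nearly coincide; a steep line makes the orthogonal distances much smaller than the residuals.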
Now we can plot the data, as well as the lines connecting the points both vertically, and orthogonally.
a = P1(1);
c = P1(2);
b = -1;
xorth = (b*(b*x - a*y) - a*c)/(a^2 + b^2);
yorth = (a*(-b*x + a*y) - b*c)/(a^2 + b^2);
xrange = [min([x;xorth]),max([x;xorth])];
plot(x,y,'kx')
hold on
plot(xrange,polyval(P1,xrange),'-g')
plot(repmat(x(:),[1 2])',[y(:),polyval(P1,x(:))]','r-')
axis equal
xlabel x
ylabel y
title 'Vertical distances (residuals)'
legend('data','LINEAR regression line','Residuals')
figure
plot(x,y,'kx')
hold on
plot(xrange,polyval(P1,xrange),'-g')
plot([x(:),xorth(:)]',[y(:),yorth(:)]','b')
axis equal
xlabel x
ylabel y
title 'Orthogonal projections to the line'
legend('data','LINEAR regression line','Projections')
Be careful, in that if you do not use axis equal, then the orthogonal projection lines will not appear as if they are perpendicular to the line.
I believe these distances (those in the vector D) are what you were looking to find.
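If you want to convince yourself that the projected points and the distances agree, a few numeric checks are possible. The sketch below regenerates random data (with an arbitrary seed) rather than reusing the session above, and the names onLineErr, distErr and perpErr are introduced here for illustration only.

```matlab
rng(1);                          % arbitrary fixed seed, for repeatability only
x = randn(20,1);
y = 1*x + 3 + randn(size(x));
P1 = polyfit(x,y,1);

% Line written as a*x + b*y + c = 0, with b = -1 as in the plotting code.
a = P1(1); b = -1; c = P1(2);
D = abs(a*x + b*y + c)./sqrt(a^2 + b^2);
xorth = (b*(b*x - a*y) - a*c)/(a^2 + b^2);
yorth = (a*(-b*x + a*y) - b*c)/(a^2 + b^2);

% Check 1: the projected points lie on the fitted line.
onLineErr = max(abs(yorth - polyval(P1,xorth)));

% Check 2: the point-to-projection length equals the distance formula D.
distErr = max(abs(hypot(x - xorth, y - yorth) - D));

% Check 3: each connecting segment is perpendicular to the line direction [1, a].
perpErr = max(abs((x - xorth) + a*(y - yorth)));
```

All three error values should be at the level of floating-point rounding.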
