Simple linear regression, prediction intervals behave strangely using predint

Antti Harala on 9 Oct 2019
Commented: dpb on 11 Oct 2019
I'm trying to fit a simple linear regression (y = bx + a) with prediction intervals to a data set y in which each data point has a measurement uncertainty u. Each point's weight is the inverse of its variance (1/u^2). When I compute the prediction interval with predint, passing the cfit object (ft) as input, the intervals come out unreasonably wide. How can I get reasonable and correct prediction intervals?
If w is a vector of ones (w = ones(8,1)), the result is exactly the same as omitting the 'Weights' option in fit. I'm considering normalizing u so that w sums to 8 (the number of measurements), which seems to give reasonable results. This does not weight the data by the inverse of the variance, as is common; instead it scales each u relative to the sum of the elements of u. See the code.
Consider the following block of code.
% independent variable
x = [-40 -20 0 20 40 60 80 100]';
% dependent variable
y = [-39.8500 -19.7700 0.2200 20.2300 40.3000 60.1900 79.9400 99.4700]';
% measurement uncertainty of each result y
u = [0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08]';
% weights w for each data point y, computed from u as the inverse of the variance
w = 1./u.^2; % this gives unreasonably wide intervals
% uncomment to see the results with my normalization
% w = u.*length(u)/sum(u);
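figure(1)
hold on
plot(x,y,'rx') % measured data
% weighted straight-line fit
ft = fit(x,y,'poly1','Weights',w);
plot(ft,'b-')
% 95% simultaneous prediction bounds for a new observation
pft = predint(ft,x,0.95,'observation','on');
plot(x,pft,'r--')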
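One way to separate the effect of the relative weights from the effect of their absolute scale is to keep the inverse-variance weights but rescale them so they sum to the number of points. The sketch below is only an illustration: the names wInv, wNorm, ftInv, ftNorm and the width variables are introduced here and do not come from the question, and it assumes the Curve Fitting Toolbox is available.

```matlab
% Data from the question
x = [-40 -20 0 20 40 60 80 100]';
y = [-39.8500 -19.7700 0.2200 20.2300 40.3000 60.1900 79.9400 99.4700]';
u = [0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08]';

% Inverse-variance weights, and the same weights rescaled to sum to n
% (hypothetical normalization, chosen here only for comparison)
wInv  = 1./u.^2;
wNorm = wInv * numel(wInv) / sum(wInv);

% Fit the same straight line with both weight vectors
ftInv  = fit(x, y, 'poly1', 'Weights', wInv);
ftNorm = fit(x, y, 'poly1', 'Weights', wNorm);

% Rows: slope and intercept for each weighting
disp([coeffvalues(ftInv); coeffvalues(ftNorm)])

% Observation bounds with the same options as in the question
pInv  = predint(ftInv,  x, 0.95, 'observation', 'on');
pNorm = predint(ftNorm, x, 0.95, 'observation', 'on');

% Full width of each interval (upper minus lower), side by side
widthInv  = pInv(:,2)  - pInv(:,1);
widthNorm = pNorm(:,2) - pNorm(:,1);
disp([widthInv widthNorm])
```

Weighted least-squares coefficient estimates depend only on the relative weights, so the two fits should return the same slope and intercept. Any difference between the two width columns therefore comes from how the absolute scale of the weights enters the interval calculation.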
  1 Comment
dpb on 10 Oct 2019
I'll have to do some digging, but it looks like using w = u.^2 may give something closer to the result one would expect than 1./u.^2. Maybe TMW (The MathWorks) decided to try to help by not requiring the transform.
I've never actually used the fit object and related functions "in anger", so I'd have to figure out what they do internally before I could do more than observe what happens without the inverse.


Answers (1)

dpb on 10 Oct 2019
It's the use of 'observation','on', which requests simultaneous bounds, that's causing the much wider prediction interval bounds.
I'm not sure why the difference between the two cases is as large as it is for this particular data; it does seem somewhat extreme. Maybe the limited number of observations is the culprit. Unfortunately I don't have time to delve more deeply at the moment, sorry...
  2 Comments
Antti Harala on 11 Oct 2019
The 'observation' / 'functional' option selects whether you want bounds for a new observation or for the fitted curve. The point here is to predict a new observation; I do not want bounds on the fit. Consider the case where all data points in y have the same measurement uncertainty and therefore identical weights (omitting w from the fit call, or using w = ones(8,1)): this gives "reasonable" prediction bounds with the 'observation' option for predint. The same holds for my true data set, which has 10 observations at each setpoint of the independent variable.
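For reference, the textbook non-simultaneous weighted least-squares bounds behind the two options can be written as below. This is standard statistics, not taken from the predint source, and the symbols X, W, s, x_0, w_0 and t are introduced here for illustration: X is the design matrix, W = diag(w_1, ..., w_n), n is the number of points, p = 2 the number of coefficients, x_0 the regressor vector at the prediction point, and w_0 the weight assigned to the new observation.

```latex
s^2 = \frac{1}{n-p}\sum_{i=1}^{n} w_i\,(y_i - \hat{y}_i)^2

\text{functional:}\quad
\hat{y}_0 \pm t_{1-\alpha/2,\,n-p}\; s\,\sqrt{x_0^{\mathsf T}\,(X^{\mathsf T} W X)^{-1}\,x_0}

\text{observation:}\quad
\hat{y}_0 \pm t_{1-\alpha/2,\,n-p}\; s\,\sqrt{\frac{1}{w_0} + x_0^{\mathsf T}\,(X^{\mathsf T} W X)^{-1}\,x_0}
```

With unit weights, 1/w_0 = 1 and the observation formula reduces to the familiar unweighted one. With w = 1/u^2, s^2 is dimensionless and 1/w_0 = u_0^2, so any calculation that put 1 in place of 1/w_0 would produce observation bounds on the scale of s rather than s times u_0. Whether predint treats the new observation's weight this way is not established in this thread, so it is worth checking against the documentation.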
dpb on 11 Oct 2019
It's the 'on' vs. 'off' setting for simultaneous limits that's causing the difference in magnitude... did you do the comparison?
I've not had time to read the actual code, but I'd presume it is implemented correctly...
Do you have another statistical prediction package to compare against, or could you just write out the results directly?
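The comparison suggested above could be run along these lines. It is a minimal sketch using the data and weights from the question; the names pOn, pOff, widthOn and widthOff are introduced here for illustration, and it assumes the Curve Fitting Toolbox is available.

```matlab
% Data and inverse-variance weights from the question
x = [-40 -20 0 20 40 60 80 100]';
y = [-39.8500 -19.7700 0.2200 20.2300 40.3000 60.1900 79.9400 99.4700]';
u = [0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08]';
w = 1./u.^2;

ft = fit(x, y, 'poly1', 'Weights', w);

% Observation bounds: simultaneous ('on') vs non-simultaneous ('off')
pOn  = predint(ft, x, 0.95, 'observation', 'on');
pOff = predint(ft, x, 0.95, 'observation', 'off');

% Full interval widths (upper minus lower) for each setting
widthOn  = pOn(:,2)  - pOn(:,1);
widthOff = pOff(:,2) - pOff(:,1);

% Columns: x, width with 'on', width with 'off', ratio on/off
disp([x widthOn widthOff widthOn./widthOff])
```

If the ratio in the last column is modest while both widths remain large compared with u, the simultaneous option is not the main cause of the wide bounds, and the weighting is the next thing to examine.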

Release

R2019b
