Simple linear regression, prediction intervals behave strangely using predint
I'm trying to fit a simple linear regression (y = bx + a) with prediction intervals to a data set y in which each data point has an uncertainty u. The weight for each data point is the inverse of its variance (1/u^2). When I calculate the prediction interval with predint, using the cfit object (ft) as input, it produces unreasonably wide prediction intervals. How can I get reasonable and correct prediction intervals?
If w is a vector of ones (w = ones(8,1)), the result is exactly the same as omitting the 'Weights' option in fit. I'm considering normalizing so that w sums to 8 (the number of measurements), which seems to give reasonable results. This does not weight the data by the inverse of the variance, which is the common approach; instead it scales u relative to the sum of its elements. See the commented-out line in the code.
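For reference, the textbook weighted least-squares formulas show why the absolute scale of w can matter. They assume the model y_i = b x_i + a + ε_i with Var(ε_i) = σ²/w_i, a design matrix X with rows (x_i, 1), W = diag(w_1, …, w_n), and a new observation y_0 at x_0 with weight w_0. Whether predint uses exactly these expressions internally is an assumption here, not something stated in the thread.

```latex
\begin{aligned}
s^2 &= \frac{1}{n-2}\sum_{i=1}^{n} w_i\,\bigl(y_i - \hat b\,x_i - \hat a\bigr)^2,
  \qquad \mathbf{x}_0 = (x_0,\ 1)^{\top},\\
\widehat{\operatorname{Var}}\bigl(\hat y(x_0)\bigr)
  &= s^2\,\mathbf{x}_0^{\top}\bigl(X^{\top}WX\bigr)^{-1}\mathbf{x}_0,\\
\widehat{\operatorname{Var}}\bigl(y_0 - \hat y(x_0)\bigr)
  &= s^2\left(\frac{1}{w_0} + \mathbf{x}_0^{\top}\bigl(X^{\top}WX\bigr)^{-1}\mathbf{x}_0\right),\\
\text{prediction interval:}\quad
  &\hat y(x_0) \pm t_{1-\alpha/2,\;n-2}\,
  \sqrt{\widehat{\operatorname{Var}}\bigl(y_0 - \hat y(x_0)\bigr)}.
\end{aligned}
```

Replacing every w_i by c·w_i multiplies s² by c and (XᵀWX)⁻¹ by 1/c, so the fitted line and the confidence band on the curve are unchanged. The observation interval, however, contains the term s²/w_0, so it depends on the weight assigned to the new point: with w_i = 1/u_i² the consistent choice is w_0 = 1/u_0², whereas w_0 = 1 describes a new point with uncertainty 1, far larger than any u_i in this data. Rescaling the weights so they sum to n (mean weight 1) makes w_0 = 1 correspond to a point of roughly average weight, which may be why that normalization looks more reasonable.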
Consider the following block of code.
% independent variable
x = [-40 -20 0 20 40 60 80 100]';
% dependent variable
y = [-39.8500 -19.7700 0.2200 20.2300 40.3000 60.1900 79.9400 99.4700]';
% measurement uncertainty of each result y
u = [0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08]';
% calculating weights w for each data point y from u by taking the inverse of variance
w = 1./u.^2; % this gives unreasonably wide intervals
% uncomment to see the results with my normalization
% w = u.*length(u)/sum(u);
figure(1)
hold on
plot(x,y,'rx')
ft = fit(x,y,'poly1','Weights',w);
plot(ft,'b-')
pft = predint(ft,x,0.95,'observation','on');
plot(x,pft,'r--')
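One way to narrow down the cause is to see how the bounds respond when every weight is multiplied by the same constant. That should leave the fitted line and, in textbook weighted least squares, the 'functional' (fitted-curve) bounds unchanged. If the 'observation' bounds change anyway, that would suggest predint is treating a new observation as having unit weight; this interpretation is an assumption to check, not documented behavior. The sketch below reuses the question's data; the scale factor c and the helper halfwidth are illustrative choices, and table requires R2013b or later.

```matlab
% Data from the question
x = [-40 -20 0 20 40 60 80 100]';
y = [-39.8500 -19.7700 0.2200 20.2300 40.3000 60.1900 79.9400 99.4700]';
u = [0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08]';

w = 1./u.^2;              % inverse-variance weights
c = numel(w)/sum(w);      % illustrative rescaling: c*w sums to 8

ft1 = fit(x, y, 'poly1', 'Weights', w);
ft2 = fit(x, y, 'poly1', 'Weights', c*w);

% Coefficients should agree: a constant factor on all weights does not change the fit
disp([coeffvalues(ft1); coeffvalues(ft2)])

% Pointwise (non-simultaneous) bounds for both weightings
fun1 = predint(ft1, x, 0.95, 'functional',  'off');
fun2 = predint(ft2, x, 0.95, 'functional',  'off');
obs1 = predint(ft1, x, 0.95, 'observation', 'off');
obs2 = predint(ft2, x, 0.95, 'observation', 'off');

% Half-width of each interval: (upper - lower)/2
halfwidth = @(b) (b(:,2) - b(:,1))/2;
disp(table(x, halfwidth(fun1), halfwidth(fun2), halfwidth(obs1), halfwidth(obs2), ...
    'VariableNames', {'x', 'fun_w', 'fun_cw', 'obs_w', 'obs_cw'}))
```

Note that c*w here is a normalized inverse-variance weighting. The commented-out line in the question, w = u.*length(u)/sum(u), is proportional to u itself, so it gives the noisiest points the largest weights.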
1 Comment
dpb
10 October 2019
I'll have to do some digging, but it looks like using w = u.^2 gives something closer to the result one would expect than w = 1./u.^2. Maybe TMW decided to try to help by not requiring the transform.
I've never actually used the fit object and related functions "in anger", so I would have to figure out what they do internally before I could do more than observe what happens when the inverse isn't used.
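Whether fit expects inverse-variance weights can be checked against a weighted least-squares solve written out by hand: scaling each row of the design matrix and of y by sqrt(w_i) and solving with backslash minimizes sum(w.*r.^2). If the coefficients agree, the 'Weights' values multiply the squared residuals (larger weight means a more trusted point), and 1./u.^2 is the conventional choice. This is a sketch; the variable names are illustrative, and X .* sw relies on implicit expansion (R2016b or later).

```matlab
x = [-40 -20 0 20 40 60 80 100]';
y = [-39.8500 -19.7700 0.2200 20.2300 40.3000 60.1900 79.9400 99.4700]';
u = [0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08]';
w = 1./u.^2;

% Hand-written weighted least squares: minimize sum(w.*(y - (b*x + a)).^2)
X  = [x, ones(size(x))];              % columns ordered like poly1: [p1 (slope), p2 (intercept)]
sw = sqrt(w);
coefManual = (X .* sw) \ (y .* sw);   % [b; a]

% Curve Fitting Toolbox fit with the same weights
ft = fit(x, y, 'poly1', 'Weights', w);
coefFit = coeffvalues(ft)';           % [p1; p2] = [b; a]

disp([coefManual, coefFit])
```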
Answers (1)
dpb
10 October 2019
It's the use of 'observation','on' (that is, simultaneous bounds) that's causing the much wider prediction interval bounds.
I'm not sure why the difference between the two cases is as large as it is for this particular data; that does seem somewhat extreme. Maybe the small number of observations is the culprit. Unfortunately I don't have time to delve more deeply at the moment, sorry...
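The comparison suggested here can be run directly. The sketch below computes observation bounds for the question's weighted fit with simultaneous bounds 'off' and 'on' and prints the ratio of their widths, so you can see how much of the extra width comes from the simultaneous option alone. The variable names are illustrative; table requires R2013b or later.

```matlab
x = [-40 -20 0 20 40 60 80 100]';
y = [-39.8500 -19.7700 0.2200 20.2300 40.3000 60.1900 79.9400 99.4700]';
u = [0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08]';

ft = fit(x, y, 'poly1', 'Weights', 1./u.^2);

pOff = predint(ft, x, 0.95, 'observation', 'off');   % pointwise bounds
pOn  = predint(ft, x, 0.95, 'observation', 'on');    % simultaneous bounds

widthOff = pOff(:,2) - pOff(:,1);
widthOn  = pOn(:,2)  - pOn(:,1);
disp(table(x, widthOff, widthOn, widthOn./widthOff, ...
    'VariableNames', {'x', 'width_off', 'width_on', 'ratio'}))
```

Simultaneous bounds are expected to be wider than pointwise ones; if the ratio is modest while both widths are still large compared with u, the simultaneous option is probably not the main cause.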
2 Comments
dpb
11 October 2019
It's 'on' vs. 'off' for the simultaneous bounds that causes the difference in magnitude... did you do the comparison?
I haven't had time to read the actual code, but I'd presume it's implemented correctly...
Do you have another statistical package to compare against, or could you just write out the calculation directly?
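Writing the calculation out directly is short for a straight line. The sketch below follows the textbook weighted least-squares prediction interval in base MATLAB only: the two-sided t quantile comes from betaincinv, using P(|T| > t) = I_{ν/(ν+t²)}(ν/2, 1/2), instead of tinv (which needs the Statistics and Machine Learning Toolbox). The new-observation weight w0 is an explicit input; here it is assumed, for illustration, that new points at the same x values have the same uncertainties u. Comparing the result with predint(ft,x,0.95,'observation','off') for w0 = 1./u.^2 and for w0 = 1 is one way to test which choice predint makes.

```matlab
x = [-40 -20 0 20 40 60 80 100]';
y = [-39.8500 -19.7700 0.2200 20.2300 40.3000 60.1900 79.9400 99.4700]';
u = [0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08]';
w = 1./u.^2;

n = numel(x);
p = 2;                               % parameters: slope and intercept
X = [x, ones(n,1)];                  % rows (x_i, 1)
W = diag(w);

A    = X' * W * X;                   % weighted normal-equation matrix
coef = A \ (X' * W * y);             % [b; a]
r    = y - X*coef;                   % residuals
s2   = sum(w .* r.^2) / (n - p);     % weighted error-variance estimate

% Two-sided 95% t quantile with nu = n - p degrees of freedom
alpha = 0.05;
nu    = n - p;
xb    = betaincinv(alpha, nu/2, 1/2);
tcrit = sqrt(nu * (1 - xb) / xb);

% Assumed new observations: same x values, same uncertainties as the data
x0 = x;
w0 = 1./u.^2;                        % set w0 = ones(size(x0)) to mimic a unit-weight new point
X0 = [x0, ones(numel(x0),1)];
lev = sum((X0 / A) .* X0, 2);        % diag(X0 * inv(A) * X0')

yhat    = X0 * coef;
halfFun = tcrit * sqrt(s2 * lev);               % bound on the fitted curve
halfObs = tcrit * sqrt(s2 * (1./w0 + lev));     % bound on a new observation
bounds  = [yhat - halfObs, yhat + halfObs];

disp(table(x0, yhat, bounds(:,1), bounds(:,2), ...
    'VariableNames', {'x', 'yhat', 'lower', 'upper'}))
```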