confidence intervals returned by predict()
The predict() function returns confidence intervals (CIs) for values predicted from a model. There are four options available for the CIs. Two of the options do not give the CIs I expect. Can someone explain these unexpected results? Are my expectations wrong, or is the function wrong? I will give examples using a simple linear regression model, and I will explain what values I expect. I'm sorry this is a long post, but I did not have time to make it shorter.
Create some data and make a simple linear regression model:
x=(5:15)';
b0=0; b1=1; sigma=1; %b0=intercept, b1=slope, sigma=s.d. of random noise
y=b0+b1*x+sigma*randn(size(x));
mdl=fitlm(x,y); % model using x, y
Make predictions with confidence intervals (four options for CIs)
xnew=(0:20)';
[~,yci1] =predict(mdl,xnew,'Prediction','curve', 'Simultaneous',false);
[~,yci2] =predict(mdl,xnew,'Prediction','curve', 'Simultaneous',true);
[~,yci3] =predict(mdl,xnew,'Prediction','observation','Simultaneous',false);
[ypred,yci4]=predict(mdl,xnew,'Prediction','observation','Simultaneous',true);
Plot predictions and confidence intervals
figure
subplot(211)
plot(x,y,'k*',xnew,ypred,'-k.'); hold on
plot(xnew,yci1(:,1),'-r',xnew,yci2(:,1),'-g',xnew,yci3(:,1),'-b',xnew,yci4(:,1),'-m');
plot(xnew,yci1(:,2),'-r',xnew,yci2(:,2),'-g',xnew,yci3(:,2),'-b',xnew,yci4(:,2),'-m');
legend('Data','Prediction','curve,non-simul','curve,simul.','obs.,non-simul','obs.,simul.')
ylabel('Y'); grid on
subplot(212)
plot(xnew,yci1(:,2)-ypred,'-r',xnew,yci2(:,2)-ypred,'-g',...
xnew,yci3(:,2)-ypred,'-b',xnew,yci4(:,2)-ypred,'-m');
legend('curve,non-simul','curve,simul.','obs.,non-simul','obs.,simul.')
xlabel('X'); ylabel('C.I. Half-width'); grid on
I wish the Matlab help explained the following, which took me some work to figure out: The four different CIs returned by predict() follow the general formula
$$\hat{y} \pm c \cdot SE$$
where SE varies depending on the 'Prediction' option, and c varies depending on the 'Simultaneous' option.
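When predict() is called with 'Prediction','curve', SE is given by
$$SE = s \sqrt{\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}}$$
where $s = \sqrt{SSE/(n-2)}$ is the residual standard deviation, $\bar{x}$ is the mean of the $n$ fitted x values, and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$.
When predict() is called with 'Prediction','observation', SE is given by
$$SE = s \sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}}$$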
When predict() is called with 'Simultaneous',false, c (for simple linear regression) is given by
$$c = t_{(1+p)/2,\; n-2}$$
where p is the CI probability, 0.95 by default. The critical value of the t statistic can be obtained in Matlab with c=tinv((1+p)/2,n-2). In the example here, p=0.95 and n=11, therefore c=tinv(.975,9)=2.2622. The formulas above produce CIs that agree with the CIs of predict(), when Simultaneous is false. These CIs are plotted in red and blue above.
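The agreement described above can be checked numerically. The following is a minimal sketch, assuming the standard simple-linear-regression standard errors (written out in the comments) and the `RMSE` property of the fitted model as the residual standard deviation. The seed and the variable names `seCurve`, `seObs`, `errCurve`, and `errObs` are illustrative choices, not part of the original example.

```matlab
% Compare hand-computed non-simultaneous CIs with those from predict().
rng(0);                                   % fixed seed, only for reproducibility
x = (5:15)';
y = 0 + 1*x + 1*randn(size(x));           % b0=0, b1=1, sigma=1 as in the post
mdl = fitlm(x, y);
xnew = (0:20)';
[ypred, ciCurve] = predict(mdl, xnew, 'Prediction', 'curve', 'Simultaneous', false);
[~, ciObs]       = predict(mdl, xnew, 'Prediction', 'observation', 'Simultaneous', false);

n   = numel(x);
s   = mdl.RMSE;                           % residual s.d., sqrt(SSE/(n-2))
Sxx = sum((x - mean(x)).^2);              % sum of squared deviations of x
seCurve = s * sqrt(1/n + (xnew - mean(x)).^2 / Sxx);      % SE of the fitted line
seObs   = s * sqrt(1 + 1/n + (xnew - mean(x)).^2 / Sxx);  % SE of a new observation
c = tinv((1 + 0.95)/2, n - 2);            % t critical value for p = 0.95

% Largest absolute difference between manual and predict() upper limits
errCurve = max(abs(ciCurve(:,2) - (ypred + c*seCurve)));
errObs   = max(abs(ciObs(:,2)   - (ypred + c*seObs)));
fprintf('max difference, curve: %.2e; observation: %.2e\n', errCurve, errObs);
```

Both differences should be at the level of floating-point round-off if the formulas match what predict() computes in the non-simultaneous case.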
When Simultaneous is true, the results are not what I expect. I expect the CIs (which, according to the Matlab Help, are by Scheffé's method) to be (see here and here; these sources use different notation, but they appear to agree):
$$c = \sqrt{d \, F_{p;\; d,\; n-2}}$$
where d is the number of independent new x values for simultaneous prediction. In the examples plotted above, d=21, because length(xnew)=21. Therefore we expect c=sqrt(21*finv(.95,21,9))=7.8391. Therefore we expect the CI widths to be wider by a uniform factor of 7.84/2.26=3.47, when Simultaneous is true. But the CIs are only wider by a factor of 1.2898. (The ratio of CI widths is the same when 'Prediction','observation' is used.) Why the discrepancy?
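The critical values in this comparison can be reproduced with a short sketch. The first two values are the ones quoted in the post. The third, which sets d to the number of estimated coefficients (2 for intercept and slope), is an assumption added here for comparison only; the post does not state that predict() works this way. The names `cT`, `cS21`, `cS2`, and `ratioObs` are illustrative.

```matlab
% Critical values for the non-simultaneous and Scheffe-type simultaneous CIs.
p = 0.95; n = 11;
cT   = tinv((1 + p)/2, n - 2);            % non-simultaneous (2.2622 in the post)
cS21 = sqrt(21 * finv(p, 21, n - 2));     % expected with d = 21 (7.8391 in the post)
cS2  = sqrt(2 * finv(p, 2, n - 2));       % ASSUMPTION: d = 2 model coefficients

% Observed ratio of simultaneous to non-simultaneous half-widths from predict()
x = (5:15)';
y = x + randn(size(x));
mdl = fitlm(x, y);
xnew = (0:20)';
[yp, ciNon] = predict(mdl, xnew, 'Prediction', 'curve', 'Simultaneous', false);
[~,  ciSim] = predict(mdl, xnew, 'Prediction', 'curve', 'Simultaneous', true);
ratioObs = mean((ciSim(:,2) - yp) ./ (ciNon(:,2) - yp));

fprintf('expected ratio with d = 21: %.4f\n', cS21 / cT);
fprintf('ratio with d = 2 (assumed): %.4f\n', cS2 / cT);
fprintf('observed ratio from predict(): %.4f\n', ratioObs);
```

If the d = 2 ratio matches the observed 1.2898, that would suggest the simultaneous bounds cover the whole fitted line rather than only the 21 requested points, but that reading is a hypothesis to confirm against the documentation.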
The confidence interval when predicting a single value with 'Simultaneous',true is also not what we expect. When predicting a single value, d=1, and c simplifies to
$$c = \sqrt{F_{p;\; 1,\; n-2}}$$
where p is the CI probability. This is identical to the critical value for the non-simultaneous confidence interval,
$$c = t_{(1+p)/2,\; n-2}$$
due to the relationship between the F and t distributions: the square of a t variable with n-2 degrees of freedom follows an F distribution with 1 and n-2 degrees of freedom. It makes sense that the simultaneous and non-simultaneous CIs would be the same when there is only one value being predicted "simultaneously". But the CIs returned by predict() are not the same when one value is being predicted. See example below.
xnew=10;
[ypred1,yci1]=predict(mdl,xnew,'Prediction','curve','Simultaneous',false);
[ypred2,yci2]=predict(mdl,xnew,'Prediction','curve','Simultaneous',true);
fprintf('CI, non-simultaneous: %.2f to %.2f; half-width %.2f\n',yci1,yci1(2)-ypred1)
fprintf('CI, simultaneous: %.2f to %.2f; half-width %.2f\n',yci2,yci2(2)-ypred2)
Why are the CIs not the same?
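The F and t relationship behind this expectation can be confirmed directly. This is a small sketch using the same p and n as the example; the names `cF` and `cT` are illustrative.

```matlab
% With d = 1, the Scheffe-type critical value equals the t critical value.
p = 0.95; n = 11;
cF = sqrt(1 * finv(p, 1, n - 2));         % sqrt(d*F) with d = 1
cT = tinv((1 + p)/2, n - 2);              % two-sided t critical value
fprintf('sqrt(F) = %.4f, t = %.4f, difference = %.2e\n', cF, cT, abs(cF - cT));
```

The two values agree to round-off, so any difference between the two intervals printed by the example above must come from how predict() chooses c, not from the F and t quantiles themselves.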