Linear regression on training set

I have some data that I want to divide into a training set and a validation set, so that I can do linear regression on the training set to find y0 and r. The training set should contain at least 50% of the data. My code so far is below:
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A))
subSet1=A(idx(1:5)) %Trainingset
subSet2=A(idx(6:end)) %Validationset
If I can assume the function is exponential, y(t) = y0*exp(r*t), how do I continue from here to fit and plot the training set and find y0 and r?
Thankful for all help!

9 Comments

J. Alex Lee on 10 Sep 2020
You already identified that your regression can be put into linear form, so that's already a big hint for you...
katara on 10 Sep 2020
Yeah, so I tried rewriting the function as log(y) = log(y0) + r*t and then using polyfit(t, log(y), 1), but since y0 is unknown that doesn't work.
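A minimal sketch of why the unknown y0 is not an obstacle here: after taking logs the model is a straight line in t, and polyfit returns both the slope (r) and the intercept (log(y0)). The values y0True and rTrue and the variable names are made up for illustration and are not taken from the data above.

```matlab
% Synthetic, noise-free data from y = y0*exp(r*t); values are illustrative only
y0True = 2;
rTrue  = 0.5;
tDemo  = 0:5;
yDemo  = y0True*exp(rTrue*tDemo);

% log(y) = log(y0) + r*t is a first-degree polynomial in t
cDemo = polyfit(tDemo, log(yDemo), 1);
rFit  = cDemo(1);        % slope is r
y0Fit = exp(cDemo(2));   % intercept is log(y0), so exponentiate it
```

On noise-free data like this, rFit and y0Fit should match rTrue and y0True up to rounding error.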
katara on 10 Sep 2020
Edited: katara on 10 Sep 2020
I just realized I could just name a new variable y = log(y) and use polyfit from there. So my code is:
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
t1=[1930, 1943, 1966, 1976, 1991];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
y=log(subSet1);
c=polyfit(t1,y, 1)
r=c(1);
lny0=c(2);
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*')
But now I have chosen that t1 is the first five years of t, which won't correspond correctly to the randomly chosen values of the training set. Is there a way of choosing five t values that will correspond to the randomly chosen values?
Johannes Hougaard on 10 Sep 2020
The five t values that correspond to the randomly chosen values are obtained by indexing t with the idx vector, just as you do for A.
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1 = t(idx(1:5)); %t values for Trainingset
y=log(subSet1);
c=polyfit(t1,y, 1)
r=c(1);
lny0=c(2);
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*')
And to apply your polyfit result you could just use polyval.
% Or you could use
y2 = exp(polyval(c,t));
plot(t,y2);
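Since the point of the split is to check the fit on data it has not seen, here is a rough sketch of applying polyval to the validation years only and summarizing the error as a root-mean-square error. The fixed seed, the names trainI, valI, predVal and rmseVal, and the choice of RMSE as the error measure are my assumptions, not part of the thread.

```matlab
A = [130, 300, 400, 500, 650, 1075, 2222, 2550, 3300];
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];

rng(0);                        % fixed seed (an assumption) so the split is repeatable
idx    = randperm(numel(A));
trainI = idx(1:5);             % positions of the training points
valI   = idx(6:end);           % positions of the validation points

% Fit log(A) = log(y0) + r*t on the training points only
c = polyfit(t(trainI), log(A(trainI)), 1);

% Predict at the held-out years and compare with the held-out values
predVal = exp(polyval(c, t(valI)));
rmseVal = sqrt(mean((A(valI) - predVal).^2));
```

Removing the rng(0) line gives a different split, and therefore a different rmseVal, on every run.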
Adam Danz on 10 Sep 2020
Edited: Adam Danz on 10 Sep 2020
Johannes has the right approach (maybe it can be written as an answer). It can be generalized to a dataset of any size using:
idx = randperm(numel(A));
nTrain = ceil(numel(A)/2);
% nTest = numel(A)-nTrain; % if needed
trainIdx = 1:nTrain;
testIdx = nTrain+1 : numel(A);
trainSet = [A(trainIdx); t(trainIdx)]; % assuming A and t are row vectors
testSet = [A(testIdx); t(testIdx)]; % same assumption
% Then proceed with fitting on the trainSet and measuring
% error on the testSet
Also note that if you're planning on using a more rigorous cross validation, use cvpartition to partition your data.
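A possible variant of the generalized split, sketched under the assumption that the shuffled idx should be what selects the rows (so the split is actually random) and that A and t may have either orientation. The parameter trainFrac and the column-wise layout of trainSet and testSet are hypothetical choices for illustration.

```matlab
A = [130, 300, 400, 500, 650, 1075, 2222, 2550, 3300];
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];

trainFrac = 0.5;                 % hypothetical parameter: minimum training share
n         = numel(A);
nTrain    = ceil(trainFrac*n);   % ceil keeps the training share at or above trainFrac
idx       = randperm(n);

trainIdx  = idx(1:nTrain);       % shuffled positions rather than 1:nTrain
testIdx   = idx(nTrain+1:end);

% Force columns so the orientation of the inputs does not matter
tCol = t(:);
ACol = A(:);
trainSet = [tCol(trainIdx), ACol(trainIdx)];   % one row per point: [t, A]
testSet  = [tCol(testIdx),  ACol(testIdx)];
```

The fit would then use trainSet(:,1) and log(trainSet(:,2)), and the error would be measured on testSet.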
katara on 10 Sep 2020
Thank you!
One question for Johannes: how can I plot the polyfit result using polyval? In other problems I have used, for example:
c=polyfit(t, temp, 2)
x=polyval(c,t)
plot(t,temp,'*', t, x)
However, for this problem I tried:
y=log(subSet1);
c=polyfit(t1,y, 1)
p=polyval(c,t);
r=c(1);
lny0=(c(2));
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*',t,p)
And it didn't work. The code you wrote with polyval didn't work either.
The whole code is now:
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1=t(idx(1:5)); %t values for Trainingset
y=log(subSet1);
c=polyfit(t1,y, 1)
p=polyval(c,t);
r=c(1);
lny0=(c(2));
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*',t,p)
J. Alex Lee on 10 Sep 2020
You just need to exponentiate the result of polyval (remember you took the log), and I would wager the plot you really want is
plot(t,A,'*',t,exp(polyval(c,t)))
Or if I may:
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300];
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1=t(idx(1:5)); %t values for Trainingset
t2=t(idx(6:end)); %t values for Validationset
y=log(subSet1);
c=polyfit(t1,y, 1)
p=polyval(c,t);
r=c(1);
y0=exp(c(2));
yMdlFn = @(t)(y0*exp(r*t));
% to evaluate on test set
yMdlTest = yMdlFn(t2)
% more comprehensive plot
figure(1); cla; hold on
plot(t1,subSet1,'*')
plot(t2,subSet2,'o')
fplot(yMdlFn,[1929,2009])
But I also recommend implementing Adam's generalization to arbitrarily large data sets partitioned into arbitrarily sized training and test sets (although I think the code as posted doesn't work).
Image Analyst on 10 Sep 2020
If you want a log fit, use fitnlm() rather than polyfit().
J. Alex Lee on 10 Sep 2020
I would take linear least squares anywhere I can get it, including in this situation. Linear fitting doesn't require initial guesses, is guaranteed to give a "result", and is faster. You could then use the result of the polyfit as the starting point for a nonlinear fit, if you want to define the least squares differently. But you're still left with a choice about how to define your residual, so you have a lot more to worry about if you care about nonlinear fitting at that level.
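A rough sketch of that two-step idea, using fminsearch from base MATLAB in place of fitnlm (which needs the Statistics and Machine Learning Toolbox). The time shift ts, the names p0, sse and pNl, and the use of all nine points instead of a training subset are my assumptions, made to keep the example short.

```matlab
A = [130, 300, 400, 500, 650, 1075, 2222, 2550, 3300];
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];

% Shift time so y0 is the value at t(1) instead of at year 0 (an assumption,
% made only to keep both parameters on a reasonable scale)
ts = t - t(1);

% Step 1: linear least squares on log(A); no initial guess needed
c  = polyfit(ts, log(A), 1);
p0 = [exp(c(2)), c(1)];          % [y0, r] from the log-linear fit

% Step 2: least squares on A itself, started from the linear result
sse = @(p) sum((A - p(1)*exp(p(2)*ts)).^2);
pNl = fminsearch(sse, p0);       % refined [y0, r]
```

The two steps minimize different residuals: the log-linear fit minimizes squared errors in log(A), while the second step minimizes squared errors in A, so the large recent values carry more weight in pNl than in p0.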


Answers (1)

Johannes Hougaard on 11 Sep 2020

1 vote

The five t values that correspond to the randomly chosen values are obtained by indexing t with the idx vector, just as you do for A.
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1 = t(idx(1:5)); %t values for Trainingset
y=log(subSet1);
c=polyfit(t1,y, 1)
r=c(1);
lny0=c(2);
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*')
And to apply your polyfit result you could just use polyval.
% Or you could use
y2 = exp(polyval(c,t));
plot(t,y2);

Asked: 10 Sep 2020

Answered: 11 Sep 2020
