Function approximation: neural network looks great 'on paper', but simulated results on new data are very bad?
I need some help with NN because I don't understand what happened. One hidden layer, I=4, H=1:20, O=1. I run each net architecture 10 times with different initial weights (leaving the default initnw). I have in total 34 datasets, which were divided 60/20/20 when using the Levenberg-Marquardt algorithm. Mse_goal = 0.01*mean(var(t',1)). I calculate NMSE and R^2, choose the best R^2, and for that net check the performance of each subsample, the regression plots, and the RMSE. R^2 is usually around 0.95; R for each subset around 0.98. But when I simulate the network with a completely new set of data, the estimates deviate quite a lot. It is not because of extrapolation. Data are normalized with mapminmax; transfer functions are tansig and purelin.
Trainbr was actually my first choice, since I have a small dataset and trainbr doesn't need a validation set (MATLAB 2015a), but it is awfully slow: I ran a net with trainbr and we are talking hours versus minutes with trainlm.
I've read a ton of Greg Heath's posts and tutorials and found very valuable information there, but I'm still stuck. I see no way out.
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by Neural Fitting app
% Created 09-Aug-2016 18:33:13
% This script assumes these variables are defined:
%
% MP_UA_K - input data.
% UA_K - target data.
close all, clear all
load varUA_K
x = MP_UA_K;
t = UA_K;
var_t = mean(var(t',1)); % target variance
[inputs,obs] = size(x); % number of inputs and number of observations
hiddenLayerSize = 20; %max number of neurons
numNN = 10; % number of training runs
neurons = [1:hiddenLayerSize]';
training_no = 1:numNN;
obs_no = 1:obs;
nets = cell(hiddenLayerSize,numNN);
trainOutputs = cell(hiddenLayerSize,numNN);
valOutputs = cell(hiddenLayerSize,numNN);
testOutputs = cell(hiddenLayerSize,numNN);
Y_all = cell(hiddenLayerSize,numNN);
performance = zeros(hiddenLayerSize,numNN);
trainPerformance = zeros(hiddenLayerSize,numNN);
valPerformance = zeros(hiddenLayerSize,numNN);
testPerformance = zeros(hiddenLayerSize,numNN);
e = zeros(numNN,obs);
e_all = cell(hiddenLayerSize,numNN);
NMSE = zeros(hiddenLayerSize,numNN);
r_train = zeros(hiddenLayerSize,numNN);
r_val = zeros(hiddenLayerSize,numNN);
r_test = zeros(hiddenLayerSize,numNN);
r = zeros(hiddenLayerSize,numNN);
Rsq = zeros(hiddenLayerSize,numNN);
for j=1:hiddenLayerSize
% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. Suitable in low memory situations.
trainFcn = 'trainbr'; % Bayesian Regularization backpropagation.
% Create a Fitting Network
net = fitnet(j,trainFcn);
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.input.processFcns = {'removeconstantrows','mapminmax'};
net.output.processFcns = {'removeconstantrows','mapminmax'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
% data are sorted by the dependent variable; roughly every third
% sample is used for testing
net.divideFcn = 'divideind'; % Divide data by index
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainInd = [1:3:34,2:3:34];
% net.divideParam.valInd = [5:5:30];
net.divideParam.testInd = [3:3:34];
mse_goal = 0.01*var_t;
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean Squared Error
net.trainParam.goal = mse_goal;
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
for i=1:numNN
% Train the Network
net = configure(net,x,t);
disp(['No. of hidden nodes ' num2str(j) ', Training ' num2str(i) '/' num2str(numNN)])
[nets{j,i}, tr{j,i}] = train(net,x,t);
y = nets{j,i}(x);
e(i,:) = gsubtract(t,y);
e_all{j,i}= e(i,:);
trainTargets = t .* tr{j,i}.trainMask{1};
%valTargets = t .* tr{j,i}.valMask{1};
testTargets = t .* tr{j,i}.testMask{1};
trainPerformance(j,i) = perform(net,trainTargets,y);
%valPerformance(j,i) = perform(net,valTargets,y);
testPerformance(j,i) = perform(net,testTargets,y);
performance(j,i)= perform(net,t,y);
rmse_train(j,i)=sqrt(trainPerformance(j,i));
%rmse_val(j,i)=sqrt(valPerformance(j,i));
rmse_test(j,i)=sqrt(testPerformance(j,i));
rmse(j,i)=sqrt(performance(j,i));
% outputs of all networks
Y_all{j,i}= y;
trainOutputs {j,i} = y .* tr{j,i}.trainMask{1};
%valOutputs {j,i} = y .* tr{j,i}.valMask{1};
testOutputs {j,i} = y .* tr{j,i}.testMask{1};
[r(j,i)] = regression(t,y);
[r_train(j,i)] = regression(trainTargets,trainOutputs{j,i});
%[r_val(j,i)] = regression(valTargets,valOutputs{j,i});
[r_test(j,i)] = regression(testTargets,testOutputs{j,i});
NMSE(j,i) = mse(e_all{j,i})/var_t; % normalized MSE
% coefficient of determination
Rsq(j,i) = 1-NMSE(j,i);
end
[minperf_train,I_train] = min(trainPerformance',[],1);
minperf_train = minperf_train';
I_train = I_train';
% [minperf_val,I_valid] = min(valPerformance',[],1);
% minperf_val = minperf_val';
% I_valid = I_valid';
[minperf_test,I_test] = min(testPerformance',[],1);
minperf_test = minperf_test';
I_test = I_test';
[minperf,I_perf] = min(performance',[],1);
minperf = minperf';
I_perf = I_perf';
[maxRsq,I_Rsq] = max(Rsq',[],1);
maxRsq = maxRsq';
I_Rsq = I_Rsq';
[train_min,train_min_I] = min(minperf_train,[],1);
% [val_min,val_min_I] = min(minperf_val,[],1);
[test_min,test_min_I] = min(minperf_test,[],1);
[perf_min,perf_min_I] = min(minperf,[],1);
[Rsq_max,Rsq_max_I] = max(maxRsq,[],1);
end
figure(4)
hold on
xlabel('observation no.')
ylabel('targets')
scatter(obs_no,trainTargets,'b')
% scatter(obs_no,valTargets,'g')
scatter(obs_no,testTargets,'r')
hold off
figure(5)
hold on
xlabel('neurons')
ylabel('min. performance')
plot(neurons,minperf_train,'b',neurons,minperf_test,'r',neurons,minperf,'k')
hold off
figure(6)
hold on
xlabel('neurons')
ylabel('max Rsq')
scatter(neurons,maxRsq,'k')
hold off
% View the Network
%view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, ploterrhist(e)
%figure, plotregression(t,y)
%figure, plotfit(net,x,t)
% Deployment
% Change the (false) values to (true) to enable the following code blocks.
% See the help for each generation function for more information.
savefig(figure(4),'figure4.fig')
savefig(figure(5),'figure5.fig')
savefig(figure(6),'figure6.fig')
if (false)
% Generate MATLAB function for neural network for application
% deployment in MATLAB scripts or with MATLAB Compiler and Builder
% tools, or simply to examine the calculations your trained neural
% network performs.
genFunction(net,'nn_UA_K_BR');
y = nn_UA_K_BR(x);
end
% save all workspace variables to a separate file for further analysis
save ws_UA_K_BR
Accepted Answer
Greg Heath
3 September 2016
Edited: Greg Heath, 5 September 2016
% I need some help with NN because I don't understand what happened.
% One hidden layer, I=4, H=1:20, O=1. I run each net architecture 10
% times with different initial weights (left default initnw). I have
% in total 34 datasets
Do you mean data points N = 34?
It typically takes ~10 to 30 data points per dimension to adequately characterize a distribution. For a 4-D distribution I'd recommend
40 <~ Ntrn <~ 120
% which were divided 60/20/20 when using Levenberg-Marquardt
Ntrn = 34-2*round(0.2*34) = 20
Hub = (20-1)/(4+1+1) = 3.2
indicating you really don't have enough data to adequately characterize a 4-D distribution.
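For reference, a minimal sketch of the arithmetic behind Hub, using the same weight-count formula that appears in the script further below; the numbers are the ones quoted above:

% An I-H-O fitnet has Nw = (I+1)*H + (H+1)*O weights. Hub is the H at
% which Nw reaches the number of training equations Ntrneq = Ntrn*O.
I = 4; O = 1;
Ntrn = 34 - 2*round(0.2*34); % = 20 after a 60/20/20 split of N = 34
Ntrneq = Ntrn*O; % = 20 training equations
Hub = (Ntrneq - O)/(I + O + 1) % = 3.17, i.e. the 3.2 quoted above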
You should consider
1. Dimensionality reduction
2. k-fold crossvalidation
3. Adding new data with the same mean and covariance (stdv + correlations) matrix
% algorithm. Mse_goal = 0.01*mean(var(t',1)), I calculate NMSE and R^2,
% choose best R^2, for that check performance of each subsample, check
% regression plots, check rmse. R^2 is usually around 0.95; R for each
% subset 0.98... But when I simulate network with completely new set of
% data, estimations deviate quite a lot. It is not because of
% extrapolation.
No. It probably is. Your training data subset is insufficiently
large for 4 dimensions.
I would begin with minimizing H with dividetrain. Then consider k-fold crossvalidation.
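A minimal sketch of k-fold crossvalidation with fitnet, assigning the folds by hand via divideind; k = 5, H = 3, and the cyclic fold assignment are illustrative choices, not values from the thread:

% Assumes x (I x N) and t (O x N) are loaded as in the original script.
k = 5;
N = size(x,2);
fold = mod(0:N-1,k) + 1; % cyclic labels; data sorted by t, so each fold spans its range
vart1 = mean(var(t',1));
cvNMSE = zeros(1,k);
for f = 1:k
    net = fitnet(3,'trainbr'); % small H, per the Hub discussion above
    net.divideFcn = 'divideind';
    net.divideParam.trainInd = find(fold ~= f);
    net.divideParam.valInd = [];
    net.divideParam.testInd = find(fold == f);
    net = train(net,x,t);
    ytst = net(x(:,fold == f));
    cvNMSE(f) = mse(t(:,fold == f) - ytst)/vart1; % NMSE on held-out fold
end
meanCvNMSE = mean(cvNMSE) % generalization estimate from nontraining data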
% Data are normalized with mapminmax, transfer functions tansig,
% purelin.
%
% Trainbr was my first choice actually, since I have small dataset and
% trainbr doesn't need validation set (Matlab2015a), but it is awfully
% slow. I ran a net with trainbr and we are talking hours versus
% minutes with trainlm.
This may be a BUG. Let MATLAB know. What version are you using?
>> ver
% I've read a ton of Greg Heath's posts and tutorials and found very
% valuable information there, however, still nothing. I see no way out.
It typically takes ~10 to 30 data points per dimension to adequately characterize a distribution.
I suggest calculating the means and stdvs of each data set to see how representative your training data is of the total 4-D distribution that includes the new datasets. 2-D or 3-D color-coded projections may be helpful.
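A minimal sketch of that check, assuming xtrn holds the original inputs and xnew the new ones (both I x N; the variable names are illustrative):

% Per-input mean and stdv for each set; large differences suggest the
% new data lies outside the region the net was trained on.
[mean(xtrn,2) std(xtrn,0,2)] % training inputs: mean, stdv per row
[mean(xnew,2) std(xnew,0,2)] % new inputs: mean, stdv per row
% Color-coded 2-D projection onto the first two inputs
figure, hold on
scatter(xtrn(1,:),xtrn(2,:),'b')
scatter(xnew(1,:),xnew(2,:),'r')
xlabel('input 1'), ylabel('input 2')
legend('training','new'), hold off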
Hope this helps.
Greg
12 comments
Greg Heath
28 September 2016
Edited: Greg Heath, 28 September 2016
Please post your data in *.m or *.txt.
NEVERMIND! SEE BELOW.
More Answers (1)
Greg Heath
28 September 2016
AN OPTIMISTIC ESTIMATE USING DIVIDETRAIN:
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by Neural Fitting app
% Created 09-Aug-2016 18:33:13
% This script assumes these variables are defined:
%
% MP_UA_K - input data.
% UA_K - target data.
close all, clear all, clc, plt=0, tic
format short e
load varUA_K
whos
% Name Size Bytes Class
% MP_UA_K 3x34 816 double
% UA_K 1x34 272 double
% plt 1x1 8 double
x = MP_UA_K; t = UA_K;
[I N] = size(x), [O N] = size(t) % [ 3 34 ], [ 1 34 ]
vart1 = mean(var(t',1)) % 1.0259e+05
xt = [x;t]; minmaxxt = minmax(xt)
% minmaxxt = 2.0700e+02 7.6000e+02
% 3.5900e+02 1.0180e+03
% 1.5100e-02 2.8500e-01 % 10^4 LOWER!!!
% 8.1300e+02 2.4070e+03
x1 = x(1,:); x2 = x(2,:); x3=x(3,:);
plt = plt+1, figure(plt)
subplot(2,2,1), plot(x1,'k','LineWidth',2)
subplot(2,2,2), plot(x2,'b','LineWidth',2)
subplot(2,2,3), plot(x3,'g','LineWidth',2)
subplot(2,2,4), plot( t,'k','LineWidth',2)
GEH1 = 'DOES NOT LOOK PROMISING!!!'
Ntrneq = N*O % DIVIDETRAIN
Hub = (Ntrneq-O)/(I+O+1) % 6.6
Hmin = 0, dH = 1, Hmax = 10
Ntrials = 10
rng(0)
j=0
for h = Hmin:dH:Hmax
j=j+1
if h==0
net = fitnet([]);
Nw = (I+1)*O
else
net = fitnet(h);
Nw = (I+1)*h+(h+1)*O
end
Ndof = Ntrneq-Nw
MSEgoal = 0.01*max(Ndof,0)*vart1/Ntrneq
net.divideFcn = 'dividetrain';
net.trainParam.goal = MSEgoal;
net.trainParam.min_grad = MSEgoal/100;
for i = 1:Ntrials
i = i
net = configure(net,x,t);
[net tr y e ] = train(net,x,t);
NMSE(i,j) = 100*mse(e)/vart1; % NMSE in percent
end
end
NMSE = NMSE
minNMSE = min(NMSE)
medNMSE = median(NMSE)
meanNMSE = mean(NMSE)
maxNMSE = max(NMSE)
totaltime = toc % 96 sec
% NONOVERFITTING 0 <= H <= 6 < Hub = 6.6
% H            0      1      2      3      4      5      6
% minNMSE  = 48.3   33.3   19.4   10.7    8.7    7.2    6.6
% medNMSE  = 48.3   33.3   24.5   17.0   10.8    8.1    7.4
% meanNMSE = 48.3   40.0   33.4   16.7   12.1    8.3    7.5
% maxNMSE  = 48.3  100.0   76.7   26.6   22.3   11.2    8.4
GEH2 = 'With H = 6 can get Rsquare = 93.4 !'
% OVERFITTING Hub = 6.6 < 7 <= H <= 10
% H           7      8      9     10
% minNMSE  = 5.97   5.96   5.96   5.96
% medNMSE  = 6.22   5.96   5.96   5.96
% meanNMSE = 6.47   6.02   6.02   5.96
% maxNMSE  = 8.16   6.42   6.53   5.96
GEH3 = 'With OVERFITTING can only get 94.0 !'
% NMSE = NMSE
% Columns 1 through 6
%
% 4.8282e+01 3.3313e+01 2.6913e+01 1.9122e+01 9.3848e+00 1.1225e+01
% 4.8282e+01 3.3313e+01 2.2170e+01 1.0726e+01 1.0602e+01 8.7863e+00
% 4.8282e+01 3.3313e+01 2.1539e+01 1.5017e+01 1.3730e+01 7.8872e+00
% 4.8282e+01 3.3313e+01 2.0225e+01 1.5821e+01 1.1673e+01 7.5152e+00
% 4.8282e+01 3.3313e+01 1.9368e+01 1.2777e+01 1.2493e+01 7.6062e+00
% 4.8282e+01 3.3313e+01 6.2003e+01 1.1113e+01 2.2313e+01 8.0091e+00
% 4.8282e+01 3.3313e+01 7.6666e+01 1.8246e+01 1.0316e+01 8.2620e+00
% 4.8282e+01 3.3313e+01 3.1822e+01 1.9369e+01 1.1088e+01 8.6014e+00
% 4.8282e+01 1.0000e+02 3.2846e+01 1.8222e+01 8.7025e+00 8.1623e+00
% 4.8282e+01 3.3313e+01 2.0608e+01 2.6597e+01 1.0326e+01 7.2022e+00
%
% Columns 7 through 11
%
% 6.5668e+00 5.9673e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.2365e+00 6.6139e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.3531e+00 5.9903e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.3784e+00 8.1612e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.3713e+00 6.8227e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.6491e+00 6.2822e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.3575e+00 6.6919e+00 6.4153e+00 5.9635e+00 5.9635e+00
% 6.6564e+00 6.1604e+00 6.0776e+00 5.9635e+00 5.9635e+00
% 7.0978e+00 6.0554e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.0990e+00 5.9676e+00 5.9635e+00 6.5254e+00 5.9635e+00
Hope this helps.
Greg
2 comments
Greg Heath
28 September 2016
I just ran your 4-input case with DIVIDETRAIN. Although Hub = 5.5 is one smaller than the 6.6 of the 3-input case, the information from the new input allows Rsquare = 0.997 for H = 5. In addition, overfitting with H >= 6 does not significantly improve performance.
% NONOVERFITTING 0 <= H <= 5 < Hub = 5.5
% H 0 1 2 3 4 5
% minNMSE = 10.5 9.82 2.47 0.83 0.51 0.32
% medNMSE = 10.5 9.82 4.64 1.93 0.94 0.47
% meanNMSE = 10.5 9.82 14.7 2.48 1.00 0.48
% maxNMSE = 10.5 9.82 100.00 4.68 2.07 0.79
GEH2 = 'With H = 5 can get Rsquare = 99.7 !'
% OVERFITTING Hub = 5.5 < 6 <= H <= 10
% H 6 7 8 9 10
% minNMSE = 0.30 0.30 0.30 0.30 0.30
% medNMSE = 0.30 0.30 0.30 0.30 0.30
% meanNMSE = 0.35 0.41 0.30 0.30 0.30
% maxNMSE = 0.55 0.97 0.30 0.30 0.30
GEH3 = 'Cannot do significantly better by OVERFITTING!'
Hope this helps.
Greg
P.S. I used the optimistically biased DIVIDETRAIN results to get an upper bound on performance. Although the bias can be mitigated somewhat by multiplying NMSE by Ntrneq/Ndof, I prefer to use estimates based on nontraining data.
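A minimal sketch of that adjustment, reusing the names from the script above (evaluate it per hidden-layer size h):

% Degrees-of-freedom adjustment: replace Ntrneq by Ndof = Ntrneq - Nw
% in the MSE denominator to reduce the optimistic training-data bias.
Nw = (I+1)*h + (h+1)*O; % number of weights for hidden-layer size h
Ndof = Ntrneq - Nw;
NMSEa = NMSE*Ntrneq/Ndof; % adjusted NMSE; only meaningful when Ndof > 0
Rsqa = 1 - NMSEa/100; % adjusted R^2 (the /100 because NMSE above is in percent)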