Neural Network Training Implementation
Hi, I have been trying to implement my own version of gradient-descent training. The cost function I minimize is the negative log likelihood. My datasets vary between ~1000 and ~5000 samples (for both training and unseen test sets), and I have previously trained on them with NNToolbox. My implementation of the neural network does perform well: I have been able to attain accuracy close to 99%. However, when I compare my backpropagated partial derivatives against gradients from numerical (finite-difference) gradient checking, the difference is too large not to be suspicious of my implementation. I believe the problem lies in how I update the parameters. I have tried updating the weights after scanning each individual sample (on-line), in mini-batches, and over the whole batch. Also, I believe the final parameter values are too large.
Below is a piece of code that performs backprop and the parameter updates for one epoch, updating after scanning each individual example.
if true
    lambda = 0;  % Regularisation parameter
    numbatches = 1;
    multiPlier = (1 - (learnRate * lambda / size(ipFeatures, 2)));
    for l = 1 : numbatches
        % Index of the current batch; would be 1 for this case
        currentBatch = trainP(:, batchInd((l-1)*batchSize+1 : l*batchSize));
        batchTargets = trainT(:, batchInd((l-1)*batchSize+1 : l*batchSize));
        activations  = forwardPropagation(currentBatch, model);
        deltaErrors  = computeDeltaError(activations, batchTargets, model);
        for t = 1 : numHiddenLayers
            % Compute partial derivatives
            dW{t} = deltaErrors{t+1} * activations{t}';
            db{t} = sum(deltaErrors{t+1}, 2);
            % Update parameters
            model.weights{t} = multiPlier * model.weights{t} ...
                - (learnRate / size(currentBatch, 2)) .* dW{t};
            model.bias{t} = model.bias{t} ...
                - (learnRate / size(currentBatch, 2)) .* db{t};
        end
    end
end
The value of numbatches determines whether the network operates in batch, on-line, or mini-batch mode. I use dW and db for the comparison with the numerical gradients.
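One frequent cause of a large gradient-check discrepancy is comparing dW against a numerical gradient evaluated at different parameter values, i.e. after some updates have already been applied. As a minimal sketch (Python/NumPy softmax-plus-NLL toy model, not the poster's MATLAB code; `nll_cost_and_grad` and `numerical_grad` are hypothetical helpers), both gradients below are evaluated at the same frozen W:

```python
import numpy as np

def nll_cost_and_grad(W, X, T):
    """Mean negative log likelihood of a softmax model.
    X: (features, samples), T: one-hot targets (classes, samples),
    W: (classes, features). Returns cost and the analytic gradient."""
    Z = W @ X
    Z = Z - Z.max(axis=0, keepdims=True)  # numerical stability
    P = np.exp(Z) / np.exp(Z).sum(axis=0, keepdims=True)
    m = X.shape[1]
    cost = -np.sum(T * np.log(P)) / m
    dW = (P - T) @ X.T / m                # backprop gradient
    return cost, dW

def numerical_grad(W, X, T, eps=1e-5):
    """Central finite differences, one parameter at a time,
    at the SAME frozen W that the analytic gradient uses."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp = W.copy(); Wp[idx] += eps
        Wm = W.copy(); Wm[idx] -= eps
        g[idx] = (nll_cost_and_grad(Wp, X, T)[0]
                  - nll_cost_and_grad(Wm, X, T)[0]) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 20))
labels = rng.integers(0, 3, 20)
T = np.eye(3)[labels].T
W = rng.standard_normal((3, 4)) * 0.1

_, dW = nll_cost_and_grad(W, X, T)
dW_num = numerical_grad(W, X, T)
rel_err = np.linalg.norm(dW - dW_num) / (np.linalg.norm(dW) + np.linalg.norm(dW_num))
print(rel_err)  # a correct implementation is typically well below 1e-6
```

The point is that no update is applied between computing dW and computing dW_num; if the check is run mid-training, after some weights have already changed, the two gradients belong to different points on the cost surface and will disagree by large margins, as in the numbers posted below.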
Also, I believe the final parameter values are too large. For example, below are two rows of one of the weight matrices obtained for a dataset trained with 816 samples and tested on a dataset with 725 samples, with a classification accuracy of 98.79%, which is good as the test dataset has some noisy labels:
-0.2853 -1.3728 -0.6968 0.4703 -1.2471 2.0104 0.2644 -0.6097 0.7695 0.3747
1.4270 1.2017 0.6934 0.8725 0.4917 -1.0928 -0.3810 0.9145 -1.2533 -0.3824
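For what it's worth, the multiPlier form of the update above is algebraically the standard L2 weight-decay step, and with lambda = 0 it reduces to plain gradient descent, so nothing in the update constrains the weight magnitudes. A small sketch (hypothetical values, not taken from the post) confirming that equivalence:

```python
import numpy as np

# L2 weight decay:  W <- (1 - lr*lambda/m)*W - (lr/m)*dW
# is the same step as gradient descent on  cost + (lambda/(2m))*||W||^2.
rng = np.random.default_rng(1)
W  = rng.standard_normal((2, 3))   # hypothetical weights
dW = rng.standard_normal((2, 3))   # hypothetical unregularised gradient
lr, lam, m = 0.1, 0.5, 100

decay_form    = (1 - lr * lam / m) * W - (lr / m) * dW
gradient_form = W - lr * ((dW / m) + (lam / m) * W)

print(np.allclose(decay_form, gradient_form))  # True
```

Note also that the posted code divides the decay term by size(ipFeatures, 2) (total samples) but the gradient term by size(currentBatch, 2); in mini-batch mode these differ, which silently rescales the effective lambda.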
A few sample values of the backpropagated partial derivatives, the numerically computed gradients, and their difference:
backPropGrads = 0.000863502714559410 0.0112093229963550 9.74490423775809e-05 0.000175776868318497 0.00845120635863130 -0.00189301667233442 -0.00653141680913231 0.00496566802896389 -0.0205541611216203 -0.000101576463654545
numericalGrads = -0.00246672065599973 0.00105451893203656 -0.000341400989006813 -0.000228545330785424 0.000629285591066675 0.00526790995436510 0.00255049267060270 -0.00283454504222680 0.00643259144054997 -2.32609314448906e-05
GradientDifference = 0.00333022337055914 0.0101548040643184 0.000438850031384393 0.000404322199103921 0.00782192076756462 0.00716092662669952 0.00908190947973501 0.00780021307119069 0.0269867525621703 7.83155322096546e-05
Can anyone suggest what I am doing wrong here? Am I performing the weight updates properly?
- Nilay