Getting jumps in mini-batch loss when training YOLO v2
Hello,
I'm trying to train YOLO v2 on my person-detector data set.
For some reason I get big jumps in the training loss in the middle of training. I can also see that the temporary checkpoint model files shrink dramatically in size (e.g., from 59 MB to 1.5 MB).
I'm using about 170 pictures with 1-6 bounding boxes each.
Here is the code:
% Define the image input size.
imageSize = [450 800 3];
% Define the number of object classes to detect.
numClasses = width(personDataSet)-1;
anchorBoxes = [
    76  43
    208 147
    103  68
    158 106
    198 137
    129 81
    73 40
];
% Load the pretrained ResNet-50 network to use as the base network.
baseNetwork = resnet50;
% Specify the feature extraction layer.
featureLayer = 'activation_49_relu';
% Inspect the layers of the base network.
analyzeNetwork(baseNetwork);
%reorgLayer = 'activation_47_relu';
% Create the YOLO v2 object detection network.
% lgraph = yolov2Layers(imageSize,numClasses,anchorBoxes,baseNetwork,featureLayer,'ReorgLayerSource',reorgLayer);
lgraph = yolov2Layers(imageSize,numClasses,anchorBoxes,baseNetwork,featureLayer);
% Configure the training options.
%  * Lower the learning rate to 1e-3 to stabilize training.
%  * Set CheckpointPath to save detector checkpoints to a temporary
%    location. If training is interrupted due to a system failure or
%    power outage, you can resume training from the saved checkpoint.
options = trainingOptions('sgdm', ...
    'MiniBatchSize',34, ...
    'InitialLearnRate',1e-3, ...
    'MaxEpochs',30, ...
    'VerboseFrequency',2, ...
    'CheckpointPath',tempdir);
    %'LearnRateSchedule','piecewise', ...
    %'LearnRateDropPeriod',10, ...
    %'Shuffle','every-epoch');
% Train YOLO v2 detector.
[detector,info] = trainYOLOv2ObjectDetector(trainingData,lgraph,options);
As the commented-out lines show, I also tried 'LearnRateSchedule' and 'Shuffle', as well as different learning rates, mini-batch sizes, and numbers of epochs, but I get the same results.
Here is an example of the output from the code above:
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 8).
Training on single CPU.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     RMSE     |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:37 |         8.56 |         73.2 |          0.0010 |
|       1 |           2 |       00:01:14 |         3.55 |         12.6 |          0.0010 |
|       1 |           4 |       00:02:27 |         2.15 |          4.6 |          0.0010 |
|       2 |           6 |       00:03:44 |         2.81 |          7.9 |          0.0010 |
|       2 |           8 |       00:04:57 |         2.89 |          8.4 |          0.0010 |
|       2 |          10 |       00:06:10 |         2.91 |          8.5 |          0.0010 |
|       3 |          12 |       00:07:26 |         2.80 |          7.8 |          0.0010 |
|       3 |          14 |       00:08:39 |         2.65 |          7.0 |          0.0010 |
|       4 |          16 |       00:09:55 |         2.18 |          4.7 |          0.0010 |
|       4 |          18 |       00:11:08 |         2.23 |          5.0 |          0.0010 |
|       4 |          20 |       00:12:21 |         2.32 |          5.4 |          0.0010 |
|       5 |          22 |       00:13:37 |         2.40 |          5.8 |          0.0010 |
|       5 |          24 |       00:14:50 |         2.42 |          5.9 |          0.0010 |
|       6 |          26 |       00:16:06 |         2.53 |          6.4 |          0.0010 |
|       6 |          28 |       00:17:18 |         2.59 |          6.7 |          0.0010 |
|       6 |          30 |       00:18:31 |         2.37 |          5.6 |          0.0010 |
|       7 |          32 |       00:19:47 |         2.29 |          5.2 |          0.0010 |
|       7 |          34 |       00:20:59 |         2.34 |          5.5 |          0.0010 |
|       8 |          36 |       00:22:15 |         2.24 |          5.0 |          0.0010 |
|       8 |          38 |       00:23:28 |         2.69 |          7.2 |          0.0010 |
|       8 |          40 |       00:24:41 |         2.86 |          8.2 |          0.0010 |
|       9 |          42 |       00:25:56 |         1.63 |          2.7 |          0.0010 |
|       9 |          44 |       00:27:09 |         1.71 |          2.9 |          0.0010 |
|      10 |          46 |       00:28:25 |         1.65 |          2.7 |          0.0010 |
|      10 |          48 |       00:29:37 |         1.68 |          2.8 |          0.0010 |
|      10 |          50 |       00:30:50 |         1.65 |          2.7 |          0.0010 |
|      11 |          52 |       00:32:07 |         1.68 |          2.8 |          0.0010 |
|      11 |          54 |       00:33:20 |         1.71 |          2.9 |          0.0010 |
|      12 |          56 |       00:34:35 |         1.65 |          2.7 |          0.0010 |
|      12 |          58 |       00:35:47 |         1.63 |          2.7 |          0.0010 |
|      12 |          60 |       00:36:58 |         1.62 |          2.6 |          0.0010 |
|      13 |          62 |       00:38:13 |         1.70 |          2.9 |          0.0010 |
|      13 |          64 |       00:39:25 |         1.79 |          3.2 |          0.0010 |
|      14 |          66 |       00:40:40 |         1.66 |          2.8 |          0.0010 |
|      14 |          68 |       00:41:52 |         1.66 |          2.7 |          0.0010 |
|      14 |          70 |       00:43:04 |         2.08 |          4.3 |          0.0010 |
|      15 |          72 |       00:44:19 |         4.30 |         18.5 |          0.0010 |
|      15 |          74 |       00:45:30 |         9.76 |         95.2 |          0.0010 |
|      16 |          76 |       00:46:42 |         9.08 |         82.5 |          0.0010 |
|      16 |          78 |       00:47:54 |         8.59 |         73.8 |          0.0010 |
|      16 |          80 |       00:49:05 |         8.25 |         68.1 |          0.0010 |
|      17 |          82 |       00:50:17 |         8.10 |         65.6 |          0.0010 |
|      17 |          84 |       00:51:30 |         7.86 |         61.7 |          0.0010 |
|      18 |          86 |       00:52:41 |         7.09 |         50.2 |          0.0010 |
|      18 |          88 |       00:53:52 |         6.51 |         42.3 |          0.0010 |
|      18 |          90 |       00:55:04 |         6.66 |         44.4 |          0.0010 |
|      19 |          92 |       00:56:16 |         6.70 |         45.0 |          0.0010 |
|      19 |          94 |       00:57:27 |         6.65 |         44.2 |          0.0010 |
|      20 |          96 |       00:58:39 |         6.18 |         38.3 |          0.0010 |
|      20 |          98 |       00:59:50 |         5.88 |         34.6 |          0.0010 |
|      20 |         100 |       01:01:01 |         6.15 |         37.8 |          0.0010 |
|      21 |         102 |       01:02:13 |         5.88 |         34.5 |          0.0010 |
|      21 |         104 |       01:03:25 |         6.09 |         37.0 |          0.0010 |
|      22 |         106 |       01:04:37 |         6.14 |         37.7 |          0.0010 |
|      22 |         108 |       01:05:48 |         5.12 |         26.2 |          0.0010 |
|      22 |         110 |       01:06:59 |         5.99 |         35.9 |          0.0010 |
|      23 |         112 |       01:08:10 |         5.95 |         35.4 |          0.0010 |
|      23 |         114 |       01:09:21 |         6.21 |         38.6 |          0.0010 |
|      24 |         116 |       01:10:33 |         6.07 |         36.9 |          0.0010 |
|      24 |         118 |       01:11:44 |         5.80 |         33.7 |          0.0010 |
|      24 |         120 |       01:12:55 |         6.30 |         39.7 |          0.0010 |
|      25 |         122 |       01:14:07 |         5.90 |         34.9 |          0.0010 |
|      25 |         124 |       01:15:18 |         6.17 |         38.0 |          0.0010 |
|      26 |         126 |       01:16:31 |         5.85 |         34.2 |          0.0010 |
|      26 |         128 |       01:17:42 |         5.53 |         30.6 |          0.0010 |
|      26 |         130 |       01:18:53 |         5.91 |         35.0 |          0.0010 |
|      27 |         132 |       01:20:05 |         5.88 |         34.6 |          0.0010 |
|      27 |         134 |       01:21:16 |         6.14 |         37.8 |          0.0010 |
|      28 |         136 |       01:22:28 |         6.03 |         36.4 |          0.0010 |
|      28 |         138 |       01:23:40 |         5.26 |         27.6 |          0.0010 |
|      28 |         140 |       01:24:53 |         5.90 |         34.8 |          0.0010 |
|      29 |         142 |       01:26:04 |         5.86 |         34.3 |          0.0010 |
|      29 |         144 |       01:27:16 |         6.14 |         37.7 |          0.0010 |
|      30 |         146 |       01:28:28 |         5.60 |         31.3 |          0.0010 |
|      30 |         148 |       01:29:40 |         5.76 |         33.2 |          0.0010 |
|      30 |         150 |       01:30:52 |         5.89 |         34.7 |          0.0010 |
|========================================================================================|
Answers (2)
  Zahra Moayed
 5 Aug 2019
I had the same issue, but it finally worked when I changed the input size to [224 224 3], which is the input size of ResNet, and resized the anchor boxes to match. However, it only worked with a single class.
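As a rough sketch of the anchor-box resizing, assuming the anchor boxes from the question are [height width] pairs estimated at the original 450-by-800 image size, each dimension can be scaled by the ratio of the new size to the old size. The variable names originalSize, targetSize, scale, and scaledAnchors are illustrative only, and the training images and their bounding boxes would need a matching resize:

```matlab
% Anchor boxes from the question, assumed to be [height width] in pixels
% at the original image size.
anchorBoxes = [
    76  43
    208 147
    103  68
    158 106
    198 137
    129  81
    73  40
];

% Original and new image sizes as [height width] (illustrative names).
originalSize = [450 800];
targetSize   = [224 224];

% Per-dimension scale factors: [heightScale widthScale].
scale = targetSize ./ originalSize;

% Scale every anchor box row-wise and round to whole pixels.
scaledAnchors = round(anchorBoxes .* scale);

% New image input size for the network.
imageSize = [targetSize 3];
```

Because the aspect ratio changes from 450:800 to 1:1, heights and widths are scaled by different factors, so the scaled boxes are not simply smaller copies of the originals.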
I also used MiniBatchSize = 16 and Shuffle = 'every-epoch', but the main change was the input size.
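For illustration, those two settings might be combined with the options from the question roughly as follows. Only MiniBatchSize and Shuffle come from this answer. The remaining values are carried over from the question as an assumption, not as tested recommendations:

```matlab
% Sketch of training options with the smaller mini-batch and per-epoch
% shuffling. InitialLearnRate, MaxEpochs, VerboseFrequency, and
% CheckpointPath are assumed unchanged from the question.
options = trainingOptions('sgdm', ...
    'MiniBatchSize',16, ...
    'Shuffle','every-epoch', ...
    'InitialLearnRate',1e-3, ...
    'MaxEpochs',30, ...
    'VerboseFrequency',2, ...
    'CheckpointPath',tempdir);
```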