Day 7: Transfer Learning - Impact of the number of hidden layers 3
I am still looking at how to measure the impact of increasing the number of hidden layers on training. Here I compare two different pretrained networks, densenet121 and resnet101, each with a classifier head of one versus ten hidden layers. The results are not what I expected: all the values are similar, and one hidden layer versus ten hidden layers makes no difference in training time.
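For context, here is a minimal sketch of how a classifier head with a configurable number of hidden layers can be attached to the pretrained backbones. The hidden width of 512, the dropout of 0.2, and the two output classes are assumptions for illustration, since the model-building code is not shown in this post:

from torch import nn
from torchvision import models

def build_model(arch="resnet101", n_hidden=1, hidden_size=512, n_classes=2):
    # Load a pretrained backbone and freeze its weights
    if arch == "resnet101":
        model = models.resnet101(pretrained=True)
        in_features = model.fc.in_features
    else:
        model = models.densenet121(pretrained=True)
        in_features = model.classifier.in_features
    for param in model.parameters():
        param.requires_grad = False

    # Stack n_hidden fully connected hidden layers in the new head
    layers = [nn.Linear(in_features, hidden_size), nn.ReLU(), nn.Dropout(0.2)]
    for _ in range(n_hidden - 1):
        layers += [nn.Linear(hidden_size, hidden_size), nn.ReLU(), nn.Dropout(0.2)]
    layers += [nn.Linear(hidden_size, n_classes), nn.LogSoftmax(dim=1)]

    # Replace the original classifier with the new head
    if arch == "resnet101":
        model.fc = nn.Sequential(*layers)
    else:
        model.classifier = nn.Sequential(*layers)
    return model

With this, the two configurations being compared differ only in n_hidden=1 versus n_hidden=10.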
# resnet101 - one hidden layer
epoch step processing_time/batch(s) training_loss
1/5 1 0.499 0.688
1/5 2 0.460 3.413
1/5 3 0.438 3.209
1/5 4 0.427 3.816
1/5 5 0.428 0.382
1/5 6 0.429 1.813
1/5 7 0.428 1.739
1/5 8 0.430 0.169
1/5 9 0.430 0.635
1/5 10 0.432 0.469
2/5 1 0.428 0.515
2/5 2 0.422 0.413
2/5 3 0.437 0.337
2/5 4 0.430 0.354
2/5 5 0.436 0.266
2/5 6 0.425 0.763
2/5 7 0.431 0.299
2/5 8 0.437 0.446
2/5 9 0.422 0.278
2/5 10 0.424 0.143
3/5 1 0.434 0.403
3/5 2 0.435 0.085
3/5 3 0.428 0.211
3/5 4 0.434 0.261
3/5 5 0.425 0.108
3/5 6 0.433 0.239
3/5 7 0.432 0.201
3/5 8 0.423 0.230
3/5 9 0.433 0.143
3/5 10 0.443 0.211
4/5 1 0.430 0.163
4/5 2 0.431 0.431
4/5 3 0.431 0.170
4/5 4 0.437 0.119
4/5 5 0.425 0.087
4/5 6 0.429 0.095
4/5 7 0.435 0.093
4/5 8 0.431 0.154
4/5 9 0.420 0.237
4/5 10 0.430 0.384
5/5 1 0.429 0.250
5/5 2 0.439 0.219
5/5 3 0.434 0.039
5/5 4 0.438 0.214
5/5 5 0.432 0.093
5/5 6 0.429 0.129
5/5 7 0.426 0.085
5/5 8 0.435 0.093
5/5 9 0.433 0.158
5/5 10 0.429 0.085
# resnet101 - ten hidden layers
epoch step processing_time/batch(s) training_loss
1/5 1 0.517 0.692
1/5 2 0.437 0.692
1/5 3 0.426 0.683
1/5 4 0.435 0.689
1/5 5 0.416 0.680
1/5 6 0.427 0.624
1/5 7 0.415 0.580
1/5 8 0.422 0.823
1/5 9 0.417 0.604
1/5 10 0.424 0.664
2/5 1 0.423 0.694
2/5 2 0.421 0.698
2/5 3 0.418 0.688
2/5 4 0.425 0.695
2/5 5 0.426 0.692
2/5 6 0.421 0.688
2/5 7 0.416 0.696
2/5 8 0.425 0.682
2/5 9 0.425 0.684
2/5 10 0.421 0.637
3/5 1 0.415 0.734
3/5 2 0.427 0.693
3/5 3 0.430 0.691
3/5 4 0.416 0.674
3/5 5 0.423 0.695
3/5 6 0.426 0.653
3/5 7 0.423 0.582
3/5 8 0.417 0.675
3/5 9 0.416 0.449
3/5 10 0.428 0.412
4/5 1 0.423 0.732
4/5 2 0.432 0.473
4/5 3 0.418 1.053
4/5 4 0.414 0.663
4/5 5 0.421 0.681
4/5 6 0.417 0.697
4/5 7 0.425 0.683
4/5 8 0.427 0.681
4/5 9 0.420 0.696
4/5 10 0.417 0.688
5/5 1 0.422 0.700
5/5 2 0.429 0.683
5/5 3 0.426 0.689
5/5 4 0.420 0.674
5/5 5 0.419 0.662
5/5 6 0.428 0.630
5/5 7 0.416 0.590
5/5 8 0.426 1.285
5/5 9 0.427 0.566
5/5 10 0.417 0.683
# densenet121 - one hidden layer
epoch step processing_time/batch(s) training_loss
1/5 1 0.309 0.664
1/5 2 0.263 0.599
1/5 3 0.243 3.846
1/5 4 0.240 1.029
1/5 5 0.238 1.154
1/5 6 0.238 1.555
1/5 7 0.235 1.504
1/5 8 0.239 0.443
1/5 9 0.236 0.399
1/5 10 0.235 0.488
2/5 1 0.236 0.766
2/5 2 0.241 0.586
2/5 3 0.237 0.303
2/5 4 0.235 0.532
2/5 5 0.235 0.516
2/5 6 0.237 0.356
2/5 7 0.236 0.396
2/5 8 0.237 0.209
2/5 9 0.236 0.231
2/5 10 0.236 0.320
3/5 1 0.238 0.198
3/5 2 0.238 0.156
3/5 3 0.236 0.316
3/5 4 0.236 0.219
3/5 5 0.236 0.144
3/5 6 0.238 0.394
3/5 7 0.234 0.237
3/5 8 0.235 0.100
3/5 9 0.233 0.072
3/5 10 0.237 0.253
4/5 1 0.238 0.392
4/5 2 0.239 0.355
4/5 3 0.237 0.167
4/5 4 0.241 0.177
4/5 5 0.235 0.241
4/5 6 0.238 0.078
4/5 7 0.243 0.175
4/5 8 0.237 0.180
4/5 9 0.236 0.277
4/5 10 0.235 0.195
5/5 1 0.236 0.088
5/5 2 0.237 0.215
5/5 3 0.234 0.246
5/5 4 0.242 0.262
5/5 5 0.237 0.107
5/5 6 0.239 0.353
5/5 7 0.238 0.291
5/5 8 0.240 0.098
5/5 9 0.238 0.079
5/5 10 0.239 0.475
# densenet121 - ten hidden layers
epoch step processing_time/batch(s) training_loss
1/5 1 0.324 0.692
1/5 2 0.272 0.697
1/5 3 0.248 0.699
1/5 4 0.244 0.766
1/5 5 0.246 0.699
1/5 6 0.246 0.696
1/5 7 0.245 0.693
1/5 8 0.245 0.697
1/5 9 0.247 0.693
1/5 10 0.246 0.693
2/5 1 0.246 0.692
2/5 2 0.247 0.693
2/5 3 0.244 0.695
2/5 4 0.245 0.686
2/5 5 0.244 0.683
2/5 6 0.246 0.656
2/5 7 0.247 0.482
2/5 8 0.246 1.228
2/5 9 0.246 0.505
2/5 10 0.246 0.622
3/5 1 0.247 0.679
3/5 2 0.247 0.692
3/5 3 0.245 0.696
3/5 4 0.248 0.703
3/5 5 0.244 0.689
3/5 6 0.245 0.692
3/5 7 0.246 0.682
3/5 8 0.244 0.689
3/5 9 0.245 0.690
3/5 10 0.245 0.688
4/5 1 0.245 0.686
4/5 2 0.246 0.682
4/5 3 0.242 0.688
4/5 4 0.244 0.683
4/5 5 0.246 0.670
4/5 6 0.245 0.675
4/5 7 0.245 0.658
4/5 8 0.243 0.646
4/5 9 0.245 0.575
4/5 10 0.243 0.559
5/5 1 0.246 0.772
5/5 2 0.247 0.453
5/5 3 0.245 0.646
5/5 4 0.244 0.539
5/5 5 0.243 0.464
5/5 6 0.243 0.452
5/5 7 0.244 0.482
5/5 8 0.243 0.243
5/5 9 0.242 0.478
5/5 10 0.242 0.196
Those runs used CUDA. Is there any difference when running on the CPU? No. The per-batch time for densenet121 jumps from roughly 0.24 s on the GPU to roughly 8 s on the CPU, but the results below still show no significant difference between one and ten hidden layers.
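Switching between the two only requires changing where the model and tensors live. A sketch of the standard PyTorch device-selection pattern (the device variable is the same one used in the training loop at the end of this post):

import torch

# Use the GPU when available; set device = torch.device("cpu")
# explicitly to reproduce the CPU timings below
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)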
# densenet121 - one hidden layer
epoch step processing_time/batch(s) training_loss
1/5 1 8.270 0.800
1/5 2 8.194 3.046
1/5 3 8.016 2.353
1/5 4 8.048 0.773
1/5 5 8.198 1.076
1/5 6 8.146 0.603
1/5 7 8.142 0.991
1/5 8 8.120 0.442
1/5 9 8.086 0.486
1/5 10 8.086 0.423
2/5 1 8.315 0.510
2/5 2 8.006 0.742
2/5 3 8.240 0.679
2/5 4 7.827 0.437
2/5 5 8.113 0.407
2/5 6 7.917 0.418
2/5 7 8.073 0.411
2/5 8 8.284 0.494
2/5 9 8.169 0.323
2/5 10 8.245 0.653
3/5 1 8.073 0.317
3/5 2 8.340 0.265
3/5 3 8.086 0.365
3/5 4 8.003 0.288
3/5 5 8.228 0.271
3/5 6 8.107 0.223
3/5 7 8.293 0.295
3/5 8 8.098 0.186
3/5 9 8.156 0.291
3/5 10 8.067 0.148
4/5 1 8.127 0.149
4/5 2 8.012 0.169
4/5 3 8.105 0.346
4/5 4 8.038 0.249
4/5 5 8.191 0.225
4/5 6 8.342 0.098
4/5 7 8.105 0.177
4/5 8 7.961 0.175
4/5 9 7.877 0.150
4/5 10 8.047 0.157
5/5 1 8.121 0.185
5/5 2 7.987 0.150
5/5 3 8.052 0.096
5/5 4 7.951 0.268
5/5 5 7.945 0.294
5/5 6 8.013 0.145
5/5 7 8.110 0.096
5/5 8 7.978 0.400
5/5 9 8.126 0.582
5/5 10 8.001 0.265
# densenet121 - ten hidden layers
epoch step processing_time/batch(s) training_loss
1/5 1 8.402 0.693
1/5 2 8.207 0.698
1/5 3 8.113 0.701
1/5 4 7.946 0.694
1/5 5 8.302 0.694
1/5 6 8.104 0.693
1/5 7 7.984 0.691
1/5 8 8.260 0.687
1/5 9 8.075 0.611
1/5 10 8.167 0.799
2/5 1 7.973 0.592
2/5 2 8.033 0.673
2/5 3 8.102 0.686
2/5 4 8.030 0.689
2/5 5 8.261 0.694
2/5 6 8.065 0.684
2/5 7 7.860 0.686
2/5 8 8.052 0.676
2/5 9 7.974 0.673
2/5 10 8.204 0.637
3/5 1 7.988 0.580
3/5 2 8.003 0.433
3/5 3 7.978 1.178
3/5 4 7.992 0.462
3/5 5 8.000 0.519
3/5 6 8.001 0.662
3/5 7 8.098 0.631
3/5 8 7.989 0.650
3/5 9 7.938 0.667
3/5 10 7.912 0.585
4/5 1 7.927 0.519
4/5 2 7.957 0.606
4/5 3 8.041 0.513
4/5 4 7.880 0.363
4/5 5 7.916 0.402
4/5 6 7.863 0.279
4/5 7 7.882 0.564
4/5 8 7.945 0.731
4/5 9 7.912 0.165
4/5 10 7.835 0.933
5/5 1 8.003 0.400
5/5 2 7.905 0.435
5/5 3 7.968 0.423
5/5 4 7.958 0.525
5/5 5 8.007 0.522
5/5 6 7.847 0.549
5/5 7 7.877 0.422
5/5 8 8.077 0.716
5/5 9 8.102 0.643
5/5 10 8.151 0.300
Final test: if I remove dropout, is there any difference? Unfortunately, still no.
# densenet121 - one hidden layer
epoch step processing_time/batch(s) training_loss
1/5 1 0.310 0.738
1/5 2 0.269 2.765
1/5 3 0.242 0.356
1/5 4 0.238 2.627
1/5 5 0.234 0.996
1/5 6 0.237 0.372
1/5 7 0.235 0.557
1/5 8 0.238 0.452
1/5 9 0.235 0.769
1/5 10 0.237 0.320
2/5 1 0.235 0.446
2/5 2 0.236 0.394
2/5 3 0.234 0.333
2/5 4 0.238 0.413
2/5 5 0.233 0.494
2/5 6 0.236 0.310
2/5 7 0.235 0.350
2/5 8 0.236 0.510
2/5 9 0.234 0.209
2/5 10 0.234 0.223
3/5 1 0.235 0.185
3/5 2 0.234 0.313
3/5 3 0.233 0.178
3/5 4 0.235 0.222
3/5 5 0.233 0.325
3/5 6 0.234 0.297
3/5 7 0.232 0.099
3/5 8 0.234 0.250
3/5 9 0.233 0.392
3/5 10 0.234 0.114
4/5 1 0.234 0.517
4/5 2 0.235 0.152
4/5 3 0.232 0.142
4/5 4 0.234 0.102
4/5 5 0.233 0.132
4/5 6 0.233 0.326
4/5 7 0.232 0.266
4/5 8 0.234 0.244
4/5 9 0.233 0.131
4/5 10 0.234 0.121
5/5 1 0.234 0.134
5/5 2 0.233 0.135
5/5 3 0.233 0.336
5/5 4 0.234 0.122
5/5 5 0.235 0.220
5/5 6 0.234 0.155
5/5 7 0.232 0.103
5/5 8 0.235 0.227
5/5 9 0.232 0.170
5/5 10 0.234 0.239
# densenet121 - ten hidden layers
epoch step processing_time/batch(s) training_loss
1/5 1 0.327 0.695
1/5 2 0.294 0.694
1/5 3 0.258 0.703
1/5 4 0.240 0.695
1/5 5 0.240 0.700
1/5 6 0.238 0.633
1/5 7 0.238 0.663
1/5 8 0.241 0.601
1/5 9 0.240 0.685
1/5 10 0.239 0.708
2/5 1 0.240 0.693
2/5 2 0.241 0.675
2/5 3 0.241 0.693
2/5 4 0.242 0.653
2/5 5 0.238 0.508
2/5 6 0.241 0.409
2/5 7 0.238 2.357
2/5 8 0.237 0.455
2/5 9 0.240 0.584
2/5 10 0.245 0.607
3/5 1 0.243 0.756
3/5 2 0.242 0.762
3/5 3 0.241 0.675
3/5 4 0.244 0.699
3/5 5 0.241 0.695
3/5 6 0.240 0.691
3/5 7 0.239 0.691
3/5 8 0.246 0.691
3/5 9 0.241 0.695
3/5 10 0.241 0.687
4/5 1 0.240 0.698
4/5 2 0.243 0.692
4/5 3 0.238 0.690
4/5 4 0.242 0.695
4/5 5 0.240 0.686
4/5 6 0.245 0.699
4/5 7 0.242 0.693
4/5 8 0.241 0.691
4/5 9 0.241 0.689
4/5 10 0.241 0.695
5/5 1 0.243 0.695
5/5 2 0.240 0.699
5/5 3 0.241 0.693
5/5 4 0.238 0.693
5/5 5 0.241 0.694
5/5 6 0.240 0.694
5/5 7 0.239 0.691
5/5 8 0.240 0.693
5/5 9 0.239 0.694
5/5 10 0.240 0.695
Below is the code I used to measure the processing time and training loss:
import time

epochs = 5
for epoch in range(epochs):
    steps = 0
    for inputs, labels in trainloader:
        # Only time the first 10 batches of each epoch
        if steps >= 10:
            break
        steps += 1

        # Start the per-batch timer
        start = time.time()

        # Move input and label tensors to the default device
        inputs, labels = inputs.to(device), labels.to(device)

        # Standard training step: forward, loss, backward, update
        optimizer.zero_grad()
        logps = model(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()

        # Report epoch, step, seconds per batch, and batch loss
        running_loss = loss.item()
        print(f"{epoch+1}/{epochs} "
              f"{steps} "
              f"{(time.time() - start):.3f} "
              f"{running_loss:.3f}")
My current experiments still do not show that adding more hidden layers makes processing take longer; the timings are quite similar. The likely explanation is that the extra layers live in the small fully connected head, whose compute cost is negligible next to the frozen convolutional backbone, so the backbone dominates the per-batch time.
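As a rough sanity check on that explanation, the parameter counts can be compared using the build_model sketch from the top of this post (so the exact figures depend on its assumed 512-wide layers):

model = build_model("resnet101", n_hidden=10)
head_params = sum(p.numel() for p in model.fc.parameters())
total_params = sum(p.numel() for p in model.parameters())
print(f"head: {head_params:,} of {total_params:,} parameters")
# Even ten 512-wide hidden layers amount to only a few million
# parameters, a small fraction of resnet101's roughly 44 million,
# so the frozen backbone still dominates the per-batch compute.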
Tomorrow I want to test how accuracy and test loss are affected by additional hidden layers.