Day 6: Transfer learning - impact of the number of hidden layers (2)

I modified my code for the training and testing process as shown below:

import time

import torch

# model, criterion, optimizer, device, trainloader and testloader
# are defined as in the previous posts
epochs = 1
steps = 0
running_loss = 0
print_every = 10

# Start timer
start = time.time()

for epoch in range(epochs):
    for inputs, labels in trainloader:
        if steps >= 200:  # stop after 200 batches to shorten the test
            break
            
        steps += 1
        
        # Move input and label tensors to the default device
        inputs, labels = inputs.to(device), labels.to(device)
        
        optimizer.zero_grad()
        
        logps = model(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        
        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            model.eval()  # switch to evaluation mode (disables dropout)
            with torch.no_grad():
                for inputs, labels in testloader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model(inputs)
                    batch_loss = criterion(logps, labels)
                    
                    test_loss += batch_loss.item()
                    
                    # Calculate accuracy: the model outputs log-probabilities,
                    # so exp() recovers the class probabilities
                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
                    
            # Columns: epoch/epochs, step, time per batch in seconds (this
            # interval also includes the validation pass above), training
            # loss, test loss, accuracy
            print(f"{epoch+1}/{epochs} "
                  f"{steps} "
                  f"{(time.time() - start)/print_every:.3f} "
                  f"{running_loss/print_every:.3f} "
                  f"{test_loss/len(testloader):.3f} "
                  f"{accuracy/len(testloader):.3f}")
            
            running_loss = 0
            model.train()  # back to training mode
            
            # Restart timer
            start = time.time()
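
For reference, the classifiers compared below differ only in the number of fully connected hidden layers stacked on top of the frozen pretrained features. A minimal sketch of how such a head can be built (the input size of 1024, the 256-unit hidden layers, the dropout rate, the two output classes, and a densenet-style model.classifier attribute are illustrative assumptions, not necessarily my exact setup):

from torch import nn

def build_classifier(n_hidden, in_features=1024, hidden_size=256, n_classes=2):
    # Stack n_hidden fully connected hidden layers, then the output layer
    layers = []
    size = in_features
    for _ in range(n_hidden):
        layers += [nn.Linear(size, hidden_size), nn.ReLU(), nn.Dropout(0.2)]
        size = hidden_size
    # LogSoftmax output matches the torch.exp(logps) in the loop above
    layers += [nn.Linear(size, n_classes), nn.LogSoftmax(dim=1)]
    return nn.Sequential(*layers)

# e.g. model.classifier = build_classifier(n_hidden=3)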

And the results for one to five hidden layers are:

epoch/epochs    steps    processing_time_per_batch(s)    training_loss    test_loss    accuracy

# one hidden layer
1/1 10 5.128 1.432 0.271 0.873
1/1 20 5.285 0.263 0.149 0.945
1/1 30 5.290 0.216 0.071 0.974
1/1 40 5.305 0.191 0.079 0.974
1/1 50 5.139 0.190 0.201 0.924
1/1 60 5.306 0.187 0.100 0.961
1/1 70 5.296 0.178 0.058 0.982
1/1 80 5.291 0.151 0.059 0.977
1/1 90 5.130 0.204 0.065 0.976
1/1 100 5.305 0.177 0.052 0.982
1/1 110 5.303 0.216 0.070 0.977
1/1 120 5.310 0.183 0.053 0.981
1/1 130 5.136 0.125 0.201 0.928
1/1 140 5.292 0.279 0.071 0.974
1/1 150 5.293 0.162 0.053 0.982
1/1 160 5.297 0.165 0.061 0.978
1/1 170 5.129 0.178 0.058 0.980
1/1 180 5.295 0.150 0.065 0.977
1/1 190 5.292 0.252 0.062 0.978
1/1 200 5.293 0.266 0.048 0.986

# two hidden layers
1/1 10 5.185 1.384 0.209 0.975
1/1 20 5.210 0.280 0.147 0.948
1/1 30 5.215 0.462 0.148 0.955
1/1 40 5.205 0.280 0.127 0.964
1/1 50 5.227 0.160 0.057 0.976
1/1 60 5.203 0.287 0.107 0.962
1/1 70 5.196 0.223 0.066 0.979
1/1 80 5.206 0.163 0.045 0.985
1/1 90 5.199 0.150 0.050 0.983
1/1 100 5.221 0.191 0.064 0.977
1/1 110 5.191 0.210 0.056 0.982
1/1 120 5.198 0.151 0.051 0.981
1/1 130 5.208 0.260 0.167 0.917
1/1 140 5.206 0.192 0.044 0.986
1/1 150 5.207 0.192 0.041 0.985
1/1 160 5.200 0.155 0.041 0.984
1/1 170 5.206 0.186 0.048 0.984
1/1 180 5.206 0.132 0.058 0.979
1/1 190 5.212 0.189 0.067 0.973
1/1 200 5.203 0.209 0.049 0.986

# three hidden layers
1/1 10 5.098 1.598 0.590 0.685
1/1 20 5.183 0.473 0.105 0.975
1/1 30 5.121 0.241 0.144 0.946
1/1 40 5.184 0.204 0.065 0.978
1/1 50 5.118 0.205 0.078 0.979
1/1 60 5.203 0.134 0.111 0.958
1/1 70 5.105 0.334 0.142 0.947
1/1 80 5.184 0.270 0.058 0.980
1/1 90 5.102 0.164 0.084 0.971
1/1 100 5.187 0.205 0.088 0.966
1/1 110 5.097 0.171 0.149 0.958
1/1 120 5.192 0.179 0.046 0.984
1/1 130 5.120 0.164 0.052 0.981
1/1 140 5.189 0.133 0.040 0.985
1/1 150 5.099 0.263 0.077 0.985
1/1 160 5.196 0.191 0.067 0.973
1/1 170 5.095 0.181 0.051 0.980
1/1 180 5.185 0.185 0.064 0.975
1/1 190 5.112 0.159 0.047 0.984
1/1 200 5.190 0.141 0.046 0.985

# four hidden layers
1/1 10 5.257 0.806 0.673 0.506
1/1 20 5.277 0.622 0.320 0.927
1/1 30 5.265 0.255 0.107 0.960
1/1 40 5.280 0.247 0.167 0.941
1/1 50 5.256 0.288 0.088 0.980
1/1 60 5.275 0.191 0.060 0.975
1/1 70 5.263 0.194 0.059 0.980
1/1 80 5.266 0.211 0.059 0.980
1/1 90 5.258 0.176 0.100 0.965
1/1 100 5.273 0.224 0.068 0.979
1/1 110 5.270 0.170 0.074 0.974
1/1 120 5.264 0.134 0.046 0.981
1/1 130 5.263 0.176 0.088 0.964
1/1 140 5.276 0.179 0.102 0.966
1/1 150 5.264 0.165 0.042 0.981
1/1 160 5.270 0.133 0.041 0.981
1/1 170 5.271 0.227 0.082 0.981
1/1 180 5.277 0.143 0.041 0.983
1/1 190 5.265 0.259 0.090 0.982
1/1 200 5.275 0.208 0.045 0.983

# five hidden layers
1/1 10 5.564 0.739 0.617 0.525
1/1 20 5.384 0.594 0.285 0.957
1/1 30 5.221 0.404 0.096 0.964
1/1 40 5.164 0.287 0.195 0.907
1/1 50 5.160 0.231 0.059 0.977
1/1 60 5.394 0.255 0.111 0.965
1/1 70 5.190 0.189 0.096 0.971
1/1 80 5.207 0.405 0.152 0.979
1/1 90 5.206 0.247 0.061 0.975
1/1 100 5.386 0.225 0.096 0.970
1/1 110 5.208 0.151 0.071 0.967
1/1 120 5.203 0.242 0.062 0.983
1/1 130 5.206 0.171 0.053 0.977
1/1 140 5.380 0.136 0.056 0.978
1/1 150 5.218 0.156 0.096 0.961
1/1 160 5.235 0.110 0.086 0.971
1/1 170 5.222 0.179 0.050 0.984
1/1 180 5.386 0.126 0.066 0.975
1/1 190 5.212 0.168 0.066 0.971
1/1 200 5.215 0.231 0.096 0.978

I capped the iterations at 200 to shorten the testing time. I found some interesting things:

1. The processing times for one to five hidden layers are quite similar; there are no significant differences. Tomorrow I would like to test with 10 hidden layers, and also try the pretrained resnet101 model (see the sketch after this list).

2. The interesting part is the accuracy. Going from one to two hidden layers makes the accuracy rise faster, but adding a third, fourth, or fifth hidden layer lowers the accuracy at the same iteration count.

3. I only used one epoch; tomorrow I would like to test with several epochs and different batch sizes to see how the results change.
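
Here is a rough sketch of how resnet101 could be swapped in (the 256-unit hidden layer, two output classes, and learning rate are illustrative assumptions; note that resnet exposes its classifier as model.fc, whereas densenet uses model.classifier):

import torch
from torch import nn
from torchvision import models

# Load pretrained resnet101 and freeze the feature weights
model = models.resnet101(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer (resnet101 outputs 2048 features)
model.fc = nn.Sequential(nn.Linear(2048, 256),
                         nn.ReLU(),
                         nn.Dropout(0.2),
                         nn.Linear(256, 2),
                         nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()
# Train only the new classifier's parameters
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.003)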

That's all for today.
