Day 5: Transfer learning - impact of the number of hidden layers
In a discussion with my fellow scholars, the question came up of how the number of hidden layers defined in a model affects its performance. Some say two or three hidden layers is the optimum.
The more hidden layers you add, the more your model will tend to overfit the data and fail to generalize, which means a higher test loss. More layers also take more time to train. Too few hidden layers, on the other hand, can cause underfitting.
I want to check this by modifying the Lesson 2 Part 8 code. First, prepare the trainloader and testloader data.
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import time
import matplotlib.pyplot as plt
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

data_dir = 'Cat_Dog_data'

# Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=32)

# Use GPU if it's available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = models.resnet101(pretrained=True)

# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

# One hidden layer: 2048 -> 512 -> 2
model.fc = nn.Sequential(nn.Linear(2048, 512),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(512, 2),
                         nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()

# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)

model.to(device);
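Since only the classifier head is supposed to train, it is worth verifying that the freezing worked before running any comparison. Below is a small sanity check of my own (not part of the lesson code) that counts trainable versus frozen parameters:

# Sanity check (my addition): only the model.fc head should be trainable
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"Trainable parameters: {trainable:,}")  # just the classifier head
print(f"Frozen parameters: {frozen:,}")        # the pretrained ResNet-101 backbone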
The model.fc definition above (shown with a yellow background in the original notebook) is the part that will be changed from one hidden layer to two, three, four, and five hidden layers.
# Two hidden layers: 2048 -> 1024 -> 512 -> 2
model.fc = nn.Sequential(nn.Linear(2048, 1024),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(1024, 512),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(512, 2),
                         nn.LogSoftmax(dim=1))

# Three hidden layers: 2048 -> 1024 -> 512 -> 256 -> 2
model.fc = nn.Sequential(nn.Linear(2048, 1024),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(1024, 512),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(512, 256),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(256, 2),
                         nn.LogSoftmax(dim=1))

# Four hidden layers: 2048 -> 1024 -> 512 -> 256 -> 128 -> 2
model.fc = nn.Sequential(nn.Linear(2048, 1024),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(1024, 512),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(512, 256),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(256, 128),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(128, 2),
                         nn.LogSoftmax(dim=1))

# Five hidden layers: 2048 -> 1024 -> 512 -> 256 -> 128 -> 64 -> 2
model.fc = nn.Sequential(nn.Linear(2048, 1024),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(1024, 512),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(512, 256),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(256, 128),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(128, 64),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(64, 2),
                         nn.LogSoftmax(dim=1))
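All five heads follow the same pattern: each hidden layer roughly halves the width, with ReLU and Dropout(0.1) after every hidden Linear. As a side sketch of my own (make_head is a hypothetical helper, not in my actual code), they could be generated from a list of widths, which also makes it easy to see how the trainable parameter count grows with depth:

def make_head(widths, n_classes=2, p_drop=0.1):
    # Build a classifier head from a list of widths, e.g. [2048, 1024, 512]
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        layers += [nn.Linear(n_in, n_out), nn.ReLU(), nn.Dropout(p_drop)]
    layers += [nn.Linear(widths[-1], n_classes), nn.LogSoftmax(dim=1)]
    return nn.Sequential(*layers)

# The five variants compared in this post, one to five hidden layers
for widths in ([2048, 512],
               [2048, 1024, 512],
               [2048, 1024, 512, 256],
               [2048, 1024, 512, 256, 128],
               [2048, 1024, 512, 256, 128, 64]):
    n_params = sum(p.numel() for p in make_head(widths).parameters())
    print(f"{len(widths) - 1} hidden layer(s): {n_params:,} parameters")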
I want to shorten the test time by limiting the measurement to 100 steps, since with one hidden layer the accuracy already reaches about 98% by step 70.
Below is how I run the comparison:
epochs = 1
steps = 0
running_loss = 0
print_every = 5

for epoch in range(epochs):
    for inputs, labels in trainloader:
        if steps >= 100:  # limit the measurement to 100 training steps
            break
        steps += 1
        start = time.time()
        # Move input and label tensors to the default device
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        logps = model(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if steps % print_every == 0:
            test_loss = 0
            accuracy = 0
            model.eval()
            with torch.no_grad():
                for inputs, labels in testloader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model(inputs)
                    batch_loss = criterion(logps, labels)
                    test_loss += batch_loss.item()

                    # Calculate accuracy
                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

            # Note: 'start' is reset each step, so this timing also includes
            # the evaluation pass above, averaged over print_every steps
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"Step {steps}.. "
                  f"Time per batch: {(time.time() - start)/print_every:.3f} seconds.. "
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            running_loss = 0
            model.train()
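One gotcha when swapping heads between runs: after reassigning model.fc, the optimizer must be recreated, because the old optimizer still holds references to the previous head's parameters. A minimal sketch of the swap between two runs, assuming the same setup as above:

# After assigning a new head, move it to the device and rebuild the optimizer;
# otherwise optimizer.step() would keep updating the old head's parameters
model.fc = nn.Sequential(nn.Linear(2048, 1024),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(1024, 512),
                         nn.ReLU(),
                         nn.Dropout(0.1),
                         nn.Linear(512, 2),
                         nn.LogSoftmax(dim=1))
model.to(device)
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)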
But after several test runs, it seems 100 steps is not enough. I also suspect the batch size and the dropout rate have some effect. Below is the test result for one hidden layer. Producing these results takes quite a long time, so I could not finish today; I will continue with more tests tomorrow.
Epoch 1/1.. Step 5.. Time per batch: 9.512 seconds.. Train loss: 0.201.. Test loss: 0.074.. Test accuracy: 0.970
Epoch 1/1.. Step 10.. Time per batch: 9.170 seconds.. Train loss: 0.208.. Test loss: 0.063.. Test accuracy: 0.976
Epoch 1/1.. Step 15.. Time per batch: 9.504 seconds.. Train loss: 0.152.. Test loss: 0.063.. Test accuracy: 0.981
Epoch 1/1.. Step 20.. Time per batch: 9.168 seconds.. Train loss: 0.261.. Test loss: 0.060.. Test accuracy: 0.981
Epoch 1/1.. Step 25.. Time per batch: 9.430 seconds.. Train loss: 0.131.. Test loss: 0.095.. Test accuracy: 0.964
Epoch 1/1.. Step 30.. Time per batch: 9.150 seconds.. Train loss: 0.177.. Test loss: 0.063.. Test accuracy: 0.977
Epoch 1/1.. Step 35.. Time per batch: 9.506 seconds.. Train loss: 0.186.. Test loss: 0.057.. Test accuracy: 0.981
Epoch 1/1.. Step 40.. Time per batch: 9.197 seconds.. Train loss: 0.155.. Test loss: 0.059.. Test accuracy: 0.977
Epoch 1/1.. Step 45.. Time per batch: 9.525 seconds.. Train loss: 0.174.. Test loss: 0.066.. Test accuracy: 0.973
Epoch 1/1.. Step 50.. Time per batch: 9.175 seconds.. Train loss: 0.170.. Test loss: 0.069.. Test accuracy: 0.976
Epoch 1/1.. Step 55.. Time per batch: 9.504 seconds.. Train loss: 0.229.. Test loss: 0.072.. Test accuracy: 0.972
Epoch 1/1.. Step 60.. Time per batch: 9.197 seconds.. Train loss: 0.285.. Test loss: 0.052.. Test accuracy: 0.982
Epoch 1/1.. Step 65.. Time per batch: 9.494 seconds.. Train loss: 0.183.. Test loss: 0.064.. Test accuracy: 0.981
Epoch 1/1.. Step 70.. Time per batch: 9.188 seconds.. Train loss: 0.145.. Test loss: 0.053.. Test accuracy: 0.984
Epoch 1/1.. Step 75.. Time per batch: 9.528 seconds.. Train loss: 0.152.. Test loss: 0.049.. Test accuracy: 0.982
Epoch 1/1.. Step 80.. Time per batch: 9.188 seconds.. Train loss: 0.213.. Test loss: 0.052.. Test accuracy: 0.984
Epoch 1/1.. Step 85.. Time per batch: 9.508 seconds.. Train loss: 0.226.. Test loss: 0.065.. Test accuracy: 0.975
Epoch 1/1.. Step 90.. Time per batch: 9.162 seconds.. Train loss: 0.186.. Test loss: 0.047.. Test accuracy: 0.985
Epoch 1/1.. Step 95.. Time per batch: 9.542 seconds.. Train loss: 0.138.. Test loss: 0.046.. Test accuracy: 0.985
Epoch 1/1.. Step 100.. Time per batch: 9.170 seconds.. Train loss: 0.196.. Test loss: 0.046.. Test accuracy: 0.984
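Since the loop prints one line every five steps, the log can be turned into a quick accuracy-versus-step plot for comparing the variants later. A small sketch of my own, assuming the printed lines are pasted into a string named log (matplotlib is already imported in the setup):

import re

log = """
Epoch 1/1.. Step 5.. Time per batch: 9.512 seconds.. Train loss: 0.201.. Test loss: 0.074.. Test accuracy: 0.970
Epoch 1/1.. Step 10.. Time per batch: 9.170 seconds.. Train loss: 0.208.. Test loss: 0.063.. Test accuracy: 0.976
"""  # paste the full log here

steps, accs = [], []
for m in re.finditer(r"Step (\d+).*?Test accuracy: ([\d.]+)", log):
    steps.append(int(m.group(1)))
    accs.append(float(m.group(2)))

plt.plot(steps, accs, marker='o', label='1 hidden layer')
plt.xlabel('Step')
plt.ylabel('Test accuracy')
plt.legend()
plt.show()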