Overview
Image Classification Steps
- General Steps
The advantage of CNNs
MLPs vs. CNNs: fully connected vs. locally connected
Locally connected / sparsely connected
Weight sharing
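A rough way to see why local connectivity and weight sharing matter is to count parameters. The sketch below uses illustrative layer sizes that are not from these notes (a 32x32 RGB input, 512 dense units, sixteen 3x3 filters) and plain arithmetic, no framework.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Weights + biases of a fully connected layer on a flattened image."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel, in_c, out_c):
    """Weights + biases of a conv layer.

    The count does not depend on the image size, because each filter
    is shared across all spatial positions.
    """
    return kernel * kernel * in_c * out_c + out_c


# Assumed sizes, chosen only for illustration
dense = dense_params(32, 32, 3, 512)   # every pixel connects to every unit
conv = conv_params(3, 3, 16)           # sixteen 3x3 filters over 3 channels

print("fully connected:", dense)
print("convolutional:  ", conv)
```

With these assumed sizes the dense layer needs over a million parameters, while the convolutional layer needs a few hundred.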
Basic Concept–Ng
The structure of CNNs
Autoencoders/Encoders-2-Decoders
The key point is to leverage this compressed representation
- Normal image reconstruction
Denoising Autoencoder
Decoder: 1) transpose convolution; 2) upsampling + convolution
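The two decoder options can be compared on a dummy tensor. This is a minimal sketch: the channel counts and the 7x7 input size are assumptions for illustration, not values from these notes.

```python
import torch
import torch.nn as nn

# Hypothetical compressed representation: batch of 1, 4 channels, 7x7
x = torch.randn(1, 4, 7, 7)

# Option 1: transpose convolution; kernel 2 with stride 2 doubles height and width
t_conv = nn.ConvTranspose2d(4, 16, kernel_size=2, stride=2)

# Option 2: nearest-neighbour upsampling, then a size-preserving convolution
up_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(4, 16, kernel_size=3, padding=1),
)

out_t = t_conv(x)
out_u = up_conv(x)
print(out_t.shape, out_u.shape)
```

Both paths produce the same output shape here; which one reconstructs better is something to check on your own data.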
Transfer Learning
Transfer learning involves taking a pre-trained neural network and adapting the neural network to a new, different data set.
Four main cases
Take a look here for more detail.
Coding part: load the pre-trained model, modify it as needed, freeze selected parameters if necessary (requires_grad), and specify the optimizer if necessary.
import torch.nn as nn
import torch.optim as optim
import torchvision

# Load the pretrained model from PyTorch
# (newer torchvision releases use the weights= argument instead of pretrained=True)
vgg16 = torchvision.models.vgg16(pretrained=True)
# Print out the model structure, see the picture below
print(vgg16)
print(vgg16.classifier[6].in_features)
# Freeze training for all "features" layers
for param in vgg16.features.parameters():
    param.requires_grad = False
# Create a new classifier
n_inputs = vgg16.classifier[6].in_features
# Add last linear layer (n_inputs -> 5 flower classes)
# New layers automatically have requires_grad = True
# `classes` is the list of class names, defined elsewhere
last_layer = nn.Linear(n_inputs, len(classes))
vgg16.classifier[6] = last_layer
# Specify optimizer (stochastic gradient descent) and learning rate = 0.001
optimizer = optim.SGD(vgg16.classifier.parameters(), lr=0.001)
Style Transfer
Weight Initialization
Good initial weights can place the neural network close to the optimal solution, so training reaches a good solution more quickly.
If every weight is the same (constant weights), all the neurons in a layer produce the same output, which makes it hard to decide which weights to adjust.
Common choices are uniform initialization and normal initialization.
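A minimal sketch of both options with torch.nn.init follows. The 1/sqrt(n) scale (n = number of inputs to the layer) is a common rule of thumb and an assumption here, as is the small example model; the notes do not specify either.

```python
import math
import torch.nn as nn


def init_uniform(m):
    """Uniform init in [-y, y] with y = 1/sqrt(n) (assumed heuristic)."""
    if isinstance(m, nn.Linear):
        y = 1.0 / math.sqrt(m.in_features)
        nn.init.uniform_(m.weight, -y, y)
        nn.init.zeros_(m.bias)


def init_normal(m):
    """Normal init with mean 0 and std 1/sqrt(n) (assumed heuristic)."""
    if isinstance(m, nn.Linear):
        std = 1.0 / math.sqrt(m.in_features)
        nn.init.normal_(m.weight, mean=0.0, std=std)
        nn.init.zeros_(m.bias)


# Hypothetical model, used only to show how the functions are applied
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_uniform)   # swap in init_normal to try the other scheme
```

model.apply calls the function on every submodule, so the isinstance check keeps it from touching layers without weights.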
The role of fully connected layers in CNNs
Database
MNIST
- A hand-written digits dataset: clean, centered, heavily pre-processed images
- Visualize the data: 28x28 pixels
CIFAR-10
- Small color images that fall into one of ten classes: 60,000 images (32x32)
ImageNet
Data processing
Data Normalization
- basic
Data flattened
- To feed the data into MLPs (Multi-Layer Perceptrons), you need to convert each image matrix to a vector.
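Flattening can be sketched with a dummy batch shaped like MNIST. The batch size of 20 is an assumption for illustration.

```python
import torch

# Dummy batch: 20 grayscale images of 28x28 pixels
images = torch.zeros(20, 1, 28, 28)

# Keep the batch dimension and collapse the rest into one vector per image
flat = images.view(images.size(0), -1)
print(flat.shape)
```

Each image becomes a vector of 1 * 28 * 28 = 784 values, which matches the in_features an MLP's first linear layer would need.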
Data Augmentation
To deal with:
Scale Invariance; Rotation Invariance; Translation Invariance.
import torchvision.transforms as transforms

# convert data to a normalized torch.FloatTensor
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # randomly flip and rotate
    transforms.RandomRotation(10),
    transforms.ToTensor(),  # convert a PIL Image or numpy.ndarray to a tensor
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
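To see what Normalize does with a mean and standard deviation of 0.5, the per-channel arithmetic can be written out by hand. The helper name below is hypothetical; it only mirrors the formula (value - mean) / std applied to each channel.

```python
def normalize(value, mean=0.5, std=0.5):
    """Same arithmetic transforms.Normalize applies to each channel value."""
    return (value - mean) / std


# ToTensor() scales pixel values to [0, 1]; normalization then maps them to [-1, 1]
for v in (0.0, 0.5, 1.0):
    print(v, "->", normalize(v))
```

So black pixels end up at -1, mid-gray at 0, and white at 1.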
Coding Part
Basic
- PyTorch is a Python package that provides two high-level features:
  - Tensor computation (like NumPy) with strong GPU acceleration
  - Deep neural networks built on a tape-based autograd system
- The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.
Import the necessary libraries for working with data and PyTorch:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torchvision import datasets
import torchvision.transforms as transforms
Load the Data
Commonly used modules: torchvision.transforms, torchvision.datasets, torch.utils.data.DataLoader
import torchvision

# use a single transform, or chain several with transforms.Compose([...])
transform = transforms.ToTensor()
train_data = torchvision.datasets.MNIST(root='data', train=True, download=True, transform=transform)
test_data = torchvision.datasets.MNIST(root='data', train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=20, num_workers=0)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=20, num_workers=0)
You can also define a new dataset class (for example, Imagedatas) to load data from a directory, like ImageFolder.
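A directory-backed dataset could look roughly like the sketch below. The class name ImageDirectory and its loader argument are hypothetical, and it assumes the root/<class_name>/<file> layout that ImageFolder uses. It is written in plain Python to show the map-style protocol (__len__ and __getitem__); in practice you would subclass torch.utils.data.Dataset and pass an image loader such as PIL.Image.open.

```python
import os


class ImageDirectory:
    """Map-style dataset over root/<class_name>/<file> (hypothetical sketch)."""

    def __init__(self, root, loader=None, transform=None):
        # One sub-directory per class, sorted so label indices are stable
        self.classes = sorted(
            d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
        )
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}
        self.samples = []
        for name in self.classes:
            folder = os.path.join(root, name)
            for fname in sorted(os.listdir(folder)):
                path = os.path.join(folder, fname)
                self.samples.append((path, self.class_to_idx[name]))
        self.loader = loader        # e.g. an image-opening function (assumption)
        self.transform = transform  # e.g. a torchvision transform (assumption)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        # Without a loader the raw path is returned, which keeps the sketch testable
        item = self.loader(path) if self.loader else path
        if self.transform is not None:
            item = self.transform(item)
        return item, label
```

An instance exposing these two methods can then be handed to a DataLoader in the same way as the MNIST datasets above.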
Visualize the Data
Gray: matplotlib.pyplot, plt.figure(), plt.imshow()
import matplotlib.pyplot as plt
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()  # from torch to numpy
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20 // 2, idx + 1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(str(labels[idx].item()))
RGB
- Single image annotation
Define the Network Architecture
- torch.nn.Linear(in_features, out_features, bias=True)
  Input: (N, *, in_features), where * means any number of additional dimensions.
- Activation and dropout functions (see torch.nn.functional)
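Putting the linear layers, activation, and dropout together, an MLP for 28x28 inputs might be sketched as follows. The hidden size of 512 and dropout probability of 0.2 are assumptions, not values from these notes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, hidden=512, p_drop=0.2):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, hidden)  # flattened image -> hidden layer
        self.fc2 = nn.Linear(hidden, 10)       # hidden layer -> 10 class scores
        self.dropout = nn.Dropout(p_drop)

    def forward(self, x):
        x = x.view(x.size(0), -1)              # flatten each image to a vector
        x = self.dropout(F.relu(self.fc1(x)))  # activation, then dropout
        return self.fc2(x)                     # raw scores (logits)


model = Net()
out = model(torch.zeros(20, 1, 28, 28))
print(out.shape)
```

The forward pass returns raw scores rather than probabilities, which is what nn.CrossEntropyLoss expects.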
Specify Loss Function and Optimizer (torch.optim)
- Common: just use an existing class, such as nn.CrossEntropyLoss(). You can also define your own loss function, which is commonly written as a class.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
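As a sketch of a loss defined as a class, the example below re-implements cross entropy by hand as an nn.Module. The class name MyCrossEntropy is hypothetical; the point is only the pattern of a forward method that returns a scalar tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyCrossEntropy(nn.Module):
    """Hand-written cross entropy over raw scores (hypothetical example)."""

    def forward(self, logits, target):
        log_probs = F.log_softmax(logits, dim=1)
        # Pick the log-probability of the correct class for each sample
        picked = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)
        return -picked.mean()


torch.manual_seed(0)
logits = torch.randn(8, 5)            # dummy scores: 8 samples, 5 classes
target = torch.randint(0, 5, (8,))    # dummy labels

custom = MyCrossEntropy()(logits, target)
builtin = nn.CrossEntropyLoss()(logits, target)
print(custom.item(), builtin.item())
```

The two printed values should agree up to floating-point error, which is a quick sanity check for any custom loss that has a built-in counterpart.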
Train the Network
The steps for training/learning from a batch of data are described in the comments below:
1. Clear the gradients of all optimized variables.
2. Forward pass: compute predicted outputs by passing inputs to the model.
3. Calculate the loss.
4. Backward pass: compute the gradient of the loss with respect to model parameters.
5. Perform a single optimization step (parameter update).
6. Update the average training loss.
- Basic process: prepare the model for training with model.train(); switch to model.eval() for evaluation
model.train()  # prep model for training
train_loss = 0.0  # running training loss
for data, target in train_loader:
    # clear the gradients of all optimized variables
    optimizer.zero_grad()
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the loss
    loss = criterion(output, target)
    # backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()
    # perform a single optimization step (parameter update)
    optimizer.step()
    # update running training loss
    train_loss += loss.item() * data.size(0)
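The evaluation side of model.train() / model.eval() can be sketched in the same style. The tiny model and random data below are stand-ins so the snippet runs on its own; they are assumptions, not the network or dataset from these notes.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in model and data (assumptions for a self-contained example)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
data = TensorDataset(torch.randn(40, 1, 28, 28), torch.randint(0, 10, (40,)))
valid_loader = DataLoader(data, batch_size=20)

model.eval()  # switch layers such as dropout to evaluation behaviour
valid_loss, correct = 0.0, 0
with torch.no_grad():  # no gradients are needed when only evaluating
    for x, target in valid_loader:
        output = model(x)
        valid_loss += criterion(output, target).item() * x.size(0)
        correct += (output.argmax(dim=1) == target).sum().item()

# Average over all samples
valid_loss /= len(valid_loader.dataset)
accuracy = correct / len(valid_loader.dataset)
print(valid_loss, accuracy)
```

Call model.train() again before the next training epoch so that dropout and similar layers return to training behaviour.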