Overview

Image Classification Steps

General Steps

The Advantage of CNNs

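MLPs vs. CNNs: fully connected vs. locally connected
Locally (sparsely) connected: each hidden unit looks at only a small patch of the input
Weight sharing: the same kernel weights are reused at every spatial position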
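
To make the effect of local connectivity and weight sharing concrete, here is a rough sketch that compares parameter counts for a fully connected layer and a convolutional layer. The sizes (a 28x28 grayscale input, 128 hidden units, 16 filters of size 3x3) and the helper name count_params are illustrative assumptions, not values from these notes.

```python
import torch.nn as nn


def count_params(module):
    """Total number of learnable parameters in a module (hypothetical helper)."""
    return sum(p.numel() for p in module.parameters())


# Fully connected: every one of the 28*28 input pixels connects to every hidden unit.
fc = nn.Linear(28 * 28, 128)

# Convolutional: 16 filters, each a 3x3 kernel shared across all image positions.
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)

fc_params = count_params(fc)      # 784 * 128 weights + 128 biases
conv_params = count_params(conv)  # 16 * 1 * 3 * 3 weights + 16 biases

print("fully connected:", fc_params)
print("convolutional:  ", conv_params)
```

The convolutional layer's parameter count does not depend on the image size, which is one reason CNNs scale to larger images more easily than MLPs.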

Basic Concept–Ng

Kernel, Padding, Stride, Convolution, Pooling
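
Kernel size, padding, and stride together determine the spatial size of a convolution or pooling output. As a small sketch (assuming square inputs and no dilation), the standard relation is out = floor((in + 2 * padding - kernel) / stride) + 1. The function name conv_output_size and the example sizes are made up for illustration.

```python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation assumed)."""
    return (in_size + 2 * padding - kernel_size) // stride + 1


# 3x3 kernel, stride 1, padding 1: the spatial size is preserved
same_size = conv_output_size(32, kernel_size=3, stride=1, padding=1)

# 5x5 kernel, stride 1, no padding: the output shrinks by kernel_size - 1
shrunk = conv_output_size(28, kernel_size=5)

# 2x2 max pooling with stride 2: the spatial size is halved
pooled = conv_output_size(32, kernel_size=2, stride=2)

print(same_size, shrunk, pooled)
```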

The Structure of CNNs

Autoencoders/Encoders-2-Decoders

The key point is to leverage this compressed representation

Normal image reconstruction
Denoising autoencoder
Decoder: 1) Transpose Convolution; 2) Upsampling + Convolutions
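
The two decoder options can be sketched side by side. This is only a minimal illustration: the channel counts, the 7x7 feature-map size, and the choice of nearest-neighbor upsampling are assumptions, not details from these notes. Both variants double the height and width of the compressed representation.

```python
import torch
import torch.nn as nn

# A stand-in for the compressed representation: (batch, channels, height, width)
x = torch.randn(1, 4, 7, 7)

# Option 1: transpose convolution; kernel 2 with stride 2 doubles height and width
deconv = nn.ConvTranspose2d(4, 8, kernel_size=2, stride=2)

# Option 2: upsample first, then apply a regular size-preserving convolution
upsample_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
)

out_deconv = deconv(x)
out_upsample = upsample_conv(x)
print(out_deconv.shape, out_upsample.shape)
```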

Transfer Learning

Transfer learning involves taking a pre-trained neural network and adapting the neural network to a new, different data set.

Four main cases

Take a look here for more detail.

Coding part: load the pre-trained model, modify the model as needed, freeze selected parameters if necessary (requires_grad), and specify the optimizer if necessary

import torch.nn as nn
import torch.optim as optim
import torchvision

# Load the pretrained model from torchvision
# (newer torchvision versions prefer the `weights=` argument over `pretrained=True`)
vgg16 = torchvision.models.vgg16(pretrained=True)
# print out the model structure, see the picture below
print(vgg16)
print(vgg16.classifier[6].in_features)
# Freeze training for all "features" layers
for param in vgg16.features.parameters():
    param.requires_grad = False
# create a new classifier
n_inputs = vgg16.classifier[6].in_features
# add last linear layer (n_inputs -> 5 flower classes)
# new layers automatically have requires_grad = True
# `classes` is assumed to be the list of class names, defined elsewhere
last_layer = nn.Linear(n_inputs, len(classes))
vgg16.classifier[6] = last_layer
# specify optimizer (stochastic gradient descent) and learning rate = 0.001
optimizer = optim.SGD(vgg16.classifier.parameters(), lr=0.001)

Style Transfer

Weight Initialization

Having good initial weights can place the neural network close to the optimal solution. This allows the neural network to come to the best solution quicker.

If every weight is the same (constant weights), all the neurons in a layer produce the same output. This makes it hard to decide which weights to adjust.

Commonly, we can use uniform initialization or normal initialization.
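
A minimal sketch of both schemes using torch.nn.init. The 1/sqrt(n) scale (n being the number of inputs to the layer) is a common rule of thumb and an assumption here, not something these notes prescribe; the function name init_weights and the layer sizes are also made up for illustration.

```python
import torch
import torch.nn as nn


def init_weights(module, scheme="uniform"):
    """Initialize Linear layers; the 1/sqrt(n) scale is an assumed rule of thumb."""
    if isinstance(module, nn.Linear):
        n = module.in_features
        y = 1.0 / n ** 0.5
        if scheme == "uniform":
            nn.init.uniform_(module.weight, -y, y)            # weights in [-y, y]
        else:
            nn.init.normal_(module.weight, mean=0.0, std=y)   # weights ~ N(0, y^2)
        nn.init.zeros_(module.bias)


model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# model.apply calls the function on every submodule, including the Linear layers
model.apply(init_weights)

first = model[0]
print(first.weight.min().item(), first.weight.max().item())
```

To try the normal scheme instead, pass a wrapper such as model.apply(lambda m: init_weights(m, scheme="normal")).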

The Role of Fully Connected Layers in CNNs

The role of fully connected layers

Datasets

MNIST

A handwritten-digit dataset: clean, centered, heavily pre-processed images
Visualize the data: 28x28 pixels

CIFAR-10

Small color images that fall into one of ten classes: 60,000 images (32x32)

ImageNet

Data Processing

Data Normalization

basic

Data Flattening

To feed the data into an MLP (Multi-Layer Perceptron), you need to convert each image matrix into a vector.
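
A short sketch of flattening a batch of images before an MLP. The batch size of 20 and the MNIST-like 1x28x28 shape are assumptions chosen to match the examples elsewhere in these notes.

```python
import torch

# A batch of 20 single-channel 28x28 images: (batch, channels, height, width)
images = torch.randn(20, 1, 28, 28)

# Keep the batch dimension and collapse everything else into one vector per image
flat_view = images.view(images.size(0), -1)

# torch.flatten with start_dim=1 does the same thing
flat_alt = torch.flatten(images, start_dim=1)

print(flat_view.shape)  # one 784-element vector per image
```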

Data Augmentation

To deal with:

Scale Invariance; Rotation Invariance; Translation Invariance.

import torchvision.transforms as transforms

# augment the data, then convert it to a normalized torch.FloatTensor
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # randomly flip horizontally
    transforms.RandomRotation(10),      # randomly rotate by up to 10 degrees
    transforms.ToTensor(),  # convert a PIL Image or numpy.ndarray to a tensor
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # per-channel (x - mean) / std
    ])
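
To see what the Normalize step does numerically: it computes (x - mean) / std per channel, so a mean and std of 0.5 map pixel values from [0, 1] to [-1, 1]. The sketch below reproduces that arithmetic with plain tensor operations instead of calling torchvision; the random 3x32x32 image is a stand-in, not real data.

```python
import torch

# Per-channel mean and std, shaped to broadcast over (channels, height, width)
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

# Stand-in for a ToTensor() output: a color image with values in [0, 1]
image = torch.rand(3, 32, 32)

# The same arithmetic Normalize applies: (x - mean) / std
normalized = (image - mean) / std

# The extreme pixel values 0 and 1 land exactly on -1 and 1
endpoints = (torch.tensor([0.0, 1.0]) - 0.5) / 0.5

print(normalized.min().item(), normalized.max().item())
print(endpoints)
```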

Coding Part

Basic

PyTorch Package
PyTorch is a Python package that provides two high-level features:
1. Tensor computation (like NumPy) with strong GPU acceleration
2. Deep neural networks built on a tape-based autograd system
torchvision

The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

Import the necessary libraries for working with data and PyTorch

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torchvision import datasets
import torchvision.transforms as transforms

Load the Data

Common tools: torchvision.transforms, torchvision.datasets, torch.utils.data.DataLoader

# use a single transform, or chain several with transforms.Compose([...])
transform = transforms.ToTensor()
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=20, num_workers=0)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=20, num_workers=0)

You can also define a new dataset class (e.g. Imagedatas) to load data from a directory, similar to ImageFolder.

Visualize the Data

Grayscale: matplotlib.pyplot, plt.figure(), plt.imshow()

import matplotlib.pyplot as plt

dataiter = iter(train_loader)
images, labels = next(dataiter)  # grab one batch
images = images.numpy()  # from torch to numpy

fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    # 2 rows x 10 columns; subplot indices start at 1
    ax = fig.add_subplot(2, 20 // 2, idx + 1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title(str(labels[idx].item()))

Single image annotation

matplotlib.pyplot.annotate; Matplotlib Chinese User Guide, Section 4.5: Annotation

Define the Network Architecture

Linear layers

torch.nn.Linear(in_features, out_features, bias=True)

Input: (N, *, in_features), where * means any number of additional dimensions.
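
A quick sketch of that input rule: nn.Linear only transforms the last dimension and leaves all leading dimensions untouched. The sizes used here (784 inputs, 10 outputs, batch of 20, 5 extra positions) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=784, out_features=10)

# Plain batch: (N, in_features) -> (N, out_features)
batch = torch.randn(20, 784)
out_batch = layer(batch)

# Extra dimension in the middle: (N, *, in_features) -> (N, *, out_features)
stacked = torch.randn(20, 5, 784)
out_stacked = layer(stacked)

print(out_batch.shape, out_stacked.shape)
```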
Convolutional Layers
Activation and dropout functions

Specify Loss Function and Optimizer

Common approach: use an existing class such as nn.CrossEntropyLoss(). You can also define your own loss function, which is commonly written as a class.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)
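
As a sketch of a loss function defined as a class: subclass nn.Module and compute the loss in forward. The class name MeanSquaredErrorLoss is hypothetical, and mean squared error is used only because it is easy to check against the built-in nn.MSELoss; it is not the loss these notes train with.

```python
import torch
import torch.nn as nn


class MeanSquaredErrorLoss(nn.Module):
    """Hypothetical custom loss: mean of the squared differences."""

    def forward(self, output, target):
        return ((output - target) ** 2).mean()


output = torch.randn(20, 10, requires_grad=True)
target = torch.randn(20, 10)

custom_loss = MeanSquaredErrorLoss()(output, target)
builtin_loss = nn.MSELoss()(output, target)

# Because the loss is built from tensor operations, autograd handles backward()
custom_loss.backward()
print(custom_loss.item(), builtin_loss.item())
```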

Train the Network

The steps for training/learning from a batch of data are described in the comments below:

1. Clear the gradients of all optimized variables.
2. Forward pass: compute predicted outputs by passing inputs to the model.
3. Calculate the loss.
4. Backward pass: compute gradient of the loss with respect to model parameters.
5. Perform a single optimization step(parameter update).
6. Update the average training loss.

Basic process: put the model in training mode with model.train() before training, and switch to evaluation mode with model.eval() before validation or testing.

train_loss = 0.0  # running sum of the loss over all samples
for data, target in train_loader:
    # clear the gradients of all optimized variables
    optimizer.zero_grad()
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the loss
    loss = criterion(output, target)
    # backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()
    # perform a single optimization step (parameter update)
    optimizer.step()
    # update running training loss
    train_loss += loss.item()*data.size(0)
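
For a version that runs end to end without downloading anything, here is a sketch of the same loop on synthetic data. The random inputs and labels, the small two-layer model, and the three epochs are all assumptions made so the example is self-contained; they stand in for MNIST and a real network.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for a flattened image dataset: 100 samples, 10 classes
inputs = torch.randn(100, 784)
targets = torch.randint(0, 10, (100,))
train_loader = DataLoader(TensorDataset(inputs, targets), batch_size=20)

model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(3):
    model.train()      # training mode (matters for layers such as dropout)
    train_loss = 0.0
    for data, target in train_loader:
        optimizer.zero_grad()             # 1. clear old gradients
        output = model(data)              # 2. forward pass
        loss = criterion(output, target)  # 3. loss (mean over the batch)
        loss.backward()                   # 4. backward pass
        optimizer.step()                  # 5. parameter update
        # 6. loss.item() is a batch mean, so weight it by the batch size
        train_loss += loss.item() * data.size(0)
    # divide the summed loss by the dataset size to get the per-sample average
    train_loss /= len(train_loader.dataset)
    epoch_losses.append(train_loss)
    print(f"epoch {epoch + 1}: average training loss {train_loss:.4f}")

model.eval()  # switch to evaluation mode before validation or testing
```

Multiplying by data.size(0) and dividing by the dataset size keeps the average correct even when the last batch is smaller than the others.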