CNNs

Overview

Image Classification Steps

  • General Steps

The Advantages of CNNs

  • MLPs vs. CNNs: fully connected vs. locally connected

  • Locally connected / sparsely connected

  • Weight sharing (see the parameter-count sketch below)
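
As a quick illustration of why local connectivity and weight sharing matter, here is a minimal sketch (with hypothetical layer sizes) comparing the parameter counts of a fully connected layer and a convolutional layer on a 28x28 input:

    import torch.nn as nn

    # fully connected: every input pixel is wired to every hidden unit
    fc = nn.Linear(28 * 28, 100)                       # 784*100 + 100 = 78,500 parameters
    # convolutional: one small 3x3 kernel shared across the whole image
    conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # 16*1*3*3 + 16 = 160 parameters

    def count(m):
        return sum(p.numel() for p in m.parameters())

    print(count(fc), count(conv))  # 78500 160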

Basic Concepts (Ng)

Kernel, Padding, Stride, Convolution, Pooling
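
As a quick reference, the spatial output size of a convolution is (W + 2P - K) / S + 1 for input width W, padding P, kernel size K, and stride S. A minimal sketch (hypothetical sizes) checking this with nn.Conv2d and nn.MaxPool2d:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 32, 32)  # a batch of one 32x32 RGB image
    conv = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
    pool = nn.MaxPool2d(kernel_size=2, stride=2)

    # (32 + 2*1 - 3)/1 + 1 = 32, and pooling then halves the spatial size
    print(conv(x).shape)        # torch.Size([1, 16, 32, 32])
    print(pool(conv(x)).shape)  # torch.Size([1, 16, 16, 16])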

The Structure of CNNs

Autoencoders (Encoder-to-Decoder)

The key point is to leverage the compressed representation that the encoder produces (see the sketch below).
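
A minimal sketch of the encoder-decoder pattern (assuming flattened 28x28 inputs and a hypothetical 32-dimensional code):

    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, encoding_dim=32):
            super().__init__()
            # encoder compresses the 784-pixel image into a small code
            self.encoder = nn.Sequential(nn.Linear(28 * 28, encoding_dim), nn.ReLU())
            # decoder reconstructs the image from that compressed representation
            self.decoder = nn.Sequential(nn.Linear(encoding_dim, 28 * 28), nn.Sigmoid())

        def forward(self, x):
            return self.decoder(self.encoder(x))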

Transfer Learning

Transfer learning involves taking a pre-trained neural network and adapting it to a new, different data set.

  • Four main cases

    Take a look here for more detail.

  • Coding part: load the pre-trained model, modify the model as you want, freeze specific parameters if necessary (requires_grad), and specify the optimizer if necessary.

    import torchvision
    import torch.nn as nn
    import torch.optim as optim

    # Load the pretrained model from pytorch
    vgg16 = torchvision.models.vgg16(pretrained=True)
    # print out the model structure, see the picture below
    print(vgg16)
    print(vgg16.classifier[6].in_features)
    # Freeze training for all "features" layers
    for param in vgg16.features.parameters():
        param.requires_grad = False
    # create a new classifier
    n_inputs = vgg16.classifier[6].in_features
    # replace the last linear layer (n_inputs -> 5 flower classes)
    # new layers automatically have requires_grad = True
    last_layer = nn.Linear(n_inputs, len(classes))  # `classes` holds the target class names
    vgg16.classifier[6] = last_layer
    # specify optimizer (stochastic gradient descent) and learning rate = 0.001
    optimizer = optim.SGD(vgg16.classifier.parameters(), lr=0.001)
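
As a quick sanity check (not part of the original snippet), you can confirm that only the classifier parameters remain trainable after freezing the feature layers:

    # only parameters with requires_grad=True will be updated by the optimizer
    for name, param in vgg16.named_parameters():
        if param.requires_grad:
            print(name)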

Style Transfer

Weight Initialization

Having good initial weights can place the neural network close to the optimal solution. This allows the neural network to come to the best solution quicker.

If every weight is the same (constant weights), all the neurons at each layer produce the same output, which makes it hard to decide which weights to adjust.

Commonly, we can use uniform initialization or normal initialization (see the sketch below).
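
A minimal sketch (assuming a model with linear layers and the common rule of thumb of scaling by 1/sqrt(n)) of applying a custom normal initialization with model.apply:

    import torch.nn as nn

    def weights_init_normal(m):
        # initialize linear layers from N(0, 1/sqrt(n)) and zero the biases
        if isinstance(m, nn.Linear):
            n = m.in_features
            m.weight.data.normal_(0.0, 1.0 / n ** 0.5)
            m.bias.data.fill_(0)

    model.apply(weights_init_normal)  # `model` is assumed to be an existing nn.Module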

The Role of Fully Connected Layers in CNNs

Database

MNIST

  • A hand-written digits dataset: clean, centered, heavily pre-processed images
  • Visualize the data: 28x28 pixels

CIFAR-10

  • Small color images that fall into one of ten classes: 60,000 images (32x32)

ImageNet

Data Processing

Data Normalization

  • basic

Data Flattening

  • To input the data into MLPs (Multi-Layer Perceptrons), you need to flatten each image matrix into a vector (see the sketch below).
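
A minimal sketch (hypothetical batch shape) of flattening a batch of images before feeding them to a linear layer:

    import torch

    x = torch.randn(20, 1, 28, 28)   # a batch of 20 grayscale 28x28 images
    x_flat = x.view(x.size(0), -1)   # keep the batch dimension, flatten the rest
    print(x_flat.shape)              # torch.Size([20, 784])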

Data Augmentation

  • To deal with:

    Scale Invariance; Rotation Invariance; Translation Invariance.

    # convert data to a normalized torch.FloatTensor
    transform = transforms.Compose([
        transforms.RandomHorizontalFlip(),  # randomly flip the image horizontally
        transforms.RandomRotation(10),      # randomly rotate by up to 10 degrees
        transforms.ToTensor(),              # convert a PIL Image or numpy.ndarray to a tensor
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

Coding Part

Basic

  • PyTorch Package

    PyTorch is a Python package that provides two high-level features:

    1. Tensor computation (like NumPy) with strong GPU acceleration
    2. Deep neural networks built on a tape-based autograd system
  • torchvision

    The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

  • Import necessary libraries for working with data and Pytorch

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import numpy as np
    from torchvision import datasets
    import torchvision.transforms as transforms

Load the Data

  • Common datasets: torchvision.transforms, torchvision.datasets, torch.utils.data.DataLoader

    # transform can be a single transform or a composition: transforms.Compose([..., ...])
    transform = transforms.ToTensor()
    train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
    test_data = datasets.MNIST(root='data', train=False, download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_data, batch_size=20, num_workers=0)
    test_loader = torch.utils.data.DataLoader(test_data, batch_size=20, num_workers=0)
  • You can also define a custom dataset class to load data from a directory, or use an existing one such as ImageFolder (see the sketch below).
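
A minimal sketch (assuming a hypothetical data/train directory with one sub-folder per class) of loading images with ImageFolder:

    import torch
    from torchvision import datasets
    import torchvision.transforms as transforms

    transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    # each sub-folder of data/train is treated as one class
    train_data = datasets.ImageFolder('data/train', transform=transform)
    train_loader = torch.utils.data.DataLoader(train_data, batch_size=20, shuffle=True)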

Visualize the Data

  • Grayscale: matplotlib.pyplot, plt.figure(), plt.imshow()

    import matplotlib.pyplot as plt

    dataiter = iter(train_loader)
    images, labels = next(dataiter)
    images = images.numpy() # from torch tensor to numpy array

    fig = plt.figure(figsize=(25, 4))
    for idx in np.arange(20):
        ax = fig.add_subplot(2, 20 // 2, idx + 1, xticks=[], yticks=[])
        ax.imshow(np.squeeze(images[idx]), cmap='gray')
        ax.set_title(str(labels[idx].item()))
  • RGB: for color images, move the channel axis to the end before plotting (see the sketch below).
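
A minimal sketch (assuming a loader of RGB images normalized to [-1, 1], e.g. CIFAR-10) of displaying color images; the channel axis has to move from (C, H, W) to (H, W, C) for imshow:

    import matplotlib.pyplot as plt
    import numpy as np

    dataiter = iter(train_loader)
    images, labels = next(dataiter)
    images = images.numpy()

    fig = plt.figure(figsize=(25, 4))
    for idx in np.arange(20):
        ax = fig.add_subplot(2, 20 // 2, idx + 1, xticks=[], yticks=[])
        img = images[idx] / 2 + 0.5              # un-normalize back to [0, 1]
        ax.imshow(np.transpose(img, (1, 2, 0)))  # reorder axes for imshow
        ax.set_title(str(labels[idx].item()))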

Define the Network Architecture

  • Linear layers

    torch.nn.Linear(in_features, out_features, bias=True)

    Input: (N, *, in_features), where * means any number of additional dimensions.

  • Convolutional layers

    torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

  • Activation and dropout functions (the sketch below shows how these fit together)
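
A minimal sketch (a hypothetical architecture for 32x32 RGB inputs such as CIFAR-10) of how these layers are combined in an nn.Module:

    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 16, 3, padding=1)   # 3x32x32  -> 16x32x32
            self.conv2 = nn.Conv2d(16, 32, 3, padding=1)  # 16x16x16 -> 32x16x16
            self.pool = nn.MaxPool2d(2, 2)                # halves height and width
            self.fc1 = nn.Linear(32 * 8 * 8, 10)          # flattened 32x8x8 -> 10 classes
            self.dropout = nn.Dropout(0.25)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))  # 32x32 -> 16x16
            x = self.pool(F.relu(self.conv2(x)))  # 16x16 -> 8x8
            x = x.view(x.size(0), -1)             # flatten for the linear layer
            x = self.dropout(x)
            return self.fc1(x)

    model = Net()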

Specify Loss Function and Optimizer (optim)

  • Common: just use an existing class such as nn.CrossEntropyLoss(). You can also define your own loss function, which is commonly written as a class (see the sketch after the snippet below).
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)
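
A minimal sketch (a hypothetical mean-squared-error re-implementation, just to show the pattern) of a loss function defined as a class:

    import torch.nn as nn

    class MyMSELoss(nn.Module):
        def forward(self, output, target):
            # any differentiable expression of output and target works here
            return ((output - target) ** 2).mean()

    criterion = MyMSELoss()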

Train the Network

The steps for training/learning from a batch of data are described in the comments below:

1. Clear the gradients of all optimized variables.
2. Forward pass: compute predicted outputs by passing inputs to the model.
3. Calculate the loss.
4. Backward pass: compute gradient of the loss with respect to model parameters.
5. Perform a single optimization step(parameter update).
6. Update the average training loss.
  • Basic process: prep the model for training with model.train(); switch to model.eval() for validation and testing (see the evaluation sketch after the training loop).
    model.train()  # prep model for training
    train_loss = 0.0
    for data, target in train_loader:
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item() * data.size(0)
    # average training loss over the epoch
    train_loss = train_loss / len(train_loader.dataset)
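
A minimal sketch (assuming the same model, criterion, and test_loader as above) of the matching evaluation pass with model.eval() and no gradient tracking:

    model.eval()  # prep model for evaluation
    test_loss = 0.0
    correct = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for data, target in test_loader:
            output = model(data)
            test_loss += criterion(output, target).item() * data.size(0)
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
    print(test_loss / len(test_loader.dataset), correct / len(test_loader.dataset))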