Ariel University

Course name: Deep learning and natural language processing

Course number: 7061510-1

Lecturer: Dr. Amos Azaria

Edited by: Moshe Hanukoglu

Date: First Semester 2018-2019

Convolutional Neural Networks (CNN)

Introduction

Until now, we have dealt with fully-connected networks, meaning that each neuron is connected to each neuron in the next layer. Due to this structure of the network it is necessary to create a lot of weights and creating a lot of weights makes it very difficult calculations

In this chapter we will focus on a network of Convolational Neural Networks (CNN). This network can learn the right features to look for in data. This network is very useful in image processing.

^ Contents

Description of the network structure

Each layer in the network has a kernel that consists of one or more matrixes. Each matrix is a filter for another feature in the input for that layer.

Each cell in each matrices is the weight the network will learn (similar to weights in fully-connected) to get the correct result at the end.

The grid receives a matrix in the input, and moves the kernel over the entire matrix from the upper left corner to the lower right corner. In each shift, it calculates the sum of the kernel's product values with the matrix values where the kernel is.

For example:

credit

When the numbers in each matrix can be any number.

In the calculation above there is one input matrix (the big matrix) and the kernel is also built from one matrix.
There may be several matrices in the kernel, in which case we will perform the action described above for each of the kernel matrices.

Note: It is not necessary for the kernel to move in one square at a time; it can move in a few squares.
Sometimes a bias is added to each of the cells in each of the matrices or the matrices are inserted into the activation function.

^ Contents

Representing the size of the kernel

When we want to represent the size of the kernel we will write it as follows [Height, width, depth, number of filters].

The depth parameter can not be decided by the programmer but is determined by the dimension of the input.


In the case shown above the representation shall be [3,3,1,1]

In the case shown below the representation will be [3,3,3,1]

We can determine the stride of the sliding window for each dimension of input that is, the sliding window can skip on two cell in each move.

Padding

Increase the matrix to the size you want. This helps the output matrix to be larger (helps in some cases).

  • Same - want to pad
  • Valid - do not want to pad

^ Contents

n-D Convolution Weights

In practice, the input to the convolution layer is many times 3-dimensions. In that case, in every kernel group, each kernel multiplies a different dimension and the results are summed up.

credit

^ Contents

Max-Pooling

This method reduces the dimensions of the input matrix. This method is also used as an activation function and it also reduces the chances of over-fitting.

The way it works is to divide the matrix into equal parts and take the maximum from each part.

For example:

credit

^ Contents

Dropout (Regularization)

Given a network in the form of a fully-connected with weights on the arches, then in every few rounds determine the output of some neurons 0, after a few more rounds decide that the output of some other neurons is 0 and continue.
When the other neurons whose output is not equal to 0 receive a higher weight (for example, their weights are multiplied by only 2 for those rounds).

There are a number of explanations why layers help dorpout:

  • Random "shutdown" of neurons makes it difficult for the network to reach overfitting, because each time it sees the information flow from the input in a different way.
  • Requires the model to train all the weights of all neurons (not to reach 0) because each time only some of the neurons work and therefore they should be used as backup for neurons that do not work. And this may improve overall performance.
  • When we turn off a part of neurons is similar to training of models in different forms which gives a more general model

^ Contents

Identifying Numbers Handwritten By CNN

Network for MNIST handwritten number recognition.

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_datamnist = input_data.read_data_sets("MNIST_data/", one_hot=True)x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
WARNING:tensorflow:From <ipython-input-1-06102e5dc187>:4: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: _internal_retry.<locals>.wrap.<locals>.wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.one_hot on tensors.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.

Create an array with dimensions [5, 5, 1, 32] and fill it with numbers about 0 with a standard deviation of 0.1.
That is, in the kernel has 32 filters with size 5 x 5 x 1.

In [2]:
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))

Reshape the array of x from a size of None x 784 to size of None x 28 x 28 x 1. If we had RGB image, we would have 3 channels that is, None x 28 x 28 x 3.

In [3]:
x_image = tf.reshape(x, [-1,28,28,1])

Now, run the convolution layer with the images, the kernel, and an array that describe the stride of the sliding window of weights for each dimension of the input and we want padding.

After the ReLU, running max_pool the max_pool gets the output of the convolution layer, previous layer, the size of the window for each dimension of the input tensor as a list, the stride of the sliding window for each dimension of the input tensor as a list and we want padding.

In [4]:
h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
In [ ]:
W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
W_fc1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1024], stddev=0.1))
b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024]))h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)W_fc2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
b_fc2 = tf.Variable(tf.constant(0.1, shape=[10]))y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) #uses moving averages momentum
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())for i in range(10000):
    batch = mnist.train.next_batch(50)
    if i%100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x:batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g"%(i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})print("test accuracy %g"%accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
step 0, training accuracy 0.14
step 100, training accuracy 0.86
step 200, training accuracy 0.94
step 300, training accuracy 0.98
step 400, training accuracy 0.94
step 500, training accuracy 0.96
step 600, training accuracy 1
step 700, training accuracy 0.96
step 800, training accuracy 0.94
step 900, training accuracy 0.98
step 1000, training accuracy 1
step 1100, training accuracy 0.94
step 1200, training accuracy 0.92
step 1300, training accuracy 1
step 1400, training accuracy 1
step 1500, training accuracy 1
step 1600, training accuracy 0.92
step 1700, training accuracy 0.96
step 1800, training accuracy 1
step 1900, training accuracy 0.96
step 2000, training accuracy 0.94
step 2100, training accuracy 0.98
step 2200, training accuracy 0.96
step 2300, training accuracy 0.96
step 2400, training accuracy 0.96
step 2500, training accuracy 0.96
step 2600, training accuracy 1
step 2700, training accuracy 1
step 2800, training accuracy 0.96
step 2900, training accuracy 0.98
step 3000, training accuracy 0.98
step 3100, training accuracy 0.96
step 3200, training accuracy 1
step 3300, training accuracy 0.96
step 3400, training accuracy 0.98
step 3500, training accuracy 0.98
step 3600, training accuracy 0.98
step 3700, training accuracy 1
step 3800, training accuracy 1
step 3900, training accuracy 1
step 4000, training accuracy 0.98
step 4100, training accuracy 0.96
step 4200, training accuracy 0.98
step 4300, training accuracy 1
step 4400, training accuracy 1
step 4500, training accuracy 0.98
step 4600, training accuracy 1
step 4700, training accuracy 1
step 4800, training accuracy 1
step 4900, training accuracy 0.98
step 5000, training accuracy 0.96
step 5100, training accuracy 1
step 5200, training accuracy 1
step 5300, training accuracy 1
step 5400, training accuracy 1
step 5500, training accuracy 1
step 5600, training accuracy 1
step 5700, training accuracy 1
step 5800, training accuracy 1
step 5900, training accuracy 1
step 6000, training accuracy 0.98
step 6100, training accuracy 0.98
step 6200, training accuracy 0.98
step 6300, training accuracy 1
step 6400, training accuracy 1
step 6500, training accuracy 1
step 6600, training accuracy 1
step 6700, training accuracy 1
step 6800, training accuracy 0.98
step 6900, training accuracy 1
step 7000, training accuracy 1
step 7100, training accuracy 1
step 7200, training accuracy 0.98
step 7300, training accuracy 1
step 7400, training accuracy 0.98
step 7500, training accuracy 1
step 7600, training accuracy 1
step 7700, training accuracy 1
step 7800, training accuracy 1
step 7900, training accuracy 1
step 8000, training accuracy 1
step 8100, training accuracy 0.98
step 8200, training accuracy 1
step 8300, training accuracy 0.98
step 8400, training accuracy 0.98
step 8500, training accuracy 0.98
step 8600, training accuracy 1
step 8700, training accuracy 1
step 8800, training accuracy 1
step 8900, training accuracy 1
step 9000, training accuracy 1
step 9100, training accuracy 0.98
step 9200, training accuracy 0.98
step 9300, training accuracy 0.98
step 9400, training accuracy 1
step 9500, training accuracy 1
step 9600, training accuracy 1
step 9700, training accuracy 1
step 9800, training accuracy 1
step 9900, training accuracy 0.98

The structure of the network is in this order:

Reshape the x to [None, 28,28,1].

  1. Convolution layer (ReLU), kernel [5, 5, 1, 32], input [None,28,28,1].
  2. Max Pool layer, input from convolution layer.
  3. Convolution layer (ReLU), kernel [5, 5, 32, 64], input from Max Pool layer.
  4. Max Pool layer, input from convolution layer.

Reshape the x to [None, 7764]

  1. Fully-connected layer (ReLU), 1024 neurons, input from Max Pool layer.
  2. Dropout layer, input from the fully-connected layer.
  3. Softmax layer, input from the dropout layer.

^ Contents

tf.layer

Instead of manually stating all the Ws bs etc., we can use contrib.layers:

  • Convolution layer
    tf.layers.conv2d(inputs = conv1, filters = 32, kernel_size = [4,4], strides = [1,1], padding = 'VALID', activation = tf.nn.relu)
    
  • Fully-connected layer
    tf.layers.dense(last_layer, units = 10, activation=tf.nn.relu)