Convolutional Neural Networks (CNN)

Note: This is a basic introduction to CNNs with a TensorFlow implementation. If you want a naive NumPy implementation with backpropagation and beyond, check the GitHub repo.

I. Objective:

CNNs are traditionally used for data with spatial structure that filters can be convolved over, such as images. This lets us encode specific properties of the data into the NN architecture.

Ex: Processing a 32 X 32 X 3 image with a vanilla NN would require 32*32*3 = 3072 weights for every single neuron in the first hidden layer. This full connectivity is inefficient and quickly leads to a very large number of parameters.
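To make that cost concrete, here is a small sketch that counts the first-layer weights of a fully-connected network on this image size. The hidden-layer width of 100 units is an assumption chosen only for illustration.

```python
# Weights needed by the first fully-connected layer of a vanilla NN.
height, width, channels = 32, 32, 3
weights_per_neuron = height * width * channels  # one weight per input value

hidden_units = 100  # assumed layer width, for illustration only
total_weights = weights_per_neuron * hidden_units

print(weights_per_neuron)  # 3072
print(total_weights)       # 307200
```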

CNN architecture:

Screen Shot 2016-10-02 at 3.51.08 PM.png

II. Filters:

Filters (kernels, weights, etc.) slide across our image or subsequent layer outputs in order to extract features.

Filter weights are initialized randomly or empirically and are then learned, so that the filters become feature detectors. Filters in the first convolutional layer tend to learn low-level features (edges, etc.), while filters in subsequent layers learn higher-level features.
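As a rough illustration of a filter sliding across an image, here is a naive NumPy sketch. The function name `conv2d_naive`, the toy image and the hand-written vertical-edge kernel are all assumptions made for this example; in a real CNN the kernel values would be learned.

```python
import numpy as np

def conv2d_naive(image, kernel, stride=1):
    """Slide a square `kernel` over a 2D `image` (no padding)."""
    H, W = image.shape
    F = kernel.shape[0]
    out_h = (H - F) // stride + 1
    out_w = (W - F) // stride + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            # Local patch of the image currently under the filter
            patch = image[i*stride:i*stride+F, j*stride:j*stride+F]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy image: dark on the left, bright in the two rightmost columns.
image = np.zeros((5, 5), dtype=np.float32)
image[:, 3:] = 1.0

# Hand-written vertical-edge filter (assumed, not learned).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

feature_map = conv2d_naive(image, kernel)
print(feature_map)  # each row is [0, 3, 3]: the filter fires at the edge
```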

III. Conv Layer Output:

Unlike neurons in FC layers, each neuron in a conv layer is connected only to a local region of the input rather than to the whole image.

Screen Shot 2016-10-02 at 3.54.13 PM.png

We can control the output dimensions by controlling zero-padding and stride.
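The usual relationship between these quantities is (W - F + 2P) / S + 1, where W is the input size, F the filter size, P the zero-padding and S the stride. The helper below is a small sketch of that formula; the function name and the example sizes are chosen for illustration.

```python
def conv_output_size(input_size, filter_size, padding, stride):
    """Spatial output size of a conv layer: (W - F + 2P) / S + 1."""
    span = input_size - filter_size + 2 * padding
    if span % stride != 0:
        raise ValueError("filter does not tile the input evenly")
    return span // stride + 1

print(conv_output_size(227, 11, 0, 4))  # 55
print(conv_output_size(28, 3, 1, 1))    # 28 (padding of 1 preserves the size)
print(conv_output_size(32, 5, 0, 1))    # 28
```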

IV. Parameter Sharing:

Each filter convolves over the input image and produces a 55 X 55 2D matrix (for example, a 227 X 227 X 3 input with 11 X 11 X 3 filters, a stride of 4 and no zero-padding gives this size). The K filters together produce the output of the first conv layer, which is 55 X 55 X K. All 55 X 55 neurons in a given depth slice of that output use the same weights (11 X 11 X 3). The intuition is that each filter represents a feature, so it makes sense to apply the same filter, with the same weights, across the entire image.
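To see how much parameter sharing saves, we can count the parameters of this layer with and without it. This is a back-of-the-envelope sketch: K = 96 and the one-bias-per-filter convention are assumptions made for the example.

```python
out_h, out_w = 55, 55
filter_h, filter_w, in_depth = 11, 11, 3
K = 96  # assumed number of filters, for illustration only

weights_per_filter = filter_h * filter_w * in_depth  # 363

# With sharing: one set of weights (plus one bias) per filter.
shared_params = K * (weights_per_filter + 1)

# Without sharing: every output neuron would own its weights and bias.
unshared_params = out_h * out_w * K * (weights_per_filter + 1)

print(shared_params)    # 34944
print(unshared_params)  # 105705600
```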

Screen Shot 2016-10-02 at 3.55.06 PM.png

V. ReLU Units:

ReLU is applied element-wise to each depth slice (the 2D output from the input and filter i). ReLU units tend to perform better than sigmoid and tanh units because they suffer less from vanishing gradients, but they still have pitfalls. For a more comprehensive look at different non-linear units and the advantages and disadvantages of each, check out this post.
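A minimal NumPy sketch of ReLU and its gradient on a single depth slice is shown below (the function names and the toy values are assumptions for this example). Note that the gradient is exactly zero for negative inputs, which is one of the pitfalls: a unit whose input stays negative stops receiving updates.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x)
    return np.maximum(0, x)

def relu_grad(x):
    # Gradient is 1 where the input was positive, 0 elsewhere
    return (x > 0).astype(x.dtype)

feature_map = np.array([[-2.0, 0.5],
                        [ 3.0, -0.1]])
print(relu(feature_map))       # negatives are clipped to 0
print(relu_grad(feature_map))  # 1 only at the positive entries
```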

Screen Shot 2016-10-02 at 3.57.15 PM.png

VI. Pooling:

Pooling downsamples each depth slice. Here we see max-pooling, but we can also use other methods (average pooling, L2-norm pooling, etc.).

The intuition is to decrease the size of the data we process, since pooling still captures enough of the information we need.
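As a small sketch of 2 X 2 max-pooling with stride 2 on one depth slice, assuming even height and width (the function name and the toy input are made up for this example):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on one depth slice."""
    H, W = x.shape
    # Group the pixels into 2x2 blocks, then take the max of each block.
    blocks = x.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool_2x2(x))
# [[6 8]
#  [3 4]]
```

The 4 X 4 slice shrinks to 2 X 2, keeping only the strongest activation in each window.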

Screen Shot 2016-10-02 at 3.58.52 PM.png

Many current models are leaning towards reducing or completely removing pooling, since it leads to a loss of information. You will start to see clever architectures based entirely on convolutional operations with smaller filter sizes.

VII. FC Layers:

The final layers of the CNN are fully-connected (FC) layers. The output of the last conv layer is flattened before it is fed to them, and the final layer is a classifier that solves our desired task.
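The flatten-then-classify step can be sketched in NumPy as follows. The batch size, the 4 X 4 X 128 feature volume, the random weights and the choice of a softmax classifier over 10 classes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.RandomState(0)
conv_out = rng.randn(2, 4, 4, 128)          # batch of 2 feature volumes
flat = conv_out.reshape(len(conv_out), -1)  # flatten to shape (2, 2048)

W = rng.randn(flat.shape[1], 10) * 0.01     # random FC weights, 10 classes
logits = flat.dot(W)

# Softmax turns logits into class probabilities
# (logits are shifted by their max for numerical stability).
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(flat.shape)   # (2, 2048)
print(probs.shape)  # (2, 10)
```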

Screen Shot 2016-10-02 at 3.59.17 PM.png


VIII. Backpropagation:

Backpropagation through the convolution and pooling layers is slightly convoluted, but take a look at the naive Python implementation while reading the math for a better understanding.
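As one small piece of that picture, here is a sketch of the backward pass through 2 X 2 max-pooling: each upstream gradient is routed to the position that held the max in its window, and every other position gets zero. This is not the implementation from the repo; the function name is hypothetical, and ties are assumed to go to the first max.

```python
import numpy as np

def max_pool_2x2_backward(x, grad_out):
    """Route each upstream gradient to the max position of its 2x2 window."""
    grad_in = np.zeros_like(x, dtype=np.float64)
    for i in range(grad_out.shape[0]):
        for j in range(grad_out.shape[1]):
            window = x[2*i:2*i+2, 2*j:2*j+2]
            # Position of the max inside the window (first one on ties)
            a, b = np.unravel_index(np.argmax(window), window.shape)
            grad_in[2*i + a, 2*j + b] = grad_out[i, j]
    return grad_in

x = np.array([[1., 6.],
              [5., 2.]])
grad = max_pool_2x2_backward(x, np.array([[10.]]))
print(grad)  # only the position of the 6 receives the gradient of 10
```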

Screen Shot 2016-10-02 at 4.00.09 PM.png

IX. Code Analysis:

Unlike the previous examples, we will not be building a CNN from scratch with just basic NumPy; instead we will use TensorFlow abstractions. I will upload my basic NumPy-only CNN with complete backpropagation in a later post.

We will be using input_data to load the data, but this time keep in mind that X is of shape [N X 28 X 28 X 1]. With this shape, we can apply our filters and convolve over the image. We will also be saving our model to ckpt_dir = "CNN_ckpt_dir" and restoring an old model if one is available.

Once again, we will be splitting our data into batches for processing.

import tensorflow as tf
import numpy as np
import input_data
import os

class parameters():

    def __init__(self):
        self.batch_size = 128
        self.num_epochs = 10
        self.ckpt_dir = "CNN_ckpt_dir" # save models here

def load_data():
    # Load the data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    trainX, trainY, testX, testY = mnist.train.images, mnist.train.labels, \
                             mnist.test.images, mnist.test.labels
    trainX = trainX.reshape(-1, 28, 28, 1)
    testX = testX.reshape(-1, 28, 28, 1)
    return trainX, trainY, testX, testY

def generate_batches(batch_size, X, y):

    # Create batches
    num_batches = len(X) // batch_size
    data_X = np.zeros([num_batches, batch_size, 28, 28, 1], dtype=np.float32)
    data_y = np.zeros([num_batches, batch_size, 10], dtype=np.float32)
    for batch_num in range(num_batches):
        data_X[batch_num] = X[batch_num*batch_size:(batch_num+1)*batch_size]
        data_y[batch_num] = y[batch_num*batch_size:(batch_num+1)*batch_size]
        yield data_X[batch_num], data_y[batch_num]

def generate_epochs(num_epochs, batch_size, X, y):

    for epoch_num in range(num_epochs):
        yield generate_batches(batch_size, X, y)

The CNN will have three conv layers with pooling followed by two FC layers. Follow how the shape of the input changes from the point where it is fed in to the output of the third conv layer.
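The shape changes can be traced by hand with a few lines of plain Python. This sketch relies on the fact that, with padding='SAME', a layer's spatial output size is ceil(input / stride); the helper name is made up for this example.

```python
import math

def same_pool_out(size, stride=2):
    # With padding='SAME', output size is ceil(input / stride).
    return int(math.ceil(size / float(stride)))

size, shapes = 28, []
for depth in (32, 64, 128):
    # The 3x3 conv with stride 1 and 'SAME' padding keeps the spatial size;
    # the 2x2 max-pool with stride 2 halves it (rounding up).
    size = same_pool_out(size)
    shapes.append((size, size, depth))

print(shapes)             # [(14, 14, 32), (7, 7, 64), (4, 4, 128)]
print(size * size * 128)  # 2048 inputs to the first FC layer
```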

def cnn_operations(X, w, w2, w3, w4, w_o,
                dropout_value_conv, dropout_value_hidden):

    l1a = tf.nn.relu(tf.nn.conv2d(X, w,
        strides=[1,1,1,1], padding='SAME'))
    l1 = tf.nn.max_pool(l1a, ksize=[1,2,2,1],
        strides=[1,2,2,1], padding='SAME')
    l1 = tf.nn.dropout(l1, dropout_value_conv)

    l2a = tf.nn.relu(tf.nn.conv2d(l1, w2,
        strides=[1,1,1,1], padding='SAME'))
    l2 = tf.nn.max_pool(l2a, ksize=[1,2,2,1],
        strides=[1,2,2,1], padding='SAME')
    l2 = tf.nn.dropout(l2, dropout_value_conv)

    l3a = tf.nn.relu(tf.nn.conv2d(l2, w3,
        strides=[1,1,1,1], padding='SAME'))
    l3 = tf.nn.max_pool(l3a, ksize=[1,2,2,1],
        strides=[1,2,2,1], padding='SAME')
    l3 = tf.reshape(l3,
        [-1, w4.get_shape().as_list()[0]]) # flatten to shape(?, 2048)
    l3 = tf.nn.dropout(l3, dropout_value_conv)

    l4 = tf.nn.relu(tf.matmul(l3, w4))
    l4 = tf.nn.dropout(l4, dropout_value_hidden)

    return tf.matmul(l4, w_o)

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

class cnn_model(object):

    def __init__(self):

        # Placeholders
        self.X = tf.placeholder("float", [None, 28, 28, 1])
        self.y = tf.placeholder("float", [None, 10])
        self.dropout_value_conv = tf.placeholder("float")
        self.dropout_value_hidden = tf.placeholder("float")

        # Initialize weights
        w = init_weights([3, 3, 1, 32])       # 3x3x1 conv, 32 outputs
        w2 = init_weights([3, 3, 32, 64])     # 3x3x32 conv, 64 outputs
        w3 = init_weights([3, 3, 64, 128])    # 3x3x64 conv, 128 outputs
        w4 = init_weights([128 * 4 * 4, 625]) # FC 128 * 4 * 4 = 2048 inputs, 625 outputs
        w_o = init_weights([625, 10])         # FC 625 inputs, 10 outputs (labels)

        self.logits = cnn_operations(self.X, w, w2, w3, w4, w_o,
                    self.dropout_value_conv, self.dropout_value_hidden)
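        # The loss call was missing here; softmax cross-entropy is assumed
        self.cost = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(
                logits=self.logits, labels=self.y))
        self.optimizer = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(
            self.cost)
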
        # Accuracy
        self.correct_prediction = tf.equal(tf.argmax(self.y, 1),
                                    tf.argmax(self.logits, 1))
        self.accuracy = tf.reduce_mean(tf.cast(
                            self.correct_prediction, tf.float32))

        # Components for model saving
        self.global_step = tf.Variable(0, trainable=False)
        self.saver = tf.train.Saver(tf.all_variables())

    def step(self, sess, batch_X, batch_y,
        dropout_value_conv, dropout_value_hidden,
        forward_only):

        input_feed = {self.X: batch_X, self.y: batch_y,
                self.dropout_value_conv: dropout_value_conv,
                self.dropout_value_hidden: dropout_value_hidden}

        if not forward_only:
            # Training step: also run the optimizer to update the weights
            output_feed = [self.logits, self.cost,
                           self.accuracy, self.optimizer]
        else:
            # Evaluation step: only compute cost and accuracy
            output_feed = [self.cost, self.accuracy]

        outputs = sess.run(output_feed, input_feed)

        if not forward_only:
            return outputs[0], outputs[1], outputs[2], outputs[3]
        else:
            return outputs[0], outputs[1]

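def create_model(sess, FLAGS):

    model = cnn_model()

    ckpt = tf.train.get_checkpoint_state(FLAGS.ckpt_dir)
    if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path):
        print("Restoring old model parameters from %s" %
                             ckpt.model_checkpoint_path)
        model.saver.restore(sess, ckpt.model_checkpoint_path)
    else:
        print("Created new model.")
        # Fresh start: initialize all variables (reconstructed line)
        sess.run(tf.initialize_all_variables())

    return model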

And of course, training is as usual with results for training and validation. Note that we now save our model at the end of each training epoch. When we use create_model() next time, we will be starting from the restored point.

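def train(FLAGS):

    # NOTE: the dropout arguments were lost from the original listing. The
    # keep probabilities below are assumed values, not the author's.
    keep_conv, keep_hidden = 0.8, 0.5

    with tf.Session() as sess:

        model = create_model(sess, FLAGS)
        trainX, trainY, testX, testY = load_data()

        # Training
        for epoch_num, epoch in enumerate(
                generate_epochs(FLAGS.num_epochs, FLAGS.batch_size,
                                trainX, trainY)):
            train_cost = []
            train_accuracy = []
            print("Training in progress...")
            for batch_num, (input_X, labels_y) in enumerate(epoch):
                logits, cost, accuracy, _ = model.step(sess,
                                                       input_X, labels_y,
                                                       keep_conv, keep_hidden,
                                                       forward_only=False)
                train_cost.append(cost)
                train_accuracy.append(accuracy)

            print("Training:")
            print("Epoch: %i, batch: %i, cost: %.3f, accuracy: %.3f" % (
                    epoch_num, batch_num,
                    np.mean(train_cost), np.mean(train_accuracy)))

            # Validation: one pass over the test set with dropout disabled
            for val_epoch in generate_epochs(1, FLAGS.batch_size,
                                             testX, testY):
                test_cost = []
                test_accuracy = []
                for batch_num, (input_X, labels_y) in enumerate(val_epoch):
                    cost, accuracy = model.step(sess,
                                                input_X, labels_y,
                                                1.0, 1.0,
                                                forward_only=True)
                    test_cost.append(cost)
                    test_accuracy.append(accuracy)

                print("Validation:")
                print("Epoch: %i, batch: %i, cost: %.3f, accuracy: %.3f" % (
                    epoch_num, batch_num,
                    np.mean(test_cost), np.mean(test_accuracy)))

            # Save checkpoint every epoch.
            if not os.path.isdir(FLAGS.ckpt_dir):
                os.makedirs(FLAGS.ckpt_dir)
            checkpoint_path = os.path.join(FLAGS.ckpt_dir, "model.ckpt")
            print("Saving the model.")
            model.saver.save(sess, checkpoint_path,
                             global_step=model.global_step)

if __name__ == '__main__':
    FLAGS = parameters()
    train(FLAGS)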

X. Raw Code:

GitHub Repo (Updating all repos, will be back up soon!)
