## I. Objective:

We use different loss functions depending on the number of output classes. With 2 classes, we use binary cross-entropy; with more than 2 classes, we use cross-entropy with softmax.
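To make the two cases concrete, here is a minimal sketch in plain Python. The function names and the example numbers are our own, chosen only for illustration. It also shows that a two-way softmax reduces to the binary case when both describe the same probabilities.

```python
import math

def binary_cross_entropy(y, p):
    """Loss for one example with label y in {0, 1} and predicted P(class 1) = p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(true_class, scores):
    """Loss for one example whose correct class index is true_class."""
    return -math.log(softmax(scores)[true_class])

# Two classes: predicted probability 0.8 for the positive class, label 1.
bce = binary_cross_entropy(1, 0.8)

# The same prediction as a two-way softmax: scores [0, log(4)] give
# probabilities [0.2, 0.8], so both losses are -log(0.8).
sce = softmax_cross_entropy(1, [0.0, math.log(4.0)])
print(bce, sce)
```

With more than 2 classes only `softmax_cross_entropy` applies, since a single probability can no longer describe the whole prediction.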

## II. Multinomial cross entropy:

Example of how to calculate the cross-entropy loss for a 3 class problem.
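A minimal NumPy sketch of such a calculation follows. The scores and the one-hot label are made-up values for illustration, not taken from any dataset.

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.1])   # hypothetical raw class scores (logits)
y_true = np.array([1.0, 0.0, 0.0])   # one-hot label: class 0 is the correct class

# Softmax: exponentiate (after shifting by the max for stability) and normalize.
exp_scores = np.exp(scores - np.max(scores))
probs = exp_scores / np.sum(exp_scores)   # roughly [0.659, 0.242, 0.099]

# Cross-entropy: -sum_i y_i * log(y_hat_i). Only the correct class survives.
loss = -np.sum(y_true * np.log(probs))    # roughly 0.417, i.e. -log(0.659)
print(probs, loss)
```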

## III. Backpropagation:

With the multinomial cross-entropy, you can see that we only keep the loss contribution from the correct class. With neural nets, this is usually the case when our outputs are sparse (just 1 true class per example). Therefore, we can rewrite our loss as just sum(-log(y_hat)), where y_hat is the predicted probability of the correct class. We replace y_i (the true y) with 1 for the correct class, and the probabilities of the other classes don't matter because their y_i is 0. This is referred to as the negative log likelihood.
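As a quick check of this simplification, the sketch below computes the loss both ways on a small batch. The probabilities and labels are hypothetical values picked for illustration.

```python
import numpy as np

# Hypothetical predicted probabilities for 3 examples over 4 classes.
y_hat = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.20, 0.50, 0.20, 0.10],
                  [0.05, 0.05, 0.10, 0.80]])
labels = np.array([0, 1, 3])   # index of the correct class for each row
rows = np.arange(len(labels))

# Full cross-entropy: sum over every class, weighted by the one-hot target.
one_hot = np.zeros_like(y_hat)
one_hot[rows, labels] = 1.0
full_ce = -np.sum(one_hot * np.log(y_hat))

# Negative log likelihood: only look up the correct-class probability.
nll = np.sum(-np.log(y_hat[rows, labels]))
print(full_ce, nll)
```

Both values agree, which is why implementations can index the correct-class probability directly instead of multiplying by a one-hot matrix.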

**Note:** Here, y^i is just the probability for the correct class, so the loss is only affected by the probability of the correct class. However, the gradients DO take into account the probabilities for the other classes, because the softmax normalizes over every class. This allows us to change all the weights towards minimizing the loss function. Also, all the ys are y_hats.
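The sketch below illustrates this point numerically. It assumes the standard result that the gradient of the softmax cross-entropy loss with respect to the raw scores is y_hat minus the one-hot label, and checks it against finite differences. The helper names and the score values are ours.

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))  # shift for numerical stability
    return exps / np.sum(exps)

def loss_fn(scores, true_class):
    # Loss only reads the probability of the correct class.
    return -np.log(softmax(scores)[true_class])

scores = np.array([1.5, -0.3, 0.8])   # hypothetical scores for 3 classes
true_class = 2

# Analytic gradient w.r.t. the scores: y_hat - one_hot(y).
one_hot = np.zeros_like(scores)
one_hot[true_class] = 1.0
analytic_grad = softmax(scores) - one_hot

# Numerical gradient via central differences, one score at a time.
eps = 1e-6
numeric_grad = np.zeros_like(scores)
for k in range(scores.size):
    bump = np.zeros_like(scores)
    bump[k] = eps
    numeric_grad[k] = (loss_fn(scores + bump, true_class)
                       - loss_fn(scores - bump, true_class)) / (2 * eps)

print(analytic_grad, numeric_grad)
```

Every entry of the gradient is nonzero: the correct class gets y_hat - 1 (negative, so its score is pushed up), and each other class gets its own y_hat (positive, so its score is pushed down).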

## IV. Softmax Classifier Implementation:

Naive Implementation:

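    import numpy as np

    # Assumes X (N x D inputs), y (N integer labels), W (D x C weights) and reg are defined.
    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    num_classes = W.shape[1]

    for i in range(num_train):
        scores = X[i, :].dot(W)
        scores -= np.max(scores)  # shift for numerical stability
        normalized_scores = np.exp(scores) / np.sum(np.exp(scores))
        for j in range(num_classes):
            if j == y[i]:
                # Correct class: adds to the loss, gradient is (p_j - 1) * x
                loss += -np.log(normalized_scores[j])
                dW[:, j] += (normalized_scores[j] - 1.0) * X[i, :]
            else:
                # Other classes: gradient is (p_j - 0) * x
                dW[:, j] += (normalized_scores[j] - 0.0) * X[i, :]

    # Average over the training set, then add L2 regularization
    loss /= num_train
    dW /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W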

Vectorized Implementation:

    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    num_classes = W.shape[1]

    scores = X.dot(W)
    scores -= np.max(scores, axis=1, keepdims=True)  # shift each row for numerical stability
    exp_scores = np.exp(scores)
    # keepdims=True keeps the row sums as a column so the division broadcasts per row
    normalized_scores = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

    ground_truth = np.zeros_like(normalized_scores)
    ground_truth[range(num_train), y] = 1.0  # one-hot: 1 at the correct class

    loss = np.sum(-np.log(normalized_scores[range(num_train), y]))
    dW = X.T.dot(normalized_scores - ground_truth)

    # Average over the training set, then add L2 regularization
    loss /= num_train
    dW /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W
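The naive implementation subtracts the maximum score before exponentiating. The sketch below shows why that step matters, using hypothetical score values and helper names of our own. The shift does not change the probabilities, but it prevents overflow when the scores are large.

```python
import numpy as np

def softmax_unshifted(scores):
    exps = np.exp(scores)
    return exps / np.sum(exps)

def softmax_shifted(scores):
    exps = np.exp(scores - np.max(scores))  # largest exponent becomes exp(0) = 1
    return exps / np.sum(exps)

small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 1001.0, 1002.0])   # same gaps between scores, much larger values

# Silence the overflow warnings so we can inspect the result.
with np.errstate(over="ignore", invalid="ignore"):
    unstable = softmax_unshifted(large)      # exp overflows to inf, and inf / inf is nan

stable = softmax_shifted(large)
print(unstable, stable)
```

Because softmax depends only on the differences between scores, `stable` matches the probabilities computed for `small`.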

## V. Code Breakdown:

We will be using **load_data()** to load the MNIST data if needed and separate into train and test sets.

    import tensorflow as tf
    import numpy as np
    import input_data

    class parameters():

        def __init__(self):
            self.LEARNING_RATE = 0.05
            self.NUM_EPOCHS = 500
            self.DISPLAY_STEP = 1  # epoch

    def load_data():
        mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
        trainX, trainY, testX, testY = mnist.train.images, mnist.train.labels, \
                                       mnist.test.images, mnist.test.labels
        return trainX, trainY, testX, testY

We will create a model and use softmax cross-entropy with logits as our loss function. **step()** will do the training/validation steps for us. Note the **forward_only** argument: when we are validating on the test set, we do not want to train on it, so we skip any operations involving the optimizer.

    def create_model(sess, learning_rate):
        tf_model = model(learning_rate)
        sess.run(tf.initialize_all_variables())
        return tf_model

    class model(object):

        def __init__(self, learning_rate):
            # Placeholders
            self.X = tf.placeholder("float", [None, 784])
            self.y = tf.placeholder("float", [None, 10])

            # Weights (name must be a keyword: the second positional argument is trainable)
            with tf.variable_scope('weights'):
                W = tf.Variable(tf.random_normal([784, 10], stddev=0.01), name="W")

            self.logits = tf.matmul(self.X, W)
            # Keyword arguments avoid mixing up the order of logits and labels
            self.cost = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(logits=self.logits, labels=self.y))
            self.optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(self.cost)

            # Prediction
            self.prediction = tf.argmax(self.logits, 1)

        def step(self, sess, batch_X, batch_y, forward_only=True):
            input_feed = {self.X: batch_X, self.y: batch_y}
            if not forward_only:
                # Training: also run the optimizer so the weights get updated
                output_feed = [self.prediction, self.cost, self.optimizer]
            else:
                # Validation: only compute the cost, no weight update
                output_feed = [self.cost]
            outputs = sess.run(output_feed, input_feed)
            if not forward_only:
                return outputs[0], outputs[1], outputs[2]
            else:
                return outputs[0]

We will train the model using the entire train/test batches at each step.

    def train(FLAGS):
        with tf.Session() as sess:
            model = create_model(sess, FLAGS.LEARNING_RATE)
            trainX, trainY, testX, testY = load_data()

            for epoch_num in range(FLAGS.NUM_EPOCHS):
                prediction, training_loss, _ = model.step(sess, trainX, trainY, forward_only=False)

                # Display
                if epoch_num % FLAGS.DISPLAY_STEP == 0:
                    print("EPOCH %i: \n Training loss: %.3f, Test loss: %.3f" % (
                        epoch_num, training_loss,
                        model.step(sess, testX, testY, forward_only=True)))

    if __name__ == '__main__':
        FLAGS = parameters()
        train(FLAGS)
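To see the same train/validate pattern without TensorFlow or MNIST, here is a rough NumPy sketch. Everything in it is an assumption made for illustration: the synthetic data, the sizes, the learning rate, and this standalone `step` function, which only mimics the role of the `forward_only` flag.

```python
import numpy as np

def step(W, X, Y, learning_rate, forward_only=True):
    """Return (loss, W). W is only updated when forward_only is False."""
    scores = X.dot(W)
    scores -= np.max(scores, axis=1, keepdims=True)   # numerical stability
    probs = np.exp(scores)
    probs /= np.sum(probs, axis=1, keepdims=True)
    loss = -np.mean(np.sum(Y * np.log(probs), axis=1))
    if not forward_only:
        grad = X.T.dot(probs - Y) / X.shape[0]        # gradient of the mean loss
        W = W - learning_rate * grad                  # plain gradient descent
    return loss, W

# Synthetic, linearly separable data (hypothetical stand-in for MNIST).
rng = np.random.RandomState(0)
num_features, num_classes = 5, 3
true_W = rng.randn(num_features, num_classes)

def make_split(n):
    X = rng.randn(n, num_features)
    labels = np.argmax(X.dot(true_W), axis=1)
    return X, np.eye(num_classes)[labels]             # one-hot targets

trainX, trainY = make_split(200)
testX, testY = make_split(50)

W = 0.01 * rng.randn(num_features, num_classes)
initial_loss, _ = step(W, trainX, trainY, 0.5, forward_only=True)

for epoch in range(100):
    # Full-batch training step: the weights change.
    train_loss, W = step(W, trainX, trainY, 0.5, forward_only=False)

W_before_eval = W.copy()
# Validation step: forward pass only, the weights stay as they are.
test_loss, W = step(W, testX, testY, 0.5, forward_only=True)
print(initial_loss, train_loss, test_loss)
```

The training loss drops from its initial value, and evaluating on the test split leaves `W` unchanged, which is the behaviour **forward_only** is meant to guarantee.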

## VI. Raw Code:

**GitHub Repo** (Updating all repos, will be back up soon!)