Logistic Regression

I. Objective:

We use a different objective depending on the number of output classes. With 2 classes, we use binary (binomial) cross entropy as the loss; with more than 2 classes, we use cross entropy with a softmax output.

[Figure: binary and multinomial (softmax) cross-entropy objectives]
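For reference, the standard forms of these two objectives, using $\hat{y}$ for the predicted probability (the notation may differ slightly from the figure):

$$\mathcal{L}_{\text{binary}} = -\big[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\big]$$

$$\mathcal{L}_{\text{softmax}} = -\sum_{j=1}^{C} y_j \log \hat{y}_j, \qquad \hat{y}_j = \frac{e^{s_j}}{\sum_{k=1}^{C} e^{s_k}}$$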

II. Multinomial cross entropy:

An example of how to calculate the cross-entropy loss for a 3-class problem.

[Figure: worked example of the cross-entropy loss for a 3-class problem]
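A small numerical sketch of the same calculation (the scores and label here are made up purely for illustration):

import numpy as np

scores = np.array([2.0, 1.0, 0.1])      # raw class scores for one example
y_true = 0                              # index of the correct class

# Softmax: shift by the max for numerical stability, then normalize
exp_scores = np.exp(scores - np.max(scores))
probs = exp_scores / np.sum(exp_scores) # ~ [0.659, 0.242, 0.099]

# Cross entropy keeps only the -log probability of the correct class
loss = -np.log(probs[y_true])
print loss                              # ~ 0.417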

III. Backpropagation:

With the multinomial cross entropy, you can see that we only keep the loss contribution from the correct class. With neural nets, this will usually be the case when our targets are sparse (just 1 true class). Therefore, we can rewrite our loss as sum(-log(y_hat)), where y_hat is the predicted probability of the correct class. We simply replace y_i (the true y) with 1, and the probabilities of the other classes don't matter because their y_i is 0. This is referred to as the negative log likelihood.

Note: The y^i below is just the probability for the correct class, so the loss is only affected by the probability of the correct class. However, the gradients DO take the probabilities of the other classes into account, which lets us update all the weights toward minimizing the loss function. Also, all the y's in the figure below are y_hats (predicted probabilities).

[Figure: negative log likelihood loss and its gradient with respect to the class scores]
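For the softmax with negative log likelihood, the gradient with respect to each raw score takes a simple, standard form (stated here for reference):

$$\frac{\partial L}{\partial s_j} = \hat{y}_j - \mathbb{1}[j = y]$$

That is, subtract 1 from the predicted probability of the correct class and leave the other probabilities as they are, which is exactly what the (normalized_scores[j] - 1.0) and (normalized_scores[j] - 0.0) terms do in the implementations below.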

IV. Softmax Classifier Implementation:

Naive Implementation:

# Inputs: W (D x C weights), X (N x D data), y (N,) integer labels, reg (regularization strength)
loss = 0.0
dW = np.zeros_like(W)
num_train = X.shape[0]
num_classes = W.shape[1]

for i in xrange(num_train):

    # Class scores for one example, shifted by the max for numerical stability
    scores = X[i, :].dot(W)
    scores -= np.max(scores)
    normalized_scores = np.exp(scores) / np.sum(np.exp(scores))

    for j in xrange(num_classes):

        if j == y[i]:
            # Only the correct class contributes to the loss
            loss += -np.log(normalized_scores[j])
            dW[:, j] += (normalized_scores[j] - 1.0) * X[i, :]
        else:
            # Every class contributes to the gradient
            dW[:, j] += (normalized_scores[j] - 0.0) * X[i, :]

# Average over the training set and add L2 regularization
loss /= num_train
dW /= num_train
loss += 0.5 * reg * np.sum(W * W)
dW += reg * W

Vectorized Implementation:

loss = 0.0
dW = np.zeros_like(W)
num_train = X.shape[0]
num_classes = W.shape[1]

# Shift each row of scores by its max for numerical stability (same trick as the naive version)
scores = X.dot(W)
scores -= np.max(scores, axis=1, keepdims=True)
exp_scores = np.exp(scores)
normalized_scores = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

# One-hot matrix marking the correct class for each example
ground_truth = np.zeros_like(normalized_scores)
ground_truth[range(num_train), y] = 1.0

# Negative log likelihood of the correct classes; gradient is (probabilities - one-hot)
loss = np.sum(-np.log(normalized_scores[range(num_train), y]))
dW = X.T.dot(normalized_scores - ground_truth)

loss /= num_train
dW /= num_train
loss += 0.5 * reg * np.sum(W * W)
dW += reg * W
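A quick way to check the two versions against each other. This assumes the snippets above are wrapped into hypothetical functions softmax_loss_naive(W, X, y, reg) and softmax_loss_vectorized(W, X, y, reg) that each return (loss, dW):

import numpy as np

# Small random problem: N examples, D features, C classes
N, D, C = 10, 5, 3
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)
W = 0.01 * np.random.randn(D, C)
reg = 0.1

loss_naive, dW_naive = softmax_loss_naive(W, X, y, reg)
loss_vec, dW_vec = softmax_loss_vectorized(W, X, y, reg)

# Both should agree to within numerical precision
print abs(loss_naive - loss_vec)
print np.max(np.abs(dW_naive - dW_vec))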

 

V. Code Breakdown:

We will use load_data() to load the MNIST data (downloading it first if needed) and split it into train and test sets.

import tensorflow as tf
import numpy as np
import input_data

class parameters():

	def __init__(self):
		self.LEARNING_RATE = 0.05
		self.NUM_EPOCHS = 500
		self.DISPLAY_STEP = 1 # epoch

def load_data():
	mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
	trainX, trainY, testX, testY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels
	return trainX, trainY, testX, testY
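As a quick sanity check on load_data(), the returned arrays should have shapes roughly like these (exact counts depend on this loader's train/validation split):

trainX, trainY, testX, testY = load_data()
print trainX.shape, trainY.shape   # roughly (55000, 784) (55000, 10)
print testX.shape, testY.shape     # roughly (10000, 784) (10000, 10)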

We will create a model and use softmax cross entropy with logits as our loss function. step() will run the training/validation steps for us. Note the forward_only argument: when we are evaluating on the test set, we do not want to train on it, so we skip any operations involving the optimizer.

def create_model(sess, learning_rate):
	tf_model = model(learning_rate)
	sess.run(tf.initialize_all_variables())
	return tf_model

class model(object):

	def __init__(self, learning_rate):

		# Placeholders
		self.X = tf.placeholder("float", [None, 784])
		self.y = tf.placeholder("float", [None, 10])

		# Weights
		with tf.variable_scope('weights'):
			W = tf.Variable(tf.random_normal([784, 10], stddev=0.01), name="W")

		# softmax_cross_entropy_with_logits applies the softmax itself, so we feed it the raw logits
		self.logits = tf.matmul(self.X, W)
		self.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(self.logits, self.y))
		self.optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(self.cost)

		# Prediction
		self.prediction = tf.argmax(self.logits, 1)

	def step(self, sess, batch_X, batch_y, forward_only=True):

		input_feed = {self.X: batch_X, self.y: batch_y}
		if not forward_only:
			output_feed = [self.prediction,
						self.cost,
						self.optimizer]
		else:
			output_feed = [self.cost]

		outputs = sess.run(output_feed, input_feed)

		if not forward_only:
			return outputs[0], outputs[1], outputs[2]
		else:
			return outputs[0]

We will train the model using the entire training set as a single batch at each epoch and evaluate on the full test set (a mini-batch variant is sketched after the code).

def train(FLAGS):

	with tf.Session() as sess:

		model = create_model(sess, FLAGS.LEARNING_RATE)
		trainX, trainY, testX, testY = load_data()

		for epoch_num in range(FLAGS.NUM_EPOCHS):
			prediction, training_loss, _ = model.step(sess, trainX, trainY, forward_only=False)

			# Display
			if epoch_num%FLAGS.DISPLAY_STEP == 0:
				print "EPOCH %i: \n Training loss: %.3f, Test loss: %.3f" % (
					epoch_num, training_loss, model.step(sess, testX, testY, forward_only=True))

if __name__ == '__main__':
	FLAGS = parameters()
	train(FLAGS)
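The loop above feeds the whole training set at once. A minimal sketch of a mini-batch variant of the epoch loop inside train(), assuming a hypothetical batch_size setting and running inside the same tf.Session() block:

		batch_size = 128
		num_batches = trainX.shape[0] // batch_size

		for epoch_num in range(FLAGS.NUM_EPOCHS):
			for b in range(num_batches):
				# Slice out one mini-batch and take a training step on it
				batch_X = trainX[b * batch_size:(b + 1) * batch_size]
				batch_y = trainY[b * batch_size:(b + 1) * batch_size]
				prediction, training_loss, _ = model.step(sess, batch_X, batch_y, forward_only=False)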

 

VI. Raw Code:

GitHub Repo (Updating all repos, will be back up soon!)
