Getting Started from Scratch with Python + TensorFlow + CNN for Face Gender Recognition (Part 1)

暖宝宝j 2017-08-30


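When I wrote this post, I was a student who had just been admitted to a master's program and had never touched machine learning. At my advisor's request I started working on CNN-based face gender recognition (which is also when I started using CSDN). Here I record the pitfalls I ran into and a little of what I learned.

My advisor believes in learning by doing, so I took on this task with no background at all in Python, TensorFlow, or CNNs. My first program was the famous MNIST handwritten digit recognition example. Anyone who has been through the same thing will know the difficulties and frustration involved. I won't show off on Python basics; there are plenty of excellent blog posts by experts online. This post mainly records the problems I hit while doing this task.

When I first built the network, I borrowed part of a teacher's MNIST code and modified it to get my own basic network and parameter definitions (I'm a beginner, so please forgive any imprecise wording).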


import tensorflow as tf  # TensorFlow 1.x API
# placeholders for the input data; the shape matches the (N, 112, 92) image arrays fed in later
xs = tf.placeholder(tf.float32, shape = [None, 112, 92])
ys = tf.placeholder(tf.float32, shape = [None, 2])
x_image = tf.reshape(xs, [-1, 112, 92, 1])
# get w
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev = 0.1)
    return tf.Variable(initial)
# get bias
def biases_variable(shape):
    initial = tf.constant(0.1, shape = shape)
    return tf.Variable(initial)
# convolutional layer
def conv2d(x, w):
    return tf.nn.conv2d(x, w, strides = [1, 1, 1, 1], padding = 'SAME')
# pooling layer: the window is 2x2 with stride 2, so height and width are halved (rounded up with 'SAME' padding)
def max_pool(x):
    return tf.nn.max_pool(x, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = 'SAME')
# the first convolutional layer
w_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = biases_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)  # output size 112x92x32
h_pool1 = max_pool(h_conv1)  # output size 56x46x32
# the second convolutional layer: each 5x5 patch yields 64 features
w_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = biases_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)  # output size 56x46x64
h_pool2 = max_pool(h_conv2)  # output size 28x23x64
# the third convolutional layer
w_conv3 = weight_variable([5, 5, 64, 128])
b_conv3 = biases_variable([128])
h_conv3 = tf.nn.relu(conv2d(h_pool2, w_conv3) + b_conv3)  # output size 28x23x128
h_pool3 = max_pool(h_conv3)  # output size 14x12x128
# fully connected layer, written as a 14x12 'VALID' convolution that collapses the feature map to 1x1x1024
w_fc1 = weight_variable([14, 12, 128, 1024])
b_fc1 = biases_variable([1024])
h_fc11 = tf.nn.relu(tf.nn.conv2d(h_pool3, w_fc1, strides=[1, 1, 1, 1], padding='VALID') + b_fc1)
h_fc1 = tf.reshape(h_fc11, [-1, 1024])

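(The complete code is posted at the end.)

In this setup, my training data is 800 face photos of 112x92 pixels (400 male, 400 female), and my test data is about 1031 face photos of 112x92 pixels (591 of them male; don't ask why there is more test data than training data, I downloaded it from the internet). I have annotated the rest in the code comments. With that, my network was basically complete, even though it doesn't have many layers.

Here are the problems I ran into along the way: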

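The first was the shape of the input images, which took me several days to sort out. The fix is what the code shows: reshape the input into a tensor of shape (-1, 112, 92, 1), where -1 lets TensorFlow infer the batch size and the final 1 is the single grayscale channel. If that's unclear, see my complete code.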
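
To make the reshape concrete, here is a small NumPy sketch. NumPy's reshape follows the same -1 rule as tf.reshape, and the random batch below is only a stand-in for real face images, not my actual data.

```python
import numpy as np

# A stand-in batch of 5 grayscale images, 112 rows x 92 columns each.
batch = np.random.rand(5, 112, 92).astype(np.float32)

# -1 lets the batch dimension be inferred; the trailing 1 is the channel axis
# that a 2-D convolution expects: (batch, height, width, channels).
x_image = batch.reshape(-1, 112, 92, 1)

print(x_image.shape)  # (5, 112, 92, 1)
```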

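The next problem was that the fully connected layer needs the output size of the last max_pool, and my 112x92 input does not divide evenly through these pooling layers (going from the second to the third layer, the width of 23 is odd). In the end I asked a senior labmate, who said to just try it and see (my own fault for being slow), and that settled it. With padding='SAME', each 2x2 pooling layer with stride 2 rounds the halved size up, so 23 becomes 12 and the final feature map is 14x12x128.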
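
Instead of finding the size by trial and error, it can also be computed by hand. The sketch below assumes stride-2 pooling with padding='SAME' (output = ceil(input / stride)) and stride-1 'SAME' convolutions that leave the size unchanged, as in my network; the helper name same_pool_output is made up for this example.

```python
import math

def same_pool_output(height, width, num_pools, stride=2):
    """Spatial size after num_pools pooling layers that use padding='SAME'."""
    for _ in range(num_pools):
        # 'SAME' padding rounds up, so an odd size such as 23 becomes 12.
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
    return height, width

for n in range(1, 4):
    print(n, same_pool_output(112, 92, n))
```

The three lines printed are the sizes after the first, second, and third pooling layers; the last one is what the fully connected layer has to match.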

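With the network in place, I turned to the input images (that is, how to load and store the original images and how to use them for training and testing).

The processing code is as follows: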


import matplotlib.image as img
import numpy as np
# image_train holds the training data
image_train = np.zeros((800, 112, 92))
for i in range(800):
    # path = ' H:\Python\train_sample\'
    m = str(i + 1)
    filename = "face" + m + '.bmp'
    # reading an image is plain matplotlib/NumPy work, so no tf.Session is needed here
    image_train[i] = img.imread(filename)
# image_test holds the test data
image_test = np.zeros((1031, 112, 92))
for i in range(1031):
    m = str(i + 1)
    filename = r"H:\Python\test_sample\face" + m + '.bmp'
    image_test[i] = img.imread(filename)

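I learned this from a senior labmate; it's the method she uses, and it worked well for me. All the data is read into one NumPy array whose first dimension is the number of images, followed by the image size.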
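
The same idea can be wrapped in a small helper. This is only a sketch: load_images and fake_reader are names I made up for illustration, and fake_reader stands in for matplotlib.image.imread so the example runs without any image files on disk.

```python
import numpy as np

def load_images(filenames, read_image, height=112, width=92):
    """Read every file into one (count, height, width) float array."""
    images = np.zeros((len(filenames), height, width), dtype=np.float32)
    for index, name in enumerate(filenames):
        images[index] = read_image(name)
    return images

# Hypothetical stand-in reader that returns a constant 112x92 image;
# with real data, pass matplotlib.image.imread instead.
def fake_reader(name):
    return np.full((112, 92), 7.0)

filenames = ["face%d.bmp" % (i + 1) for i in range(3)]
images = load_images(filenames, fake_reader)
print(images.shape)  # (3, 112, 92)
```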

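With the images sorted out, next came the labels. In my data the male faces come first and the female faces after them, so I simply built the label NumPy array by hand rather than reading labels from a text file (I was also too lazy to learn how to do that).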


# test labels: the first 591 images are male -> [0, 1], the rest are female -> [1, 0]
label_test = np.ones((1031, 2))
for i in range(591):
    label_test[i, 0] = 0
    label_test[i, 1] = 1
for i in range(591, 1031):
    label_test[i, 0] = 1
    label_test[i, 1] = 0
#print(label_test[591][0])
# train labels: the first 400 images are male, the last 400 are female
label_train = np.ones((800, 2))
for i in range(400):
    label_train[i, 0] = 0
    label_train[i, 1] = 1
for i in range(400, 800):
    label_train[i, 0] = 1
    label_train[i, 1] = 0
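
For what it's worth, the same label arrays can probably be built without explicit loops. The sketch below assumes the same ordering (male first, then female) and the same encoding (male is [0, 1], female is [1, 0]); one_hot_labels is a name invented for this example.

```python
import numpy as np

def one_hot_labels(num_male, num_female):
    """Return a (num_male + num_female, 2) array: male -> [0, 1], female -> [1, 0]."""
    # Class id 1 for male and 0 for female, in the order the images are stored.
    class_ids = np.concatenate([np.ones(num_male, dtype=int),
                                np.zeros(num_female, dtype=int)])
    # Row k of the 2x2 identity matrix is the one-hot vector for class k.
    return np.eye(2)[class_ids]

label_train = one_hot_labels(400, 400)
label_test = one_hot_labels(591, 1031 - 591)
print(label_train.shape, label_test.shape)
```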

I hit a big problem here that took a long time and a lot of asking around to solve. That may be the price of not having studied things systematically.

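My classifier is softmax, but at first I set the number of output classes to 1 (in my mind, male was 1 and female was 0), so the labels I first made had shapes (1031, 1) and (800, 1). I got all sorts of errors, was completely confused, and couldn't find the cause. Softmax needs one output per class, so two classes require labels of shape (N, 2). Probably only a beginner like me would run into this.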
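
Whatever the exact error messages were, a softmax over a single output cannot work in principle, because softmax normalizes the outputs to sum to 1. Here is a tiny pure-Python sketch of that (my own illustration, not TensorFlow code):

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of scores."""
    shifted = [v - max(values) for v in values]
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# With a single output the result is always 1.0, whatever the score is,
# so the network could never express "female = 0".
print(softmax([3.2]))   # [1.0]
print(softmax([-5.0]))  # [1.0]

# With two outputs the probabilities can actually differ between classes.
print(softmax([2.0, 0.0]))
```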

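With the labels done, next came the loss function and the optimizer. I borrowed these directly from a paper, so there's not much to say; here's the code: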


# training and evaluation
# `logits` is the pre-softmax output of the last layer (tf.matmul(h_fc1_drop, w_fc2) + b_fc2);
# softmax_cross_entropy_with_logits applies softmax itself, so it must not be given the softmax output y_out
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=ys, logits=logits))
#cross_entropy=tf.reduce_mean(-tf.reduce_sum(ys*tf.log(y_out),reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_out, 1), tf.argmax(ys, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
#w1 = tf.zeros(shape = w_conv3.shape)
#w2 = tf.zeros(shape = w_conv3.shape)
#w3 = tf.zeros(shape = w_conv3.shape)
for i in range(1000):
    # dropout is switched off (keep_prob 1.0) whenever accuracy is measured
    train_accuracy = sess.run(accuracy, feed_dict={xs: image_train, ys: label_train, keep_prob: 1.0})
    print("step %d, training accuracy %g" % (i, train_accuracy))
    sess.run(train_step, feed_dict={xs: image_train, ys: label_train, keep_prob: 0.5})
    print(sess.run(cross_entropy, feed_dict={xs: image_train, ys: label_train, keep_prob: 0.5}))
    test_accuracy = sess.run(accuracy, feed_dict={xs: image_test, ys: label_test, keep_prob: 1})
    print("step %d, testing accuracy %g" % (i, test_accuracy))
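
One detail worth checking in code like this: tf.nn.softmax_cross_entropy_with_logits applies softmax internally, so it expects the raw scores of the last layer rather than values that have already been through softmax. The pure-Python sketch below (toy numbers of my own, not taken from the model) shows what happens to the loss if softmax is applied twice.

```python
import math

def softmax(values):
    shifted = [v - max(values) for v in values]
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot):
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot))

logits = [4.0, -4.0]   # a confident, correct prediction for class 0
label = [1.0, 0.0]

once = softmax(logits)   # what the loss function computes internally
twice = softmax(once)    # what it sees if it is fed softmax output

print("softmax once :", once, cross_entropy(once, label))
print("softmax twice:", twice, cross_entropy(twice, label))
```

Because the second softmax only ever sees inputs between 0 and 1, with two classes the winning probability cannot exceed 1 / (1 + e^-1), about 0.73, so the loss stays well above zero even for a perfect prediction.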

The w1, w2, w3 lines in the middle were for an experiment I was running at the time and have no particular meaning.

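That basically completes every part of my network. I first trained for 10 iterations to see the result (with the learning rate set to 0.1 for that experiment) and found that both the training and the test accuracy stopped changing after two iterations. After some searching, I learned that when the activation function is ReLU the learning rate must not be too large; otherwise you get exactly the behavior I saw, and partway through you may also see "kernel died, restarting" (I may have the exact wording wrong). For details, see this post, which introduces the various activation functions: http://blog.csdn.net/cyh_24/article/details/50593400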
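
One common explanation for this is the "dying ReLU" effect: a single oversized update pushes a unit's input below zero for every sample, after which its gradient is zero and it never recovers. The toy sketch below is my own illustration with made-up numbers (one ReLU unit, three samples, plain gradient descent), not a reproduction of my network.

```python
def relu(value):
    return max(0.0, value)

def train_single_relu(learning_rate, steps=500):
    """Fit y = relu(w * x + b) to toy targets with plain gradient descent."""
    inputs = [0.5, 1.0, 1.5]
    targets = [0.2, 0.4, 0.6]
    w, b = 1.0, 0.1
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, t in zip(inputs, targets):
            pre = w * x + b
            # ReLU passes gradient only where its input is positive.
            if pre > 0:
                error = pre - t
                grad_w += 2 * error * x / len(inputs)
                grad_b += 2 * error / len(inputs)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    loss = sum((relu(w * x + b) - t) ** 2
               for x, t in zip(inputs, targets)) / len(inputs)
    return w, b, loss

for lr in (1.0, 0.01):
    w, b, loss = train_single_relu(lr)
    print("lr=%g  w=%.3f  b=%.3f  loss=%.4f" % (lr, w, b, loss))
```

With the large learning rate the very first step drives w and b negative, the unit outputs 0 for every sample, and the loss never moves again; with the small one the loss keeps shrinking.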

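I'm still running this model. I expect the accuracy won't be very high: my network has too few layers, and I'm not yet good at setting the parameters. I'll keep writing as I improve it; if anything here is wrong, please point it out.

Complete code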


# -*- coding: utf-8 -*-
"""
Created on Mon Aug 14 09:38:53 2017
@author: Administrator
"""
import matplotlib.image as img
import tensorflow as tf  # TensorFlow 1.x API
import numpy as np
sess = tf.InteractiveSession()
# image_train holds the training data
image_train = np.zeros((800, 112, 92))
for i in range(800):
    # path = ' H:\Python\train_sample\'
    m = str(i + 1)
    filename = "face" + m + '.bmp'
    # no `with tf.Session() as sess:` here: it is not needed for reading files,
    # and it would rebind `sess` to a session that is closed when the block exits
    image_train[i] = img.imread(filename)
# image_test holds the test data
image_test = np.zeros((1031, 112, 92))
for i in range(1031):
    m = str(i + 1)
    filename = r"H:\Python\test_sample\face" + m + '.bmp'
    image_test[i] = img.imread(filename)
# test labels: the first 591 images are male -> [0, 1], the rest are female -> [1, 0]
label_test = np.ones((1031, 2))
for i in range(591):
    label_test[i, 0] = 0
    label_test[i, 1] = 1
for i in range(591, 1031):
    label_test[i, 0] = 1
    label_test[i, 1] = 0
#print(label_test[591][0])
# train labels: the first 400 images are male, the last 400 are female
label_train = np.ones((800, 2))
for i in range(400):
    label_train[i, 0] = 0
    label_train[i, 1] = 1
for i in range(400, 800):
    label_train[i, 0] = 1
    label_train[i, 1] = 0
#print(label_train[700,0])
#print(label_train[700,1])
#print(label_train)
# placeholders for the input data
xs = tf.placeholder(tf.float32, shape = [None, 112, 92])
ys = tf.placeholder(tf.float32, shape = [None, 2])
# add the channel axis: (batch, height, width, 1)
x_image = tf.reshape(xs, [-1, 112, 92, 1])
# get w
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev = 0.1)
    return tf.Variable(initial)
# get bias
def biases_variable(shape):
    initial = tf.constant(0.1, shape = shape)
    return tf.Variable(initial)
# convolutional layer
def conv2d(x, w):
    return tf.nn.conv2d(x, w, strides = [1, 1, 1, 1], padding = 'SAME')
# pooling layer: the window is 2x2 with stride 2, so height and width are halved (rounded up with 'SAME' padding)
def max_pool(x):
    return tf.nn.max_pool(x, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = 'SAME')
# the first convolutional layer
w_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = biases_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)  # output size 112x92x32
h_pool1 = max_pool(h_conv1)  # output size 56x46x32
# the second convolutional layer: each 5x5 patch yields 64 features
w_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = biases_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)  # output size 56x46x64
h_pool2 = max_pool(h_conv2)  # output size 28x23x64
# the third convolutional layer
w_conv3 = weight_variable([5, 5, 64, 128])
b_conv3 = biases_variable([128])
h_conv3 = tf.nn.relu(conv2d(h_pool2, w_conv3) + b_conv3)  # output size 28x23x128
h_pool3 = max_pool(h_conv3)  # output size 14x12x128
# fully connected layer, written as a 14x12 'VALID' convolution that collapses the feature map to 1x1x1024
w_fc1 = weight_variable([14, 12, 128, 1024])
b_fc1 = biases_variable([1024])
h_fc11 = tf.nn.relu(tf.nn.conv2d(h_pool3, w_fc1, strides=[1, 1, 1, 1], padding='VALID') + b_fc1)
h_fc1 = tf.reshape(h_fc11, [-1, 1024])
# dropout to reduce overfitting
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# variable learning rate (unused experiment)
#global_step = tf.Variable(0)
#learning_rate = tf.train.exponential_decay(0.001,global_step,100,0.98,staircase = True)
# softmax layer
w_fc2 = weight_variable([1024, 2])
b_fc2 = biases_variable([2])
logits = tf.matmul(h_fc1_drop, w_fc2) + b_fc2  # raw class scores
y_out = tf.nn.softmax(logits)  # class probabilities
#print(y_out)
# training and evaluation
# softmax_cross_entropy_with_logits applies softmax itself, so it gets the raw logits, not y_out
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=ys, logits=logits))
#cross_entropy=tf.reduce_mean(-tf.reduce_sum(ys*tf.log(y_out),reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_out, 1), tf.argmax(ys, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
#w1 = tf.zeros(shape = w_conv3.shape)
#w2 = tf.zeros(shape = w_conv3.shape)
#w3 = tf.zeros(shape = w_conv3.shape)
for i in range(1000):
    # dropout is switched off (keep_prob 1.0) whenever accuracy is measured
    train_accuracy = sess.run(accuracy, feed_dict={xs: image_train, ys: label_train, keep_prob: 1.0})
    print("step %d, training accuracy %g" % (i, train_accuracy))
    sess.run(train_step, feed_dict={xs: image_train, ys: label_train, keep_prob: 0.5})
    print(sess.run(cross_entropy, feed_dict={xs: image_train, ys: label_train, keep_prob: 0.5}))
    test_accuracy = sess.run(accuracy, feed_dict={xs: image_test, ys: label_test, keep_prob: 1})
    print("step %d, testing accuracy %g" % (i, test_accuracy))