Hand Gesture Recognition with a CNN (Convolutional Neural Network) and OpenCV

taotao_2016 2019-03-23


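To build a sign language recognition (SLR) system, we need three things:

A machine learning dataset
A machine learning model (we will use a CNN)
A platform to run the model on (we will use OpenCV)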

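1) Dataset

The hand gesture dataset can be downloaded here (https://www./datamunge/sign-language-mnist).

The dataset contains many images for 24 letters of the American Sign Language alphabet (J and Z are excluded). Each image is 28x28 pixels, for a total of 784 pixels per image.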
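The label scheme can be made concrete with a short sketch. It assumes the labels are zero-based alphabet positions (A = 0), which is how the prediction helper later in this article treats them. The names LABEL_TO_LETTER and PIXELS_PER_IMAGE are illustrative and are not part of the dataset.

```python
import string

# Assumption: labels are zero-based alphabet positions (A = 0, ..., Z = 25).
# The dataset has no images for J (9) and Z (25).
EXCLUDED = {"J", "Z"}

LABEL_TO_LETTER = {
    index: letter
    for index, letter in enumerate(string.ascii_uppercase)
    if letter not in EXCLUDED
}

# Each image is 28x28 grayscale, stored as one flat row of pixel values.
PIXELS_PER_IMAGE = 28 * 28

print(len(LABEL_TO_LETTER))  # 24 letters have images
print(PIXELS_PER_IMAGE)      # 784 pixel values per image
```

Under this assumption the highest label in use is 24 (Y), which is why the preprocessing code below still allocates 26 classes.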


Loading the dataset

To load the dataset, use the following Python code:

    import numpy as np
    import pandas as pd
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout

    # Each CSV row holds one image: a 'label' column plus the pixel columns.
    train = pd.read_csv('train.csv')
    test = pd.read_csv('test.csv')

    # Separate the labels from the pixel values.
    y_train = train['label'].values
    y_test = test['label'].values
    X_train = train.drop(['label'], axis=1)
    X_test = test.drop(['label'], axis=1)


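The dataset is in CSV (comma-separated values) format. X_train and X_test hold the value of every pixel, and y_train and y_test hold the image labels. You can inspect the dataset with the following Python code:

    # info() prints its summary directly. display() is provided by
    # Jupyter/IPython; use print() instead in a plain script.
    X_train.info()
    X_test.info()
    display(X_train.head(n=2))
    display(X_test.head(n=2))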


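Preprocessing

X_train and X_test contain one flat array of pixel values per image. To turn these values into images, we reshape every array into a 28x28 grid, because our images are 28x28 pixels. We also one-hot encode the labels and add a channel dimension for the CNN. We use the following code:

    # Reshape each flat row of 784 pixel values into a 28x28 image.
    X_train = np.array(X_train.iloc[:, :])
    X_train = np.array([np.reshape(i, (28, 28)) for i in X_train])
    X_test = np.array(X_test.iloc[:, :])
    X_test = np.array([np.reshape(i, (28, 28)) for i in X_test])

    # One-hot encode the labels. The encoding has 26 columns, one per
    # alphabet position; the columns for J and Z are never set.
    num_classes = 26
    y_train = np.array(y_train).reshape(-1)
    y_test = np.array(y_test).reshape(-1)
    y_train = np.eye(num_classes)[y_train]
    y_test = np.eye(num_classes)[y_test]

    # Add the single grayscale channel expected by Conv2D
    # (27455 training images and 7172 test images).
    X_train = X_train.reshape((27455, 28, 28, 1))
    X_test = X_test.reshape((7172, 28, 28, 1))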
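To show what the two preprocessing steps produce, here is a minimal sketch in plain Python, without NumPy. The helpers one_hot and to_grid are hypothetical names that mimic np.eye(num_classes)[labels] and np.reshape(row, (28, 28)); the sample labels and pixel values are synthetic.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


def to_grid(flat_pixels, width=28, height=28):
    """Split a flat list of width*height pixels into `height` rows."""
    if len(flat_pixels) != width * height:
        raise ValueError("expected %d pixels" % (width * height))
    return [flat_pixels[r * width:(r + 1) * width] for r in range(height)]


# Synthetic sample: three labels and one image whose pixels are 0..783.
encoded = one_hot([0, 2, 24], num_classes=26)
grid = to_grid(list(range(784)))

print(encoded[1][:4])           # [0.0, 0.0, 1.0, 0.0]
print(len(grid), len(grid[0]))  # 28 28
print(grid[1][0])               # 28, the first pixel of the second row
```

The reshape is row-major: pixel 28 of the flat array becomes the first pixel of the second image row.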


Now we can use this dataset to train our model.

2) Building and training the model

We will use a CNN (convolutional neural network) to recognize the letters, built with Keras.

The Python implementation of the model is as follows:

    classifier = Sequential()
    classifier.add(Conv2D(filters=8, kernel_size=(3, 3), strides=(1, 1),
                          padding='same', input_shape=(28, 28, 1),
                          activation='relu', data_format='channels_last'))
    classifier.add(MaxPooling2D(pool_size=(2, 2)))
    classifier.add(Conv2D(filters=16, kernel_size=(3, 3), strides=(1, 1),
                          padding='same', activation='relu'))
    classifier.add(Dropout(0.5))
    classifier.add(MaxPooling2D(pool_size=(4, 4)))
    # Flatten the feature maps before the fully connected layers.
    classifier.add(Flatten())
    classifier.add(Dense(128, activation='relu'))
    classifier.add(Dense(26, activation='softmax'))


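Our model consists of Conv2D and MaxPooling2D layers followed by fully connected (Dense) layers.

The first Conv2D (convolution) layer takes an input image of shape (28, 28, 1). The last fully connected layer produces 26 outputs, one per letter of the alphabet.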
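The way the image shrinks through the convolution and pooling layers can be traced with a little arithmetic. This is a sketch that assumes the Keras defaults for MaxPooling2D (padding='valid', stride equal to the pool size); conv_same and pool_valid are hypothetical helper names.

```python
import math


def conv_same(size, stride=1):
    """Output size of a convolution with padding='same'."""
    return math.ceil(size / stride)


def pool_valid(size, pool, stride=None):
    """Output size of pooling with padding='valid' (stride defaults to pool)."""
    stride = stride or pool
    return (size - pool) // stride + 1


size = 28
size = conv_same(size)      # Conv2D, 8 filters   -> 28 x 28 x 8
size = pool_valid(size, 2)  # MaxPooling2D (2, 2) -> 14 x 14 x 8
size = conv_same(size)      # Conv2D, 16 filters  -> 14 x 14 x 16
size = pool_valid(size, 4)  # MaxPooling2D (4, 4) -> 3 x 3 x 16
features = size * size * 16

print(size, features)  # 3 144
```

After the second pooling layer each image is therefore described by 3 x 3 x 16 = 144 values.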

We use Dropout after the second Conv2D layer to regularize training.

The last layer uses the softmax activation function.
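Softmax turns the raw scores of the last layer into probabilities that sum to 1, which is why the largest output can be read as the predicted letter. A minimal sketch of the function in plain Python follows; the input scores are made up for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The ordering of the scores is preserved, so the highest score always receives the highest probability.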

The finished model can be inspected with classifier.summary(), which lists each layer with its output shape and parameter count.

Next we have to compile and fit the model. We use the following Python code:

    classifier.compile(optimizer='SGD', loss='categorical_crossentropy',
                       metrics=['accuracy'])
    classifier.fit(X_train, y_train, epochs=50, batch_size=100)


We compile the model with the SGD optimizer. You can also reduce the number of epochs to 25.
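To get a feel for what halving the epochs means, the number of weight updates can be counted directly. This sketch uses the 27455 training images and the batch size of 100 from the code above; total_updates is a hypothetical helper, and it assumes the final, smaller batch of each epoch is also used.

```python
import math


def total_updates(num_samples, batch_size, epochs):
    """Return (steps per epoch, total weight updates) for mini-batch training."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


print(total_updates(27455, 100, 50))  # (275, 13750)
print(total_updates(27455, 100, 25))  # (275, 6875)
```

Training for 25 epochs performs half as many updates, so it takes roughly half the time; whether the accuracy holds up has to be checked on the test set.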

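Finally, check the accuracy on the test set:

    # evaluate() returns [loss, accuracy] for the metrics given to compile().
    accuracy = classifier.evaluate(x=X_test, y=y_test, batch_size=32)
    print('Accuracy: ', accuracy[1])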
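The accuracy reported here is the fraction of test images whose most probable class matches the label. The following sketch computes the same quantity by hand on a tiny made-up example with three classes; argmax and accuracy_score are hypothetical helper names, not Keras functions.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy_score(predicted_probs, one_hot_labels):
    """Fraction of samples whose most probable class matches the label."""
    correct = sum(
        argmax(probs) == argmax(label)
        for probs, label in zip(predicted_probs, one_hot_labels)
    )
    return correct / len(one_hot_labels)


# Made-up predictions for four samples; the last one is wrong.
predicted = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]

print(accuracy_score(predicted, labels))  # 0.75
```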


To save the trained model, we can use:

    classifier.save('CNNmodel.h5')

3) OpenCV

The following Python implementation is an example; adjust it to your needs.

Importing the Python libraries and loading the model

    import cv2
    import numpy as np
    from keras.models import load_model

    # Load the model saved in the training step.
    model = load_model('CNNmodel.h5')


Helper functions

    def crop_image(image, x, y, width, height):
        """Return the region of the image whose top-left corner is (x, y)."""
        return image[y:y + height, x:x + width]

    # Class indices of the letters this example reports.
    LETTERS = {0: 'A', 1: 'B', 2: 'C', 3: 'D', 8: 'I', 11: 'L',
               14: 'O', 20: 'U', 21: 'V', 22: 'W', 24: 'Y'}

    def prediction(pred):
        """Print the letter for a predicted class index, if it is reported."""
        if pred in LETTERS:
            print(LETTERS[pred])

    def keras_process_image(img):
        """Resize an image to the 28x28 input size of the model."""
        image_x = 28
        image_y = 28
        # cv2.resize expects the target size as (width, height).
        img = cv2.resize(img, (image_x, image_y), interpolation=cv2.INTER_AREA)
        return img


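Prediction

We have to predict the letter from the input image. The model outputs an integer rather than a letter, because the labels are given as integers (A is 0, B is 1, C is 2, and so on).

    def keras_predict(model, image):
        """Return (probability, class index) of the most likely class."""
        data = np.asarray(image, dtype='int32')
        pred_probab = model.predict(data)[0]
        pred_class = list(pred_probab).index(max(pred_probab))
        return max(pred_probab), pred_class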
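Because keras_predict returns the probability together with the class index, weak predictions can be filtered out before a letter is shown. The sketch below is one possible way to do that and is not part of the original code: decode and its threshold of 0.5 are assumptions, and the probability lists are made up.

```python
import string


def decode(probabilities, threshold=0.5):
    """Return (letter, confidence), or (None, confidence) below the threshold."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = probabilities[best]
    if confidence < threshold:
        return None, confidence
    # Assumption: class index i corresponds to the i-th letter (A = 0).
    return string.ascii_uppercase[best], confidence


# A confident prediction for class 11 and a completely uncertain one.
peaked = [0.004] * 26
peaked[11] = 0.9
uniform = [1 / 26] * 26

print(decode(peaked))      # ('L', 0.9)
print(decode(uniform)[0])  # None
```

A threshold like this trades missed letters for fewer false reports; a suitable value depends on the camera setup and has to be found by trial.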


Creating the window

We need a window that shows the input from the webcam. The image passed to the model must be a 28x28 grayscale image, because the model was trained on images of that size. Example code:

    def main():
        # Open the webcam once, before the capture loop.
        cam_capture = cv2.VideoCapture(0)
        try:
            while True:
                ok, image_frame = cam_capture.read()
                if not ok:
                    break

                # Select the region of interest (ROI), then convert and blur it.
                im2 = crop_image(image_frame, 300, 300, 300, 300)
                image_grayscale = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)
                image_grayscale_blurred = cv2.GaussianBlur(image_grayscale, (15, 15), 0)

                # Shrink to the model's input size and add channel and batch axes.
                im3 = cv2.resize(image_grayscale_blurred, (28, 28),
                                 interpolation=cv2.INTER_AREA)
                im4 = np.resize(im3, (28, 28, 1))
                im5 = np.expand_dims(im4, axis=0)

                pred_probab, pred_class = keras_predict(model, im5)
                prediction(pred_class)

                # Display the cropped image and the blurred grayscale image.
                cv2.imshow('Image2', im2)
                cv2.imshow('Image3', image_grayscale_blurred)

                # Quit when 'q' is pressed.
                if cv2.waitKey(25) & 0xFF == ord('q'):
                    break
        finally:
            # Release the camera and close the windows on any exit path.
            cam_capture.release()
            cv2.destroyAllWindows()

    # Run one prediction on an empty image before the capture loop starts.
    keras_predict(model, np.zeros((1, 28, 28, 1), dtype=np.uint8))

    if __name__ == '__main__':
        main()
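One detail worth checking is whether the region of interest fits inside the camera frame. NumPy slicing silently truncates a slice that runs past the edge of an array, so a crop can come back smaller than requested. The sketch below shows the effect for a 640x480 frame, which is an assumed webcam resolution; clamp_roi is a hypothetical helper.

```python
def clamp_roi(x, y, width, height, frame_width, frame_height):
    """Return the part of the rectangle (x, y, width, height) inside the frame."""
    x0 = max(0, min(x, frame_width))
    y0 = max(0, min(y, frame_height))
    x1 = max(x0, min(x + width, frame_width))
    y1 = max(y0, min(y + height, frame_height))
    return x0, y0, x1 - x0, y1 - y0


# The ROI used in main(), applied to an assumed 640x480 frame.
print(clamp_roi(300, 300, 300, 300, 640, 480))  # (300, 300, 300, 180)
```

On such a frame the crop is only 180 pixels high, so the hand image gets stretched when it is resized to 28x28. Moving the ROI up, for example to y = 100, keeps the crop square.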


The model reaches an accuracy of about 94%, so it should recognize the letters reliably.