In this chapter, we will tackle Kaggle's Facial Expression Recognition challenge. To do so, we will train a VGG-like network from scratch on the training data, keeping in mind that the network must be small and fast enough to run in real time on a CPU.
Human emotions are mixtures of one another. When experiencing "surprise" we may also feel "happiness" (a surprise birthday party, for example) or "fear" (if the surprise is unwelcome). Even while "afraid", we may feel a hint of "anger".
When studying emotion recognition, it is important not to focus on a single class label (as we sometimes do in other classification problems). Instead, it is more informative to look at the probability of each emotion and characterize the distribution. As we will see later in this chapter, examining the distribution of emotion probabilities gives us a more accurate way to describe a face's emotion than simply picking the single emotion with the highest probability.
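As a hedged illustration of this idea (the helper function, threshold, and example probabilities below are our own and not part of the chapter's code), we can report a blended label whenever the top two probabilities are close, instead of always taking the argmax:

import numpy as np

# the six emotion labels used later in this chapter
EMOTIONS = ["angry", "scared", "happy", "sad", "surprised", "neutral"]

def describe_emotion(preds, margin=0.15):
    # sort class indices by descending probability
    order = np.argsort(preds)[::-1]
    (first, second) = (order[0], order[1])

    # if the two strongest emotions are nearly tied, report both
    if preds[first] - preds[second] < margin:
        return "{} / {}".format(EMOTIONS[first], EMOTIONS[second])

    # otherwise the single argmax label is a reasonable summary
    return EMOTIONS[first]

# example: a face that is mostly "surprised" with a strong "happy" component
print(describe_emotion(np.array([0.05, 0.02, 0.38, 0.03, 0.45, 0.07])))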
The Kaggle Emotion and Facial Expression Recognition challenge training dataset consists of 28,709 images, each a 48×48 grayscale image. The faces have been automatically aligned, so they are roughly the same size in every image. Given these images, our goal is to classify the emotion expressed on each face into one of seven distinct classes: angry, disgust, fear, happy, sad, surprised, and neutral.
This facial expression dataset is known as the FER13 dataset and can be found and downloaded from the official Kaggle competition page.
Challenges in Representation Learning: Facial Expression Recognition Challenge: https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data
The dataset can also be downloaded from Baidu Netdisk:
Link: https://pan.baidu.com/s/1ODT8nfO9aGzfKkLrVVWjxg
Extraction code: ei3l
After downloading the dataset, you will find a file named fer2013.csv containing three columns (a short sketch for inspecting them follows the list below):
emotion: the class label.
pixels: a flattened list of 48×48 = 2,304 grayscale pixel values representing the face itself.
usage: whether the image belongs to the Training, PrivateTest (validation), or PublicTest (testing) split.
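As a quick, hedged sanity check (this sketch is ours, not part of the project scripts, and assumes the CSV headers are emotion, pixels, and Usage), pandas can be used to confirm the columns and split sizes:

import pandas as pd
import emotion_config as config

# load fer2013.csv and inspect the structure described above
df = pd.read_csv(config.INPUT_PATH)
print(df.columns.tolist())            # expected: ['emotion', 'pixels', 'Usage']
print(df["Usage"].value_counts())     # sizes of the Training / PrivateTest / PublicTest splits
print(df["emotion"].value_counts())   # per-class counts (note how rare class 1, "disgust", is)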
Our goal now is to take this .csv file and convert it to HDF5 format so that we can more easily train a convolutional neural network on it.
FER13 contains seven classes in total: angry, disgust, fear, happy, sad, surprised, and neutral. However, there is a severe class imbalance for the "disgust" class, which has only 113 image samples (each of the remaining classes has well over 1,000 images). After doing some research, I came across the Mememoji project, which suggests merging "disgust" and "angry" into a single class (since the two emotions are visually similar), turning FER13 into a 6-class problem.
Create a file named emotion_config.py. This is where we store configuration variables, including the path to the input dataset, the output HDF5 files, the batch size, and so on.
# import the necessary packages
from os import path

# define the base path to the emotion dataset
BASE_PATH = "D:/Project/ml_toolset/emotion_recognition/raid/datasets/"
BASE_PATH1 = "D:/Project/ml_toolset/emotion_recognition/"

# use the base path to define the path to the input emotions file
INPUT_PATH = path.sep.join([BASE_PATH, "fer2013/fer2013.csv"])

# define the number of classes (set to 6 if you are ignoring the
# "disgust" class)
# NUM_CLASSES = 7
NUM_CLASSES = 6

# define the path to the output training, validation, and testing
# HDF5 files
TRAIN_HDF5 = path.sep.join([BASE_PATH1, "hdf5/train.hdf5"])
VAL_HDF5 = path.sep.join([BASE_PATH1, "hdf5/val.hdf5"])
TEST_HDF5 = path.sep.join([BASE_PATH1, "hdf5/test.hdf5"])

# define the batch size
BATCH_SIZE = 128

# define the paths to where output logs, checkpoints, and the final
# model will be stored
OUTPUT_PATH = path.sep.join([BASE_PATH1, "output"])
CHECKPOINTS_PATH = path.sep.join([BASE_PATH1, "checkpoints"])
MODEL_PATH = path.sep.join([BASE_PATH1, "model"])
Create a file named build_dataset.py. It is responsible for ingesting the fer2013.csv dataset file and producing a set of HDF5 files, one each for the training, validation, and testing splits.
# import the necessary packages
import emotion_config as config
from customize.tools.hdf5DatasetWriter import HDF5DatasetWriter
import numpy as np

# open the input file for reading (skipping the header), then
# initialize the list of data and labels for the training,
# validation, and testing sets
print("[INFO] loading input data...")
f = open(config.INPUT_PATH)
f.__next__()  # f.next() for Python 2.7
(trainImages, trainLabels) = ([], [])
(valImages, valLabels) = ([], [])
(testImages, testLabels) = ([], [])

# loop over the rows in the input file
for row in f:
    # extract the label, image, and usage from the row
    (label, image, usage) = row.strip().split(",")
    label = int(label)

    # if we are ignoring the "disgust" class there will be 6 total
    # class labels instead of 7
    if config.NUM_CLASSES == 6:
        # merge together the "anger" and "disgust" classes
        if label == 1:
            label = 0

        # if label has a value greater than zero, subtract one from
        # it to make all labels sequential (not required, but helps
        # when interpreting results)
        if label > 0:
            label -= 1

    # reshape the flattened pixel list into a 48x48 (grayscale)
    # image
    image = np.array(image.split(" "), dtype="uint8")
    image = image.reshape((48, 48))

    # check if we are examining a training image
    if usage == "Training":
        trainImages.append(image)
        trainLabels.append(label)

    # check if this is a validation image
    elif usage == "PrivateTest":
        valImages.append(image)
        valLabels.append(label)

    # otherwise, this must be a testing image
    else:
        testImages.append(image)
        testLabels.append(label)

# construct a list pairing the training, validation, and testing
# images along with their corresponding labels and output HDF5
# files
datasets = [
    (trainImages, trainLabels, config.TRAIN_HDF5),
    (valImages, valLabels, config.VAL_HDF5),
    (testImages, testLabels, config.TEST_HDF5)]

# loop over the dataset tuples
for (images, labels, outputPath) in datasets:
    # create the HDF5 writer
    print("[INFO] building {}...".format(outputPath))
    writer = HDF5DatasetWriter((len(images), 48, 48), outputPath)

    # loop over the images and add them to the dataset
    for (image, label) in zip(images, labels):
        writer.add([image], [label])

    # close the HDF5 writer
    writer.close()

# close the input file
f.close()
Once the python build_dataset.py command has finished executing, you can verify that the HDF5 files were generated by checking the contents of the directory you pointed emotion_config.py to.
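As another hedged sanity check (this assumes the HDF5DatasetWriter stores its arrays under the keys "images" and "labels", which is the convention this kind of writer typically uses), you can open one of the generated files with h5py and print its shapes:

import h5py
import emotion_config as config

# open the training HDF5 file and report how many 48x48 faces it holds
db = h5py.File(config.TRAIN_HDF5, "r")
print(list(db.keys()))
print(db["images"].shape, db["labels"].shape)  # roughly (28709, 48, 48) and (28709,)
db.close()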
The network we will implement to recognize emotions and facial expressions is inspired by the VGG family of networks:
1. The CONV layers in the network will use only 3×3 filters.
2. The number of filters each CONV layer learns will double the deeper we go in the network.
Create a file named emotionvggnet.py.
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import ELU
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

class EmotionVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1

        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1

        # Block #1: first CONV => ELU => CONV => ELU => POOL
        # layer set
        model.add(Conv2D(32, (3, 3), padding="same",
            kernel_initializer="he_normal", input_shape=inputShape))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), kernel_initializer="he_normal",
            padding="same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # Block #2: second CONV => ELU => CONV => ELU => POOL
        # layer set
        model.add(Conv2D(64, (3, 3), kernel_initializer="he_normal",
            padding="same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), kernel_initializer="he_normal",
            padding="same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # Block #3: third CONV => ELU => CONV => ELU => POOL
        # layer set
        model.add(Conv2D(128, (3, 3), kernel_initializer="he_normal",
            padding="same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(128, (3, 3), kernel_initializer="he_normal",
            padding="same"))
        model.add(ELU())
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # Block #4: first set of FC => ELU layers
        model.add(Flatten())
        model.add(Dense(64, kernel_initializer="he_normal"))
        model.add(ELU())
        model.add(BatchNormalization())
        model.add(Dropout(0.5))

        # Block #5: second set of FC => ELU layers
        model.add(Dense(64, kernel_initializer="he_normal"))
        model.add(ELU())
        model.add(BatchNormalization())
        model.add(Dropout(0.5))

        # Block #6: softmax classifier
        model.add(Dense(classes, kernel_initializer="he_normal"))
        model.add(Activation("softmax"))

        # return the constructed network architecture
        return model
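To verify that the implementation follows the two design rules above, a small usage sketch (ours, not part of the training script) is to build the model for 48×48 grayscale input and print its summary:

from emotionvggnet import EmotionVGGNet

# build the network for 48x48 single-channel input and 6 emotion classes,
# then print the layer-by-layer summary to confirm the 3x3 CONV filters
# and the doubling filter counts (32 -> 64 -> 128)
model = EmotionVGGNet.build(width=48, height=48, depth=1, classes=6)
model.summary()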
Create a file named train_recognizer.py.
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
import emotion_config as config
from customize.tools.imagetoarraypreprocessor import ImageToArrayPreprocessor
from customize.tools.epochcheckpoint import EpochCheckpoint
from customize.tools.trainingmonitor import TrainingMonitor
from customize.tools.hdf5datasetgenerator import HDF5DatasetGenerator
from emotionvggnet import EmotionVGGNet
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import load_model
import tensorflow.keras.backend as K
import argparse
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--checkpoints", required=False,
    help="path to output checkpoint directory",
    default=config.CHECKPOINTS_PATH)
ap.add_argument("-m", "--model", type=str,
    help="path to *specific* model checkpoint to load")  # , default=config.MODEL_PATH
ap.add_argument("-s", "--start-epoch", type=int, default=0,
    help="epoch to restart training at")
args = vars(ap.parse_args())

# construct the training and testing image generators for data
# augmentation, then initialize the image preprocessor
trainAug = ImageDataGenerator(rotation_range=10, zoom_range=0.1,
    horizontal_flip=True, rescale=1 / 255.0, fill_mode="nearest")
valAug = ImageDataGenerator(rescale=1 / 255.0)
iap = ImageToArrayPreprocessor()

# initialize the training and validation dataset generators
trainGen = HDF5DatasetGenerator(config.TRAIN_HDF5, config.BATCH_SIZE,
    aug=trainAug, preprocessors=[iap], classes=config.NUM_CLASSES)
valGen = HDF5DatasetGenerator(config.VAL_HDF5, config.BATCH_SIZE,
    aug=valAug, preprocessors=[iap], classes=config.NUM_CLASSES)

# if there is no specific model checkpoint supplied, then initialize
# the network and compile the model
if args["model"] is None:
    print("[INFO] compiling model...")
    model = EmotionVGGNet.build(width=48, height=48, depth=1,
        classes=config.NUM_CLASSES)
    opt = Adam(lr=1e-3)
    model.compile(loss="categorical_crossentropy", optimizer=opt,
        metrics=["accuracy"])

# otherwise, load the checkpoint from disk
else:
    print("[INFO] loading {}...".format(args["model"]))
    model = load_model(args["model"])

    # update the learning rate
    print("[INFO] old learning rate: {}".format(
        K.get_value(model.optimizer.lr)))
    K.set_value(model.optimizer.lr, 1e-3)
    print("[INFO] new learning rate: {}".format(
        K.get_value(model.optimizer.lr)))

# construct the set of callbacks
figPath = os.path.sep.join([config.OUTPUT_PATH, "vggnet_emotion.png"])
jsonPath = os.path.sep.join([config.OUTPUT_PATH, "vggnet_emotion.json"])
callbacks = [
    EpochCheckpoint(args["checkpoints"], every=5,
        startAt=args["start_epoch"]),
    TrainingMonitor(figPath, jsonPath=jsonPath,
        startAt=args["start_epoch"])]

# train the network
model.fit_generator(
    trainGen.generator(),
    steps_per_epoch=trainGen.numImages // config.BATCH_SIZE,
    validation_data=valGen.generator(),
    validation_steps=valGen.numImages // config.BATCH_SIZE,
    epochs=15,
    max_queue_size=config.BATCH_SIZE * 2,
    callbacks=callbacks, verbose=1)

# close the databases
trainGen.close()
valGen.close()
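Training is launched from the command line. The flags below come from the argparse definitions in train_recognizer.py; the specific checkpoint filename in the second command is only illustrative and depends on how EpochCheckpoint names its files:

# train from scratch (checkpoints are written to CHECKPOINTS_PATH every 5 epochs)
python train_recognizer.py

# resume training from a saved checkpoint (filename is hypothetical)
python train_recognizer.py --model checkpoints/epoch_25.hdf5 --start-epoch 25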
The key choices for this experiment were:
(1) Use ELU activations instead of ReLU.
(2) Merge "angry" and "disgust" into a single label by setting NUM_CLASSES to 6 in emotion_config.py; build_dataset.py then folds the "disgust" samples into the "angry" class.
(3) Use the Adam optimizer with a base learning rate of 1e-3.
When we inspect the output at epoch 75, we see that EmotionVGGNet now reaches 68.51% accuracy.
Create a file named test_recognizer.py.
# import the necessary packages
import emotion_config as config
from customize.tools.imagetoarraypreprocessor import ImageToArrayPreprocessor
from customize.tools.hdf5datasetgenerator import HDF5DatasetGenerator
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import load_model
import argparse

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str,
    help="path to model checkpoint to load")
args = vars(ap.parse_args())

# initialize the testing data generator and image preprocessor
testAug = ImageDataGenerator(rescale=1 / 255.0)
iap = ImageToArrayPreprocessor()

# initialize the testing dataset generator
testGen = HDF5DatasetGenerator(config.TEST_HDF5, config.BATCH_SIZE,
    aug=testAug, preprocessors=[iap], classes=config.NUM_CLASSES)

# load the model from disk
print("[INFO] loading {}...".format(args["model"]))
model = load_model(args["model"])

# evaluate the network
(loss, acc) = model.evaluate_generator(
    testGen.generator(),
    steps=testGen.numImages // config.BATCH_SIZE,
    max_queue_size=config.BATCH_SIZE * 2)
print("[INFO] accuracy: {:.2f}".format(acc * 100))

# close the testing database
testGen.close()
To evaluate EmotionVGGNet on FER2013, simply open a terminal and execute the following command:
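(The checkpoint filename below is illustrative; substitute whichever saved checkpoint you want to evaluate.)

python test_recognizer.py --model checkpoints/epoch_75.hdf5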
As my results show, we were able to obtain 66.96% accuracy on the test set. This 66.96% result was obtained on the 6-class variant of FER2013, not the original 7-class version from the Kaggle recognition challenge, but we could easily retrain the network on the 7-class version and obtain comparable accuracy.
Create a file named emotion_detector.py.
The --cascade switch is the path to our face-detection Haar cascade.
# import the necessary packages
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--cascade", required=True,
    help="path to where the face cascade resides")
ap.add_argument("-m", "--model", required=True,
    help="path to pre-trained emotion detector CNN")
ap.add_argument("-v", "--video",
    help="path to the (optional) video file")
args = vars(ap.parse_args())

# load the face detector cascade, emotion detection CNN, then define
# the list of emotion labels
detector = cv2.CascadeClassifier(args["cascade"])
model = load_model(args["model"])
EMOTIONS = ["angry", "scared", "happy", "sad", "surprised", "neutral"]

# if a video path was not supplied, grab the reference to the webcam
# (change the index to 0 if your default camera is the built-in one)
if not args.get("video", False):
    camera = cv2.VideoCapture(1)

# otherwise, load the video
else:
    camera = cv2.VideoCapture(args["video"])

# keep looping
while True:
    # grab the current frame
    (grabbed, frame) = camera.read()

    # if we are viewing a video and we did not grab a
    # frame, then we have reached the end of the video
    if args.get("video") and not grabbed:
        break

    # resize the frame and convert it to grayscale
    frame = imutils.resize(frame, width=300)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # initialize the canvas for the visualization, then clone
    # the frame so we can draw on it
    canvas = np.zeros((220, 300, 3), dtype="uint8")
    frameClone = frame.copy()

    # detect faces in the input frame
    rects = detector.detectMultiScale(gray, scaleFactor=1.1,
        minNeighbors=5, minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE)

    # ensure at least one face was found before continuing
    if len(rects) > 0:
        # determine the largest face area (detectMultiScale returns
        # boxes as (x, y, w, h), so the area is simply w * h)
        rect = sorted(rects, reverse=True,
            key=lambda x: x[2] * x[3])[0]
        (fX, fY, fW, fH) = rect

        # extract the face ROI from the image, then pre-process
        # it for the network
        roi = gray[fY:fY + fH, fX:fX + fW]
        roi = cv2.resize(roi, (48, 48))
        roi = roi.astype("float") / 255.0
        roi = img_to_array(roi)
        roi = np.expand_dims(roi, axis=0)

        # make a prediction on the ROI, then lookup the class
        # label
        preds = model.predict(roi)[0]
        label = EMOTIONS[preds.argmax()]

        # loop over the labels + probabilities and draw them
        for (i, (emotion, prob)) in enumerate(zip(EMOTIONS, preds)):
            # construct the label text
            text = "{}: {:.2f}%".format(emotion, prob * 100)

            # draw the label + probability bar on the canvas
            w = int(prob * 300)
            cv2.rectangle(canvas, (5, (i * 35) + 5),
                (w, (i * 35) + 35), (0, 0, 255), -1)
            cv2.putText(canvas, text, (10, (i * 35) + 23),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45,
                (255, 255, 255), 2)

        # draw the label on the frame
        cv2.putText(frameClone, label, (fX, fY - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
        cv2.rectangle(frameClone, (fX, fY),
            (fX + fW, fY + fH), (0, 0, 255), 2)

    # show our classifications + probabilities
    cv2.imshow("Face", frameClone)
    cv2.imshow("Probabilities", canvas)

    # if the 'q' key is pressed, stop the loop
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# cleanup the camera and close any open windows
camera.release()
cv2.destroyAllWindows()
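A typical invocation looks like the following; haarcascade_frontalface_default.xml ships with OpenCV (copy it from the OpenCV data/haarcascades directory or supply your own path), and the model checkpoint name is again only illustrative:

python emotion_detector.py --cascade haarcascade_frontalface_default.xml --model checkpoints/epoch_75.hdf5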
In this chapter, we learned how to implement a convolutional neural network capable of predicting emotions and facial expressions. We trained a VGG-like CNN named EmotionVGGNet, consisting of blocks of two stacked CONV layers, with the number of filters doubling in each block. It was important that our CNN be:
1. Deep enough to obtain high accuracy.
2. But not so deep that it could not run in real time on a CPU.
We then trained our CNN on the FER2013 dataset, part of the Kaggle Emotion and Facial Expression Recognition challenge. Overall, we obtained 66.96% accuracy; higher accuracy could likely be achieved through more aggressive data augmentation, a deeper network with more layers, and additional regularization.
Finally, we created a Python script that can (1) detect faces in a video stream and (2) apply our pre-trained CNN to recognize the dominant facial expression in real time. We also display the probability distribution over every emotion, making the network's results easier to interpret.
Moreover, as humans, our feelings are always a mixture of emotions. Therefore, when trying to label a given person's facial expression, it is important to examine the probability distribution returned by EmotionVGGNet.