Basics of Image Augmentation in NumPy

taotao_2016 2020-08-25

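This article discusses how to implement image augmentation techniques with the NumPy library in order to enlarge a dataset. I will cover the purpose of data augmentation and its pros and cons, and show basic augmentation examples in NumPy.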

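Why do we need data augmentation?

The first thing that comes to mind is enlarging the dataset. But it is important to understand the problem we are actually solving, one of the best-known problems in ML: overfitting. If we have only a handful of examples from which to build a classifier or regressor, we are prone to overfitting even before we start building the model. It is well known that a larger dataset reduces overfitting, which is why we are never satisfied with the amount of data handed to us. Enlarging the dataset normally means collecting more samples, but that almost always incurs costs we cannot afford, in money or in time. So we want a quick and cheap way to get more data: data augmentation.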

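Imagine you have to build a model that classifies some rare species from image data, and you are not satisfied with the number of photos you can obtain. Simply flipping each image horizontally doubles the size of the dataset. If the objects you want to classify are more or less symmetric, also flipping the data vertically gives you 4 times as many images as you started with. With additional transformations you can enlarge the dataset 10, 100 or 1000 times.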
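
The flipping arithmetic above can be sketched in a few lines of NumPy. This is only an illustration: the function name `flip_augment` and the `(N, H, W, C)` batch layout are assumptions, and random arrays stand in for real photos.

```python
import numpy as np

def flip_augment(images):
    """Return the originals plus horizontal, vertical and double flips.

    `images` is assumed to have shape (N, H, W, C).
    """
    horizontal = images[:, :, ::-1, :]   # mirror left-right
    vertical = images[:, ::-1, :, :]     # mirror top-bottom
    both = images[:, ::-1, ::-1, :]      # apply both flips
    return np.concatenate([images, horizontal, vertical, both], axis=0)

# Toy batch of 8 random "images" standing in for a real dataset
batch = np.random.rand(8, 100, 100, 3)
augmented = flip_augment(batch)
print(batch.shape, '->', augmented.shape)
```

Whether a flipped image is still a valid sample depends on the task: a mirrored animal is usually fine, a mirrored digit or road sign may not be.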

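When is data augmentation worth trying?

There are several cases where data augmentation is worth trying:

We have to build a predictor but have little data. Enlarging the dataset can help the model learn patterns from the data and predict better, although the resulting artificial dataset will be monotonous.
The model we are using does not need labels, so all we care about is obtaining samples that are as similar as possible to the existing ones. One example is the Generative Adversarial Network family, where we can try augmenting the dataset to make training somewhat more stable.
The data we predict on is highly imbalanced. If the label distribution is highly imbalanced, for example ten times fewer positive samples than negative ones, the model will tend to predict most unseen samples as negative. The best-known method for this situation is the SMOTE algorithm.
Mitigating adversarial attacks. The robustness of DNN models has recently been challenged by adversarial attacks, in which a small perturbation of the input sample can cause misclassification. Data augmentation can be a quick and cheap way to reduce the problem.
Adding regularization to our model. We can try blending several samples from the dataset and deliberately mislabeling the result, to make the model less prone to overfitting.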
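
The last case, blending samples and softening their labels, resembles the technique known as mixup. Whether that is exactly what is meant here is an assumption. The sketch below blends two images and their one-hot labels with a random weight; the function name `mixup_pair` and the `alpha` parameter are illustrative choices.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with a Beta-distributed weight."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # blended image
    y = lam * y1 + (1.0 - lam) * y2       # blended ("soft") label
    return x, y, lam

rng = np.random.default_rng(0)
img_a = rng.random((100, 100, 3))
img_b = rng.random((100, 100, 3))
label_a = np.array([1.0, 0.0])            # one-hot label of class 0
label_b = np.array([0.0, 1.0])            # one-hot label of class 1
mixed_img, mixed_label, lam = mixup_pair(img_a, label_a, img_b, label_b, rng=rng)
print(mixed_img.shape, mixed_label, lam)
```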

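When should data augmentation not be used?

We have very few samples. Data augmentation can help make a predictor more robust, but when the amount of data is extremely small I recommend considering other approaches.
Training is too expensive in time and compute, and we already have enough data. To be honest, this case is very rare, because even very deep models perform best when given millions of samples. But in some applications we already have a large amount of raw data and want our samples to be as representative as possible. In that case data augmentation makes no sense. Instead, it is advisable to build "prototype" samples from our data and train the model on those.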

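Note

Data augmentation should not be treated as a cure-all for overfitting. Of course, the ability to enlarge a dataset a thousandfold is very tempting. Moreover, if validation is performed on the same augmented data we train on, the metrics will keep rising, even in a cross-validation setting. Always remember that the ultimate goal of designing any ML system is to perform well in production. With that in mind, overusing data augmentation can make the optimization setting drift away from the production setting. If your dataset is very small, you should first think about which type of model to use. Applying a classic heavy neural network to a handful of samples makes no sense, because its optimizer is not designed for the low-data regime. In that case I recommend simple models (nearest-neighbor based ones) or modern neural network methods designed specifically for low-data problems (few-shot learning and meta-learning models).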
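
One way to avoid validating on augmented copies of the training data is to split first and augment only the training part. The sketch below assumes an `(N, H, W, C)` image array and uses a horizontal flip as a stand-in for any augmentation; the function name and the 20% validation fraction are illustrative.

```python
import numpy as np

def split_then_augment(images, labels, val_fraction=0.2, seed=0):
    """Hold out a validation set first, then augment the training part only."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_val = int(len(images) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]

    x_train, y_train = images[train_idx], labels[train_idx]
    x_val, y_val = images[val_idx], labels[val_idx]

    # Augment the training samples only (horizontal flip as a stand-in)
    x_train_aug = np.concatenate([x_train, x_train[:, :, ::-1, :]], axis=0)
    y_train_aug = np.concatenate([y_train, y_train], axis=0)
    return (x_train_aug, y_train_aug), (x_val, y_val)

images = np.random.rand(50, 32, 32, 3)
labels = np.random.randint(0, 2, size=50)
(train_x, train_y), (val_x, val_y) = split_then_augment(images, labels)
print(train_x.shape, val_x.shape)
```

The validation images stay untouched, so the metrics measured on them are closer to what the model will see in production.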

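Data augmentation in NumPy

In the sections below I walk through simple image augmentation use cases using only common tools. We need three things: the NumPy library, matplotlib + seaborn for image visualization, and scipy for image rotation (the Pillow library can be used as an alternative).

As the sample image I will use the famous Lenna picture, a classic image in image processing papers since the 1970s. The picture is RGB, and in the examples below its size is 100x100.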

The original Lenna image

First, let's import the packages and look at Lenna's color distribution.

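    import numpy as np
    import scipy
    import matplotlib.pyplot as plt
    %matplotlib inline  # Jupyter magic; remove this line outside a notebook
    import seaborn as sns
    from scipy.ndimage import rotate

    sns.set(color_codes=True)

    img = np.array(plt.imread('lenna.png'))

    plt.figure(figsize=(5, 5), dpi=100)
    # One density plot per color channel.
    # histplot is used here because distplot is deprecated in recent seaborn releases.
    sns.histplot(img[:, :, 0].flatten(), color='maroon', kde=True, stat='density')
    sns.histplot(img[:, :, 1].flatten(), color='green', kde=True, stat='density')
    sns.histplot(img[:, :, 2].flatten(), color='blue', kde=True, stat='density').set_title('RGB Distribution')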

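Note that the loaded image has 4 channels (3 for color and 1 for transparency), and the values in each channel lie in the range [0, 1]. The next two functions are simple helpers for displaying an image in a Jupyter notebook and for showing several images in a grid.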
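
Because the number of channels and the value range depend on the file that is loaded, it can help to normalize the array before augmenting it. The helper below is a hypothetical sketch, not part of the original notebook: it drops an alpha channel if one is present and scales integer images to floats, assuming integer images are 8-bit.

```python
import numpy as np

def to_rgb_float(img):
    """Drop the alpha channel if present and return floats in [0, 1]."""
    if img.ndim == 3 and img.shape[2] == 4:
        img = img[:, :, :3]                      # keep R, G, B only
    if np.issubdtype(img.dtype, np.integer):
        img = img.astype(np.float32) / 255.0     # assumption: 8-bit integer input
    return img

# Random RGBA array standing in for the loaded PNG
rgba = np.random.rand(100, 100, 4).astype(np.float32)
rgb = to_rgb_float(rgba)
print(rgb.shape, rgb.dtype, rgb.min(), rgb.max())
```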

    def show_img(img, ax):
        # Show one image on the given axes without grid lines or ticks
        ax.grid(False)
        ax.set_xticks([])
        ax.set_yticks([])
        ax.imshow(img)

    def plot_grid(imgs, nrows, ncols, figsize=(10, 10)):
        # Show a list of images in an nrows x ncols grid
        assert len(imgs) == nrows * ncols, f'Number of images should be {nrows}x{ncols}'
        _, axs = plt.subplots(nrows, ncols, figsize=figsize)
        axs = axs.flatten()
        for img, ax in zip(imgs, axs):
            show_img(img, ax)

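Translations

A translation simply moves the picture in some direction. Let's create a function that takes the direction of the move, the number of pixels to move by, and a flag controlling what happens to the strip that is left empty when the image moves.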

    def translate(img, shift=10, direction='right', roll=True):
        assert direction in ['right', 'left', 'down', 'up'], 'Directions should be up|down|left|right'
        img = img.copy()
        # With roll=True the strip pushed out of the frame wraps around to the other side.
        # With roll=False the vacated strip keeps its original pixels.
        if direction == 'right':
            right_slice = img[:, -shift:].copy()
            img[:, shift:] = img[:, :-shift]
            if roll:
                # Wrapped unchanged, consistent with the other directions
                # (the original listing mirrored this strip with np.fliplr)
                img[:, :shift] = right_slice
        if direction == 'left':
            left_slice = img[:, :shift].copy()
            img[:, :-shift] = img[:, shift:]
            if roll:
                img[:, -shift:] = left_slice
        if direction == 'down':
            down_slice = img[-shift:, :].copy()
            img[shift:, :] = img[:-shift, :]
            if roll:
                img[:shift, :] = down_slice
        if direction == 'up':
            upper_slice = img[:shift, :].copy()
            img[:-shift, :] = img[shift:, :]
            if roll:
                img[-shift:, :] = upper_slice
        return img

    plot_grid([translate(img, direction='up', shift=20),
               translate(img, direction='down', shift=20),
               translate(img, direction='left', shift=20),
               translate(img, direction='right', shift=20)],
              1, 4, figsize=(10, 5))
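
The wrap-around translation can also be written with `np.roll`. The following is an alternative sketch rather than the function from the notebook; the name `translate_roll` and the `fill` parameter (a constant written into the vacated strip instead of wrapping) are assumptions added for illustration.

```python
import numpy as np

def translate_roll(img, shift=10, axis=1, fill=None):
    """Shift `img` by `shift` pixels along `axis` (0 = vertical, 1 = horizontal).

    Positive shifts move the content right/down, negative ones left/up.
    With fill=None the content wraps around; otherwise the vacated strip
    is overwritten with the constant `fill`.
    """
    out = np.roll(img, shift, axis=axis)       # np.roll returns a new array
    if fill is not None and shift != 0:
        index = [slice(None)] * out.ndim
        index[axis] = slice(0, shift) if shift > 0 else slice(shift, None)
        out[tuple(index)] = fill
    return out

demo = np.arange(12).reshape(3, 4)
print(translate_roll(demo, 1, axis=1))            # wrap-around shift to the right
print(translate_roll(demo, -1, axis=1, fill=0))   # shift left, zero-filled strip
```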

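A technique similar to translation is to crop a random patch from the image and then resize it to the required format. This way you can get several slightly different random patches from the same image.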

    def random_crop(img, crop_size=(10, 10)):
        # crop_size is (height, width)
        assert crop_size[0] <= img.shape[0] and crop_size[1] <= img.shape[1], 'Crop size should not exceed image size'
        h, w = img.shape[:2]  # NumPy image shape is (rows, columns)
        # Top-left corner, chosen so that the patch fits inside the image
        y = np.random.randint(h - crop_size[0] + 1)
        x = np.random.randint(w - crop_size[1] + 1)
        return img[y:y + crop_size[0], x:x + crop_size[1]].copy()
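
The crop function returns a patch of the requested size but does not resize it back. A minimal nearest-neighbor resize can be written in plain NumPy as sketched below; the function name `resize_nearest` is an assumption, and a library such as Pillow would normally give smoother results.

```python
import numpy as np

def resize_nearest(img, out_size):
    """Resize `img` to out_size = (height, width) by nearest-neighbor sampling."""
    out_h, out_w = out_size
    in_h, in_w = img.shape[:2]
    # For every output row/column pick the source row/column it maps to
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols]

# A 70x70 patch standing in for the output of a random crop
crop = np.random.rand(70, 70, 3)
resized = resize_nearest(crop, (100, 100))
print(crop.shape, '->', resized.shape)
```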

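Rotations

A very effective way to augment an image is to rotate it randomly. The small detail to remember is that the "empty" space in the corners has to be filled with some content so that the image looks more natural.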

    def rotate_img(img, angle, bg_patch=(5, 5)):
        assert len(img.shape) <= 3, 'Incorrect image shape'
        rgb = len(img.shape) == 3
        # Background color: mean of the top-left patch of the original image
        if rgb:
            bg_color = np.mean(img[:bg_patch[0], :bg_patch[1], :], axis=(0, 1))
        else:
            bg_color = np.mean(img[:bg_patch[0], :bg_patch[1]])
        img = rotate(img, angle, reshape=False)
        # Pixels at or below zero after the rotation are treated as empty corners
        mask = np.any(img <= 0, axis=-1) if rgb else img <= 0
        img[mask] = bg_color
        return img
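
Another way to fill the corners is to let scipy do it through the `mode` argument of `scipy.ndimage.rotate`. The sketch below mirrors the image content into the empty corners with `mode='reflect'`; the function name `rotate_reflect` and the choice of linear interpolation (`order=1`) are assumptions for this example, not the approach of the original notebook.

```python
import numpy as np
from scipy.ndimage import rotate

def rotate_reflect(img, angle):
    """Rotate by `angle` degrees, filling the corners by reflecting the image."""
    # order=1 (linear interpolation) avoids the overshoot of the default cubic spline
    rotated = rotate(img, angle, reshape=False, order=1, mode='reflect')
    return np.clip(rotated, 0.0, 1.0)

# Random image with values in [0.2, 0.8] standing in for a real photo
img_demo = 0.2 + 0.6 * np.random.rand(100, 100, 3)
rotated = rotate_reflect(img_demo, 15)
print(rotated.shape, rotated.min(), rotated.max())
```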

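Random Noise

After translations and rotations, it helps to add extra randomness to the augmented images by applying Gaussian noise. We can use np.random.normal as a simple way to alter a sample.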

    def gaussian_noise(img, mean=0, sigma=0.03):
        noise = np.random.normal(mean, sigma, img.shape)
        # Clip so that the noisy values stay inside the valid [0, 1] range
        return np.clip(img + noise, 0, 1).astype(img.dtype)

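Distortions

Another interesting way to alter the original sample is to distort it in some way. As a simple example, we can apply a continuous shift of image rows or columns guided by a trigonometric function (cosine or sine). The resulting image will look "wavy" in the horizontal or vertical direction. By tuning the function parameters we can control the strength of the distortion and thus produce different images with the same content.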

    def distort(img, orientation='horizontal', func=np.sin, x_scale=0.05, y_scale=5):
        assert orientation[:3] in ['hor', 'ver'], "orientation should be 'horizontal'|'vertical'"
        assert func in [np.sin, np.cos], 'supported functions are np.sin and np.cos'
        assert 0.00 <= x_scale <= 0.1, 'x_scale should be in [0.0, 0.1]'
        assert 0 <= y_scale <= min(img.shape[0], img.shape[1]), 'y_scale should be less than image size'
        vertical = orientation.startswith('ver')
        img_dist = img.copy()

        def shift(x):
            # Offset in pixels for row/column x: y_scale sets the amplitude, x_scale the frequency
            return int(y_scale * func(np.pi * x * x_scale))

        for c in range(3):  # color channels only; an alpha channel is left untouched
            for i in range(img.shape[1] if vertical else img.shape[0]):
                if vertical:
                    # Roll column i up or down
                    img_dist[:, i, c] = np.roll(img[:, i, c], shift(i))
                else:
                    # Roll row i left or right
                    img_dist[i, :, c] = np.roll(img[i, :, c], shift(i))
        return img_dist

    imgs_distorted = []
    for ori in ['ver', 'hor']:
        for x_param in [0.01, 0.02, 0.03, 0.04]:
            for y_param in [2, 4, 6, 8, 10]:
                imgs_distorted.append(distort(img, orientation=ori, x_scale=x_param, y_scale=y_param))

    plot_grid(imgs_distorted, 4, 10, figsize=(20, 8))

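Color channels

The last section deals with changing individual image channels to produce color themes slightly different from the original image. The simplest way is to multiply a channel by a given ratio.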

    def change_channel_ratio(img, channel='r', ratio=0.5):
        assert channel in ('r', 'g', 'b'), 'Value for channel: r|g|b'
        img = img.copy()
        ci = 'rgb'.index(channel)
        # Scale the selected channel
        img[:, :, ci] *= ratio
        return img

    plot_grid([change_channel_ratio(img, ratio=0.3),
               change_channel_ratio(img, ratio=0.6),
               change_channel_ratio(img, ratio=0.9)],
              1, 3, figsize=(10, 5))

A more sophisticated way is to alter the channel values with some random process.

    def change_channel_ratio_gauss(img, channel='r', mean=0, sigma=0.03):
        assert channel in ('r', 'g', 'b'), 'channel must be r|g|b'
        img = img.copy()
        ci = 'rgb'.index(channel)
        # Add Gaussian noise to the selected channel only
        img[:, :, ci] = gaussian_noise(img[:, :, ci], mean=mean, sigma=sigma)
        return img

    plot_grid([change_channel_ratio_gauss(img, mean=-0.01, sigma=0.1),
               change_channel_ratio_gauss(img, mean=-0.05, sigma=0.1),
               change_channel_ratio_gauss(img, mean=-0.1, sigma=0.1)],
              1, 3, figsize=(10, 5))

The basic modifications above can come in handy when training machine learning models. The accompanying Jupyter notebook contains the same imports, helpers and functions as the sections above, plus the following plotting calls for random crops, rotations and Gaussian noise:

    plot_grid([random_crop(img, crop_size=(70, 70)),
               random_crop(img, crop_size=(70, 70)),
               random_crop(img, crop_size=(70, 70)),
               random_crop(img, crop_size=(70, 70))],
              1, 4, figsize=(10, 5))

    plot_grid([rotate_img(img, angle=-15),
               rotate_img(img, angle=-30),
               rotate_img(img, angle=15),
               rotate_img(img, angle=30)],
              1, 4, figsize=(10, 5))

    plot_grid([gaussian_noise(img, sigma=0.03),
               gaussian_noise(img, sigma=0.1),
               gaussian_noise(img, sigma=0.3),
               gaussian_noise(img, sigma=0.5)],
              1, 4, figsize=(10, 5))
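
During training, several of these transformations are typically chained and applied with random parameters. The sketch below shows one possible way to do that with simplified stand-ins (a flip, a wrap-around shift and clipped Gaussian noise). The function name `random_augment` and its parameters are hypothetical and not part of the notebook.

```python
import numpy as np

def random_augment(img, rng, max_shift=10, sigma=0.03):
    """Apply a random flip, a random horizontal shift and Gaussian noise."""
    out = img.copy()
    if rng.random() < 0.5:                       # flip half of the time
        out = out[:, ::-1]
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.roll(out, shift, axis=1)            # wrap-around translation
    noise = rng.normal(0.0, sigma, out.shape)
    return np.clip(out + noise, 0.0, 1.0)        # keep values in [0, 1]

rng = np.random.default_rng(42)
img_demo = rng.random((100, 100, 3))             # stand-in for a real image
variants = [random_augment(img_demo, rng) for _ in range(8)]
print(len(variants), variants[0].shape)
```

Passing the random generator explicitly makes the augmented set reproducible for a given seed.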

Thank you for reading. I hope you can apply the transformations above in your own work!