A Simple Comparison of Neural Networks and Traditional Statistical Methods

网摘文苑 2022-11-14


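Traditional statistical methods such as OLS regression fit the data (that is, approximate a function) under the assumption that the relationship between the variables is linear, either in the variables themselves or in their higher-order (polynomial) terms. Not every relationship has this form, however, and in such cases methods like neural networks (NN) are needed for modeling. A neural network can approximate a wide range of functional relationships without the specific functional form being known in advance.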

Prediction Models

1. scikit-learn

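The following example uses the MLPRegressor class from the scikit-learn library, which performs regression estimation with a deep neural network (DNN). A DNN is sometimes also called a multi-layer perceptron (MLP). Judging by the final MSE, the result is not perfect, but it is already very good for such a simply configured model.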

import numpy as np
from sklearn.neural_network import MLPRegressor

# Generate the sample data
def f(x):
    return 2 * x ** 2 - x ** 3 / 3

x = np.linspace(-2, 4, 25)
y = f(x)

# Instantiate the MLPRegressor object
model = MLPRegressor(hidden_layer_sizes=3 * [256], learning_rate_init=0.03, max_iter=5000)

# Fitting (learning) step
model.fit(x.reshape(-1, 1), y)

# Prediction step
y_ = model.predict(x.reshape(-1, 1))
MSE = ((y - y_) ** 2).mean()
MSE
# Out:
# 0.003216321978018745

Plot of the sample data and the predictions:

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.plot(x, y, 'ro', label='sample data')
plt.plot(x, y_, lw=3.0, label='dnn estimation')
plt.legend();

Sample data and neural-network-based predictions

2. Keras

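The next example uses the Sequential model from the Keras deep learning package. The model is trained for 100 epochs per round, and this is repeated for five rounds. After each round, the approximation predicted by the neural network is updated and plotted. As the figure shows, the approximation improves with every round and the MSE falls steadily. As with the previous model, the final result is not perfect, but it is good given the simplicity of the model.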

import tensorflow as tf
tf.random.set_seed(100)
from keras.layers import Dense
from keras.models import Sequential

# Instantiate the Sequential model object
model = Sequential()
# Add a fully connected hidden layer with rectified linear unit (ReLU) activation
model.add(Dense(256, activation='relu', input_dim=1))
# Add the output layer with linear activation
model.add(Dense(1, activation='linear'))
# Compile the model object
model.compile(loss='mse', optimizer='rmsprop')

# Plot the original sample data
plt.figure(figsize=(10, 6))
plt.plot(x, y, 'ro', label='sample data')

# Train for the given number of rounds
for _ in range(1, 6):
    # Train the neural network
    model.fit(x, y, epochs=100, verbose=False)
    # Predict the approximate values
    y_ = model.predict(x)
    # Compute the current MSE
    MSE = ((y - y_.flatten()) ** 2).mean()
    print(f'round={_} | MSE={MSE:.5f}')
    # Plot the current approximation
    plt.plot(x, y_, '--', label=f'round={_}')
plt.legend();
# Out:
# round=1 | MSE=3.87256
# round=2 | MSE=0.92527
# round=3 | MSE=0.28527
# round=4 | MSE=0.13191
# round=5 | MSE=0.09568

Sample data and the predictions obtained after several rounds of training
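As a side note, the computation performed by such a one-hidden-layer network can be sketched in plain NumPy. This is only an illustration of the forward pass (ReLU hidden layer followed by a linear output), not the Keras implementation; the weights W1, b1, W2, and b2 are hypothetical random values rather than trained ones, so the output is not a meaningful fit.

```python
import numpy as np

# Hypothetical, untrained weights (assumption: small random values, zero biases)
rng = np.random.default_rng(100)
W1 = rng.normal(scale=0.1, size=(1, 256))   # input -> hidden
b1 = np.zeros(256)
W2 = rng.normal(scale=0.1, size=(256, 1))   # hidden -> output
b2 = np.zeros(1)

# Same input grid as in the examples above, as a column vector
x_in = np.linspace(-2, 4, 25).reshape(-1, 1)

# Hidden layer: affine map followed by ReLU, max(0, z)
hidden = np.maximum(0.0, x_in @ W1 + b1)

# Output layer: affine map with linear (identity) activation
y_out = hidden @ W2 + b2

print(hidden.shape, y_out.shape)
```

Training adjusts the four weight arrays so that y_out moves closer to the sample labels; the structure of the computation stays the same.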

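Judging from the two examples above, a neural network delivers only an approximate prediction, whereas OLS regression reproduces the coefficients of the original equation perfectly. So why use neural networks at all? Suppose the data are not generated by a predefined mathematical function but consist of randomly generated features and labels. The next example looks at this case; it is purely illustrative and has no practical meaning.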
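Before moving on, the OLS side of this comparison can be checked directly. The sketch below is a minimal illustration, assuming a cubic polynomial fitted with np.polyfit on the same deterministic sample data as in the earlier examples; the names x_det, y_det, and coeffs are chosen here for illustration and do not appear in the original.

```python
import numpy as np

def f(x):
    # Same deterministic function as in the earlier examples
    return 2 * x ** 2 - x ** 3 / 3

x_det = np.linspace(-2, 4, 25)
y_det = f(x_det)

# OLS fit with polynomial basis functions up to degree 3
coeffs = np.polyfit(x_det, y_det, deg=3)
y_fit = np.polyval(coeffs, x_det)
mse = ((y_det - y_fit) ** 2).mean()

# Coefficients are ordered from the highest degree to the constant term
print(np.round(coeffs, 6))
print(mse)
```

The fitted coefficients agree with -1/3, 2, 0, 0 up to floating-point error, and the MSE is numerically zero, which is the sense in which OLS reproduces the original equation when the basis functions match the data-generating function.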

# Generate random test data
np.random.seed(0)
x = np.linspace(-1, 1)
y = np.random.random(len(x)) * 2 - 1

# Fit OLS regressions with polynomials of different degrees
plt.figure(figsize=(10, 6))
plt.plot(x, y, 'ro', label='sample data')
for deg in [1, 5, 9, 11, 13, 15]:
    reg = np.polyfit(x, y, deg=deg)
    y_ = np.polyval(reg, x)
    MSE = ((y - y_) ** 2).mean()
    print(f'deg={deg:2d} | MSE={MSE:.5f}')
    plt.plot(x, y_, label=f'deg={deg}')
plt.legend();
# Out:
# deg= 1 | MSE=0.28153
# deg= 5 | MSE=0.27331
# deg= 9 | MSE=0.25442
# deg=11 | MSE=0.23458
# deg=13 | MSE=0.22989
# deg=15 | MSE=0.21672

Random sample data and OLS regression lines

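Clearly, OLS regression does not perform well here. OLS regression assumes that the target function can be approximated by a combination of a finite number of (polynomial) basis functions. Because the sample data set is randomly generated, there is no such structure to capture, and even the degree-15 fit still leaves an MSE of about 0.22. Let us try a neural network next.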

model = Sequential()
model.add(Dense(256, activation='relu', input_dim=1))
# Add three more hidden layers
for _ in range(3):
    model.add(Dense(256, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='rmsprop')

# Show the network architecture and the number of trainable parameters
model.summary()
# Out:
# Model: 'sequential_1'
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #
# =================================================================
#  dense_2 (Dense)             (None, 256)               512
#
#  dense_3 (Dense)             (None, 256)               65792
#
#  dense_4 (Dense)             (None, 256)               65792
#
#  dense_5 (Dense)             (None, 256)               65792
#
#  dense_6 (Dense)             (None, 1)                 257
#
# =================================================================
# Total params: 198,145
# Trainable params: 198,145
# Non-trainable params: 0
# _________________________________________________________________

%%time
# (%%time is a Jupyter cell magic; it reports the run time of this notebook cell)
plt.figure(figsize=(10, 6))
plt.plot(x, y, 'ro', label='sample data')
for _ in range(1, 8):
    model.fit(x, y, epochs=500, verbose=False)
    y_ = model.predict(x)
    MSE = ((y - y_.flatten()) ** 2).mean()
    print(f'round={_} | MSE={MSE:.5f}')
    plt.plot(x, y_, '--', label=f'round={_}')
plt.legend();
# Out:
# round=1 | MSE=0.13428
# round=2 | MSE=0.08515
# round=3 | MSE=0.05811
# round=4 | MSE=0.04389
# round=5 | MSE=0.03376
# round=6 | MSE=0.00722
# round=7 | MSE=0.00644
# CPU times: user 22.8 s, sys: 3.97 s, total: 26.8 s
# Wall time: 12.1 s

Random sample data and neural network predictions

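Although the predictions are not perfect, they are clearly better than those of OLS regression. The neural network architecture has nearly 200,000 trainable parameters (weights), whereas the OLS regression uses at most 15 + 1 parameters, so the network is far more flexible.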
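The parameter count reported by model.summary() can be reproduced by hand. The sketch below assumes the standard fully connected layer, in which each layer has one weight per input-output pair plus one bias per output unit; the helper dense_params is a name introduced here for illustration.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: n_in * n_out weights plus n_out biases."""
    return n_in * n_out + n_out

# Layer widths of the network above: 1 input, four hidden layers of 256 units, 1 output
layer_sizes = [1, 256, 256, 256, 256, 1]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [512, 65792, 65792, 65792, 257]
print(total)      # 198145
```

By comparison, a polynomial of degree 15 has 16 coefficients, which is the "15 + 1" figure mentioned above.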

Classification Tasks

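Neural networks can also be applied easily to classification tasks. Consider the following Keras-based neural network classifier, for which binary feature data and binary label data are generated randomly. The main modeling adjustment is to change the activation function of the output layer from linear to sigmoid. The classification is not perfect, but the accuracy is high: 9 of the 10 training samples are classified correctly.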

import pandas as pd

# Create random feature and label data
f = 5
n = 10
np.random.seed(124812)
x = np.random.randint(0, 2, (n, f))
y = np.random.randint(0, 2, n)

model = Sequential()
model.add(Dense(256, activation='relu', input_dim=f))
# The output layer uses sigmoid activation
model.add(Dense(1, activation='sigmoid'))
# The loss function is binary_crossentropy
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(x, y, epochs=50, verbose=False)
y_ = np.where(model.predict(x).flatten() > 0.5, 1, 0)

# Compare the predictions with the label data
y == y_
# Out:
# array([ True,  True,  True,  True,  True,  True,  True, False,  True, True])

# Plot the loss and accuracy for each training epoch
res = pd.DataFrame(model.history.history)
res.plot(figsize=(10, 6));
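The roles of the sigmoid activation, the 0.5 threshold, and the binary cross-entropy loss in the classifier above can be illustrated with a few lines of NumPy. This is a minimal sketch, not the Keras implementation: the logits and labels are hypothetical values chosen for illustration, and the clipping constant eps is an assumption made here to avoid log(0).

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    # Mean negative log-likelihood of the labels under predicted probabilities p
    p = np.clip(p, eps, 1 - eps)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)).mean()

# Hypothetical pre-activation outputs of the last layer and their true labels
logits = np.array([2.0, -1.0, 0.5, -3.0])
labels = np.array([1, 0, 0, 0])

probs = sigmoid(logits)
preds = np.where(probs > 0.5, 1, 0)    # same thresholding as in the example above
accuracy = (preds == labels).mean()
loss = binary_crossentropy(labels, probs)

print(preds)      # [1 0 1 0]
print(accuracy)   # 0.75
print(round(loss, 4))
```

The third sample has a probability above 0.5 but a true label of 0, so it counts as a misclassification and contributes most of the loss.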