
Weight Decay in Deep Learning

 脑系科数据科学 2021-04-21

What is weight decay?

Weight decay is a regularization technique that adds a small penalty, usually the L2 norm of the weights (i.e., all the weights of the model), to the loss function:

loss = loss + weight decay parameter * L2 norm of the weights
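
A minimal sketch of this formula in PyTorch, computing the penalty by hand (here criterion, inputs, and targets are assumed placeholders, and the squared L2 norm is used, as is common in practice):

import torch

weight_decay = 1e-4  # the weight decay parameter

def loss_with_penalty(model, criterion, inputs, targets):
    task_loss = criterion(model(inputs), targets)
    # squared L2 norm summed over all model parameters
    l2_penalty = sum((p ** 2).sum() for p in model.parameters())
    return task_loss + weight_decay * l2_penalty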

Some people prefer to apply weight decay only to the weights and not to the biases. PyTorch's weight_decay option, however, applies to every parameter passed to the optimizer, biases included; the sketch below shows how to exempt the biases.
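
If you prefer the first convention, PyTorch's per-parameter-group options let you turn the decay off for the biases. A sketch, assuming parameter names end in the usual "weight"/"bias" suffixes:

decay, no_decay = [], []
for name, param in model.named_parameters():
    # route biases into the group that gets weight_decay=0
    (no_decay if name.endswith("bias") else decay).append(param)

optimizer = torch.optim.SGD(
    [{"params": decay, "weight_decay": 1e-4},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=1e-3,
)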

Why do we use weight decay?

  • To prevent overfitting.

  • To keep the weights small and avoid exploding gradients. Because the L2 norm of the weights is added to the loss, each training iteration tries to shrink the model weights in addition to minimizing the task loss. This keeps the weights as small as possible and prevents them from growing out of control, thus helping avoid exploding gradients, as the update-rule sketch after this list shows.
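
The gradient of the penalty term weight_decay * ||w||^2 / 2 with respect to w is simply weight_decay * w, so each step pulls every weight toward zero. A hand-written SGD step equivalent to this (a sketch of the math, not PyTorch's actual implementation; lr and weight_decay are assumed defined as above):

with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            # the extra weight_decay * p term shrinks each weight toward zero
            p -= lr * (p.grad + weight_decay * p)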

How do we use weight decay?

To use weight decay, we can simply set the weight_decay parameter in the torch.optim.SGD or torch.optim.Adam optimizer. A common starting value for weight_decay is 1e-4.

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
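
For context, here is a complete toy training step; the linear model and random batch are placeholders:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)

x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch

optimizer.zero_grad()
loss = criterion(model(x), y)  # no penalty term needed here:
loss.backward()                # the optimizer applies the decay
optimizer.step()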
