Set up WSL2 + CUDA + Docker in 5 steps for deep learning development on Windows

黄爸爸好, published 2022-12-02 in Shanghai


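The Chinese-language posts about CUDA on WSL2 that you can find online are all out of date, and the steps they describe are needlessly complicated. I spent two evenings working out a simpler path and am sharing it here.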

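1. Join the Insider fast channel: Windows Insider Program

Follow the instructions there.

After rebooting: Settings > Update & Security > Windows Insider Program > switch to the Beta or Dev channel > update to Windows 11.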

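Note 1: The Beta and Dev channels now upgrade you to Windows 11, so your machine has to meet the Windows 11 hardware requirements. For enabling TPM, see: 如何看待微软 Windows 11硬件要求 TPM 2.0，不符合将无法升级？ - 知乎 (zhihu.com)

Note 2: If your machine does not meet the requirements, you will only see the Release Preview channel. Don't bother with it; I already tried, and the build that Release Preview updates to is too old and fails with an error at step 5. A machine that does not meet the hardware requirements can still be force-upgraded, see: 如何绕过微软限制、强行升级到Win11？看完这篇文章你就明白了！ - 知乎 (zhihu.com). A forced upgrade may cause problems with later updates, so weigh the risk carefully.

Update 2021.10.10: Windows 11 has now been officially released. If you are curious, try whether the steps below work without joining the Insider channel; feedback is welcome. Update 2021.10.25: a reader in the comments has confirmed that it now works without the Insider channel.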

2. Install the driver on Windows: GPU in Windows Subsystem for Linux (WSL)

Note: this driver automatically installs CUDA, DirectML, and DirectX support. Do not install any Linux display driver inside WSL.

3. Install WSL2: Install WSL on Windows 10

Ubuntu is the recommended distribution.
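Since everything that follows depends on running under WSL2 rather than WSL1, a quick sanity check can save some head-scratching. The sketch below is a heuristic of my own, not an official API: it assumes the stock WSL2 kernel reports a release string such as `5.10.16.3-microsoft-standard-WSL2`, which a custom-built kernel may not do. The function names are made up for illustration.

```python
from pathlib import Path


def looks_like_wsl2(kernel_release: str) -> bool:
    """Heuristic: stock WSL2 kernels usually carry 'microsoft' and 'WSL2'
    in their release string (an assumption, not a documented guarantee)."""
    release = kernel_release.strip().lower()
    return "microsoft" in release and "wsl2" in release


def current_kernel_release(path: str = "/proc/sys/kernel/osrelease") -> str:
    """Read the running kernel's release string; empty string if unavailable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    release = current_kernel_release()
    print("kernel release:", release.strip() or "<unknown>")
    print("looks like WSL2:", looks_like_wsl2(release))
```

Run it inside the distribution you just installed. If it reports False on a stock kernel, the distribution is probably still on WSL1.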

4. Install CUDA

Inside WSL2, run the commands from: CUDA Toolkit 11.4 Update 2 Downloads | NVIDIA Developer

As of 2021.9.11, the install script at that link is:

wget https://developer.download./compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download./compute/cuda/11.4.2/local_installers/cuda-repo-wsl-ubuntu-11-4-local_11.4.2-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-4-local_11.4.2-1_amd64.deb
sudo apt-key add /var/cuda-repo-wsl-ubuntu-11-4-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
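After the install finishes, it can help to confirm that the GPU tools are actually reachable. A minimal sketch, under two assumptions: the Windows driver exposes its utilities under `/usr/lib/wsl/lib` (the directory added to PATH in step 5), and the toolkit lands in the conventional `/usr/local/cuda/bin` (not stated by the installer output above, so treat it as a guess). `find_tool` is a hypothetical helper name.

```python
import os
import shutil

# Extra directories to search in addition to PATH. Both are assumptions:
# the first comes from the PATH export in step 5, the second is the
# conventional CUDA toolkit location.
EXTRA_DIRS = ("/usr/lib/wsl/lib", "/usr/local/cuda/bin")


def find_tool(name, extra_dirs=EXTRA_DIRS):
    """Return the full path of an executable, or None if it is not found."""
    search_path = os.pathsep.join([os.environ.get("PATH", ""), *extra_dirs])
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    for tool in ("nvidia-smi", "nvcc"):
        print(f"{tool}: {find_tool(tool) or 'not found'}")
```

If `nvidia-smi` is missing, the problem is most likely on the Windows driver side (step 2), not in the toolkit install.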

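5. Install Docker

Don't bother with Docker for Windows. I tried it and it does not work: cuda-sample:nbody runs, but other images such as cuda, torch, and tf cannot detect the GPU.

Install nvidia-docker directly inside WSL2: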

export PATH=$PATH:/usr/lib/wsl/lib
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia./nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia./nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia./libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
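The `distribution=$(. /etc/os-release;echo $ID$VERSION_ID)` line above decides which repository list gets downloaded, so it is worth knowing what it evaluates to. Here is a rough Python equivalent for inspecting it; `distribution_id` is a name I made up, and the parsing is a simplified sketch of shell sourcing, not a full os-release parser.

```python
import shlex
from pathlib import Path


def distribution_id(os_release_text: str) -> str:
    """Return ID + VERSION_ID (for example 'ubuntu20.04') from os-release content."""
    fields = {}
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        try:
            # shlex strips shell-style quotes, much as sourcing the file would.
            parts = shlex.split(raw)
        except ValueError:
            parts = [raw]
        fields[key] = parts[0] if parts else ""
    return fields.get("ID", "") + fields.get("VERSION_ID", "")


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        print(distribution_id(path.read_text()))
```

If the printed value is not a distribution the repository has a list for, the `curl` lines will fetch an error page instead of a package list, and `apt-get update` will complain.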

OK, now you can run deep learning workloads:

sudo service docker start
docker run --name mytorch --gpus all -it pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime bash

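Once you are inside the container, if you see

The NVIDIA Driver was not detected.  GPU functionality will not be available.

ignore it. The warning itself is wrong (it cost me half my health bar before I found this note in the documentation):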

Note that this message is an incorrect warning for WSL 2 and will be fixed in future releases of the DL Framework containers to correctly detect the NVIDIA GPUs. The DL Framework containers will still continue to be accelerated using CUDA on WSL 2

Start the Python interactive interpreter:

Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.zeros([1]).cuda()
tensor([0.], device='cuda:0')
>>> torch.version.cuda
'10.2'
>>> torch.backends.cudnn.version()
7605

That means everything is working. Enjoy CUDA on WSL2 with Docker!
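If you would rather not type the check by hand every time, the same interactive session can be wrapped in a small script. This is only a sketch: `gpu_report` is a hypothetical helper, it uses the same torch calls as the session above plus `torch.cuda.is_available()`, and it degrades to a short report when torch is not installed.

```python
def gpu_report() -> dict:
    """Collect the same facts as the interactive session, without raising."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,
        "cudnn_version": torch.backends.cudnn.version(),
    }
    if report["cuda_available"]:
        # Allocating a tensor on the GPU exercises the driver end to end.
        report["device"] = str(torch.zeros([1]).cuda().device)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Inside the container, save it as a file and run it with `python`. A `cuda_available` value of False despite the steps above usually means the container was started without `--gpus all`.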