
Distributed Training with HuggingFace Accelerate

 swordinhand 2023-09-22

Since the project code is fairly complex and hard to read…, I tried using Hugging Face's Accelerate to implement multi-GPU distributed training.

1/ Why use HuggingFace Accelerate

The main problem Accelerate solves is distributed training. At the start of a project you may just get the code running on a single GPU, but to speed up training you will eventually want multiple GPUs. (If you need to debug the code, running it on the CPU is recommended, since the errors raised there are more meaningful.)

Advantages of using Accelerate:

  • Works on CPU/GPU/TPU: once your code uses Accelerate, changing only the config (see installation and configuration below) lets the same code train under different hardware setups
  • Makes distributed evaluation very easy to implement (a sketch appears at the end of section 3)
  • Makes mixed precision and gradient accumulation simpler (see the sketch after this list)
  • Improves logging and tracking in distributed settings
  • Makes saving training state in distributed settings simpler
  • Fully sharded data parallel (FSDP) training
  • DeepSpeed integration
  • Integrates various experiment trackers, such as wandb and tensorboard
  • CLI command to launch training code
  • Easy launching of distributed training from a Jupyter Notebook
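
To illustrate how lightweight the mixed precision and gradient accumulation support is, here is a minimal sketch. The mixed_precision and gradient_accumulation_steps arguments and the accumulate() context manager are real Accelerate APIs; model, optimizer, training_dataloader, and loss_function are placeholders assumed to be defined elsewhere:

from accelerate import Accelerator

# fp16 mixed precision, gradients accumulated over 4 batches
accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=4)

model, optimizer, training_dataloader = accelerator.prepare(
    model, optimizer, training_dataloader
)

for batch in training_dataloader:
    # under accumulate(), optimizer.step() only takes effect once
    # every 4 batches; in between, gradients keep accumulating
    with accelerator.accumulate(model):
        inputs, targets = batch
        loss = loss_function(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()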

2/ Installation and configuration

First install Accelerate, via pip or conda:

pip install accelerate

or

conda install -c conda-forge accelerate

On the machine you will train on, set up the training configuration by running:

accelerate config

Follow the prompts to complete the configuration. For other configuration methods, such as writing the yaml file directly, see the official tutorial; a sketch of such a file is shown below.
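
For reference, a hand-written config file might look roughly like this for a single machine with 2 GPUs. The field names follow the default_config.yaml that accelerate config generates, but the exact set of fields depends on your answers to the prompts, so treat this as an illustrative sketch rather than a canonical template:

compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: fp16
num_machines: 1
num_processes: 2   # one process per GPU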

To view the current configuration:

accelerate env

3/ Using Accelerate

https://huggingface.co/docs/accelerate/basic_tutorials/migration

3.1/ A basic PyTorch training loop

device = "cuda"
model.to(device)

for batch in training_dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    inputs = inputs.to(device)
    targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    loss.backward()
    optimizer.step()
    scheduler.step()

How do we add Accelerate to this code?

3.2/ Adding Accelerate

from accelerate import Accelerator

accelerator = Accelerator()  # first, create the instance

# pass everything training-related into prepare()
model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)

# device = "cuda"
# model.to(device)

for batch in training_dataloader:
    optimizer.zero_grad()
    inputs, targets = batch
    # inputs = inputs.to(device)
    # targets = targets.to(device)
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    # loss.backward()
    accelerator.backward(loss)
    
    optimizer.step()
    scheduler.step()

That's all the changes needed; still pretty simple.

Note: if you need to reference the device, it is no longer "cuda" but rather:

# device = 'cuda'
device = accelerator.device
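
Distributed evaluation, mentioned in the advantages above, follows the same pattern. Below is a minimal sketch, assuming an eval_dataloader that has also been passed through prepare(); gather_for_metrics() is the Accelerate call that collects results from all processes (and drops the duplicate samples the distributed sampler pads with):

import torch

model.eval()
correct, total = 0, 0
for batch in eval_dataloader:
    inputs, targets = batch
    with torch.no_grad():
        outputs = model(inputs)
    preds = outputs.argmax(dim=-1)
    # gather predictions and targets from every process
    preds, targets = accelerator.gather_for_metrics((preds, targets))
    correct += (preds == targets).sum().item()
    total += targets.numel()

print(f"accuracy: {correct / total:.4f}")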

4/ Launching training

https://huggingface.co/docs/accelerate/v0.17.1/en/basic_tutorials/launch

First, wrap the code above in a function and make it callable as a script, like so:

  from accelerate import Accelerator
  
+ def main():
      accelerator = Accelerator()

      model, optimizer, training_dataloader, scheduler = accelerator.prepare(
          model, optimizer, training_dataloader, scheduler
      )

      for batch in training_dataloader:
          optimizer.zero_grad()
          inputs, targets = batch
          outputs = model(inputs)
          loss = loss_function(outputs, targets)
          accelerator.backward(loss)
          optimizer.step()
          scheduler.step()

+ if __name__ == "__main__":
+     main()

4.1/ Environment configuration

Since this was done earlier, this step can be skipped. But if you want to switch to a different training setup, say from 2 GPUs to 3, you need to reconfigure:

accelerate config

4.2/ Launch

accelerate launch {script_name.py} {--arg1} {--arg2} ...

This is only the simplest form of the command; for more complex invocations, such as launching with your own config file, see the official tutorial. A couple of examples are sketched below.
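
For instance, the launcher can take a config file or override individual settings on the command line; --config_file and --num_processes are real accelerate launch flags, while the file and script names here are placeholders:

accelerate launch --config_file my_config.yaml train.py
accelerate launch --num_processes 2 train.py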

5/ Tracking experiments with wandb

https://huggingface.co/docs/accelerate/main/en/usage_guides/tracking
https://docs.wandb.ai/guides/integrations/accelerate

I stared at the HuggingFace tutorial for a long time without figuring out how to pass additional wandb run parameters (I'm still too green!), and finally found the answer in the wandb tutorial… pass them via the init_kwargs argument.

Example:

from accelerate import Accelerator

# Tell the Accelerator object to log with wandb
accelerator = Accelerator(log_with="wandb")

# Initialise your wandb run, passing wandb parameters and any config information
accelerator.init_trackers(
    project_name="my_project",
    config={"dropout": 0.1, "learning_rate": 1e-2},
    init_kwargs={"wandb": {"entity": "my-wandb-team"}},
)

...

# Log to wandb by calling `accelerator.log`, `step` is optional
accelerator.log({"train_loss": 1.12, "valid_loss": 0.8}, step=global_step)


# Make sure that the wandb tracker finishes correctly
accelerator.end_training()

6/ Complete code

Finally, putting it all together:

from accelerate import Accelerator

def main():
    accelerator = Accelerator(log_with="wandb")  # first, create the instance

    accelerator.init_trackers(
        project_name="my_project",
        config={"dropout": 0.1, "learning_rate": 1e-2},
        init_kwargs={"wandb": {"entity": "my-wandb-team"}},
    )

    # pass everything training-related into prepare()
    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )

    # device = "cuda"
    # model.to(device)

    step = 0
    for batch in training_dataloader:
        optimizer.zero_grad()
        inputs, targets = batch
        # inputs = inputs.to(device)
        # targets = targets.to(device)
        outputs = model(inputs)
        loss = loss_function(outputs, targets)

        accelerator.log({"train_loss": loss.item()}, step=step)

        # loss.backward()
        accelerator.backward(loss)

        optimizer.step()
        scheduler.step()

        step += 1

    # make sure that the wandb tracker finishes correctly
    accelerator.end_training()

if __name__ == "__main__":
    main()
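
Assuming the script is saved as train.py (a placeholder name), training is then launched with the command from section 4:

accelerate launch train.py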

References

https://huggingface.co/docs/accelerate/v0.17.1/en/index
https://docs.wandb.ai/guides/integrations/accelerate
Hugging Face Accelerate Super Charged With Weights & Biases
