LLMs / PEFT: Translation and Commentary on "Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey"
Overview: This paper is a survey of parameter-efficient fine-tuning (PEFT) for large models. It systematically summarizes recent advances in the PEFT field, covering algorithm design, computational efficiency, application scenarios, and system implementation.
>> Background pain points:
● High computational cost of large models: large models such as large language models (LLMs) have enormous parameter counts (billions or even hundreds of billions), so directly fine-tuning them for a specific downstream task requires substantial computational resources, which is especially difficult on hardware platforms with limited compute.
● Limitations of full-model fine-tuning: fully fine-tuning a large model is computationally expensive and can hurt the model's generalization ability.
>> Proposed solution: the core idea of PEFT is to adapt a pre-trained large model to a specific task or domain while adding as few parameters and as little computation as possible. The paper divides PEFT methods into four categories:
● Additive PEFT: add a small number of trainable modules (e.g., Adapters, soft prompts) to the model architecture and update only the parameters of these new modules. Variants of adapters (serial, parallel, multi-task adaptive, etc.) and of soft prompts (prefix-tuning, p-tuning, etc.) fall into this category. It also includes other additive methods such as (IA)³ and SSF, which achieve parameter-efficient fine-tuning by scaling (and, in SSF's case, shifting) internal activations. A minimal adapter sketch is given after this list.
● Selective PEFT: fine-tune only a subset of the model's parameters and keep the rest frozen, which can be realized through structured or unstructured masks. For example, BitFit tunes only the bias terms, while other methods select the parameters to tune according to importance scores (e.g., Fisher information). See the BitFit-style sketch after this list.
● Reparameterized PEFT: train a low-rank reparameterization of the original model parameters, then fold the trained parameters back into the original weights for inference, preserving inference speed. LoRA is the most representative method: it updates a weight matrix via the product of two low-rank matrices. Other reparameterized methods include DyLoRA, AdaLoRA, SoRA, and Compacter, which improve on rank selection, parameter-update strategies, and related aspects. See the LoRA sketch after this list.
● Hybrid PEFT: combine the strengths of multiple PEFT methods; for example, UniPELT integrates LoRA, prefix-tuning, and adapters. Some works also use neural architecture search (NAS) to find the best combination of PEFT methods.
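To make the additive category concrete, here is a minimal PyTorch sketch of a serial bottleneck adapter. The class name, bottleneck size, and near-identity initialization are illustrative choices for this post, not code from the survey:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, non-linearity, up-project, residual add.
    Inserted after a frozen sublayer; only these weights are trained."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, d_model)
        # Zero-init the up-projection so the adapter starts as an identity map
        # and fine-tuning begins from the pre-trained model's behavior.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```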
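Selective PEFT often needs no new modules at all. A BitFit-style selection can be expressed as a single parameter-freezing pass; this sketch assumes bias parameters are named with a trailing "bias", as in standard PyTorch modules:

```python
def apply_bitfit(model: nn.Module) -> None:
    """Freeze everything except bias terms (BitFit-style selective tuning).
    In practice a task-specific head is usually kept trainable as well."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")
```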
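For the reparameterized category, the sketch below wraps a frozen nn.Linear with a trainable low-rank update in the spirit of LoRA. The rank, scaling, and initialization follow the common convention (B zero-initialized so training starts from the unchanged model), but the class itself is illustrative:

```python
class LoRALinear(nn.Module):
    """y = W x + (alpha / r) * B A x, with W frozen and only A, B trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        """Fold the low-rank update back into W so inference pays no extra cost."""
        self.base.weight += self.scaling * (self.lora_B @ self.lora_A)
        return self.base
```

The merge step is what distinguishes reparameterized methods from additive ones: after merging, the deployed model is architecturally identical to the original.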
>> Core workflow: a PEFT method roughly proceeds through the following steps (a library-level usage sketch follows the list):
● Choose a PEFT method: pick a method suited to the specific task and model.
● Add or select parameters: depending on the chosen method, add new trainable parameters or select a subset of the existing ones.
● Fine-tune the model: update only the added or selected parameters, keeping all others frozen.
● Evaluate performance: evaluate the fine-tuned model on the target task.
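As a worked example of these steps, here is a sketch using the Hugging Face peft library (not part of the survey itself); the base model, target_modules, and hyperparameters are illustrative and model-specific:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # steps 1-2: pick model + method
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection; differs per model
)
model = get_peft_model(base, config)  # step 2: attach the trainable LoRA parameters
model.print_trainable_parameters()   # typically well under 1% of total parameters

# Step 3: train with any standard loop or Trainer; only LoRA weights get gradients.
# Step 4: evaluate on the target task, then save just the adapter weights:
model.save_pretrained("gpt2-lora-adapter")
```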
>> Advantages:
● Significantly lower computational cost: PEFT methods greatly reduce the compute and resource consumption of fine-tuning large models.
● Higher training efficiency: compared with full-model fine-tuning, PEFT methods can converge faster.
● Better generalization: in some cases, PEFT methods improve a model's generalization ability.
>> Conclusions and viewpoints of the paper:
● PEFT is an effective approach for efficiently adapting large models to downstream tasks.
● The paper gives a comprehensive taxonomy and summary of PEFT methods.
● The paper discusses applications of PEFT across model architectures (LLMs, ViTs, vision-language alignment models, diffusion models) and downstream tasks.
● The paper analyzes system-design challenges for PEFT, including centralized PEFT query serving, distributed PEFT training, and concurrent PEFT training.
● The paper proposes future research directions, including simplifying hyperparameter tuning, establishing unified benchmarks, improving training efficiency, exploring scaling laws, serving more models and tasks, enhancing data privacy, and combining PEFT with model compression.
In summary, this paper provides a comprehensive survey of parameter-efficient fine-tuning (PEFT), summarizing the latest advances in the field and pointing out future research directions. It is a valuable reference for researchers seeking to understand and apply PEFT.
Translation and Commentary of "Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey"
Address | Paper: https://arxiv.org/abs/2403.14608
Date | August 29, 2024
Authors | Northeastern University; University of California, Riverside; Arizona State University; New York University
Abstract
Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. In particular, the expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, especially over hardware platforms constrained by computational capabilities.

Parameter-Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adjusting the large models over the various downstream tasks. In particular, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large-scale language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design.

In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to providing an extensive survey from an algorithmic standpoint, we also examine various real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as a valuable resource for researchers aiming to understand both the PEFT algorithm and its system implementation, offering detailed insights into recent advancements and practical applications.

Index Terms: Large Language Model, Parameter-Efficient Fine-tuning, Computer System, Distributed System.
1. Introduction
Large Models (LMs) have recently captured considerable public interest. Their ability to understand context and nuance enables them to proficiently handle diverse tasks across multiple domains, including natural language processing (NLP), computer vision (CV), etc. In the field of NLP, Large Language Models (LLMs) have achieved significant advancements across various tasks including text generation [1, 2], translation [3, 4], personalized chat-bots [5, 6, 7], and summarization [8], demonstrating remarkable proficiency. Earlier studies [1] have suggested that LLMs exhibit high levels of generalization, enabling them to apply their acquired knowledge to new tasks not included in their original training. This capability is commonly known as zero-shot learning. Nevertheless, fine-tuning remains essential to further enhance LLMs for optimal performance on new user datasets and tasks.

Due to their scale, a widely adopted strategy for fine-tuning LLMs is to adjust a limited number of parameters while keeping the rest unchanged. This technique, termed Parameter-Efficient Fine-Tuning (PEFT), selectively adjusts a small proportion of a model's parameters while leaving the remainder unaltered. Furthermore, the application of PEFT extends beyond the realm of NLP and has quickly attracted interest in the CV community for fine-tuning large vision models, such as Vision Transformers (ViT) and diffusion models, as well as interdisciplinary models such as vision-language models. In this survey, we systematically review and categorize recent advancements in PEFT algorithms as well as the system implementation costs associated with various PEFT algorithms across diverse scenarios. Figure 1 presents the overview content for this survey. In Section II, we present some fundamental concepts for LLM and PEFT, including the computational flow of LLMs, basic knowledge of PEFT, commonly used datasets and tasks, and evaluation benchmarks.

We categorize all types of PEFT algorithms in Section III according to their computational flow. In Section III-A, we detail additive algorithms that either introduce new weight parameters or modify activations. Algorithms that only require fine-tuning of existing parameters are categorized as selective approaches, which are introduced in Section III-B. In Section III-C, we explore reparameterized PEFT, which constructs a (low-dimensional) reparameterization of the original model parameters for training, then transforms the weights back for inference to maintain inference speed. Additionally, there exist algorithms that combine the above techniques, and we classify these as hybrid approaches, elaborating on them in Section III-D. We also investigate strategies for further reducing the computational complexity of different PEFT algorithms, including KV-cache management, pruning, quantization, and memory optimization, in Section IV.

In Section V, we expand the scope of this survey beyond the computational perspective to cover various potential application scenarios. Specifically, we explore innovations that apply PEFT techniques to different model architectures, including LLMs (Section V-A), Vision Transformers (Section V-B), vision-language alignment models (Section V-C), and diffusion models (Section V-D), for varied downstream tasks, underscoring PEFT's versatility and applicability in a range of scenarios. After that, in Section VI, we explore the system design challenges for PEFT methods. The discussion includes three advanced system solutions for practical PEFT deployment: PEFT query serving (Section VI-B), distributed tuning (Section VI-C), and concurrent PEFT tuning (Section VI-D). Finally, in Section VII, we summarize our survey and propose several potential future directions from both algorithmic and systemic perspectives, aiming to offer valuable insights for further research and development in the field.
Figure 1: An overview of the content covered in the survey.

VII. Conclusion and Future Directions
In the current era dominated by large models and large datasets, PEFT stands out as a highly attractive method for efficiently adapting models to downstream tasks. This technique gains its appeal by addressing the significant challenges posed by traditional full-model fine-tuning, which often imposes substantial computational and data demands. This survey offers a comprehensive examination of the most recent advancements in PEFT, including algorithmic design, computational efficiency, application scenarios, and system implementation. It provides a comprehensive taxonomy and explanation that serve as excellent guidance and a knowledge base, enabling readers of various levels and disciplines to swiftly grasp the core concepts of PEFT. For further research on PEFT, we propose a series of possible directions from both algorithm and system perspectives, hoping to inspire more researchers to engage in further studies in these areas.