Mitigating Forgetting in Low Rank Adaptation
This paper addresses catastrophic forgetting in Low-Rank Adaptation (LoRA), a popular parameter-efficient fine-tuning method for large language models. The authors propose a novel regularization technique that penalizes large deviations of the adapted weights from the frozen pre-trained weights during fine-tuning, preserving previously learned knowledge while the model adapts to new tasks. The method is computationally efficient and adds minimal overhead to standard LoRA training.
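The summary does not give the exact form of the regularizer. As a minimal sketch, assume it is the squared Frobenius norm of the LoRA update, Delta W = (alpha / r) * B A, which is exactly the squared distance between the adapted weight W0 + Delta W and the frozen pre-trained weight W0. The helper names (`matmul`, `lora_delta`, `adapted_weight`, `deviation_penalty`) and the hyperparameter `lam` are hypothetical, chosen for illustration; only the alpha / r scaling follows the standard LoRA formulation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) matrix and an (m x p) matrix."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_delta(b: Matrix, a: Matrix, alpha: float, rank: int) -> Matrix:
    """LoRA update Delta W = (alpha / rank) * B @ A, with B (d x r) and A (r x k)."""
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(b, a)]


def adapted_weight(w0: Matrix, b: Matrix, a: Matrix, alpha: float, rank: int) -> Matrix:
    """Weight used in the forward pass: frozen W0 plus the trainable low-rank update."""
    delta = lora_delta(b, a, alpha, rank)
    return [[w + dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(w0, delta)]


def deviation_penalty(b: Matrix, a: Matrix, alpha: float, rank: int, lam: float) -> float:
    """Hypothetical regularizer lam * ||W - W0||_F^2.

    W - W0 is exactly the LoRA update, so the penalty needs only B and A.
    """
    delta = lora_delta(b, a, alpha, rank)
    return lam * sum(v * v for row in delta for v in row)


# Toy example: 2x2 identity as W0 and a rank-1 adapter.
w0 = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [0.0]]
a = [[0.0, 2.0]]
print(adapted_weight(w0, b, a, alpha=1.0, rank=1))  # [[1.0, 2.0], [0.0, 1.0]]
print(deviation_penalty(b, a, alpha=1.0, rank=1, lam=0.5))  # 2.0
```

Because standard LoRA initializes B to zero, a penalty of this form starts at zero and grows only as the adapter moves the effective weights away from W0.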
Extensive experiments on NLP benchmarks, including GLUE and SuperGLUE, show that the proposed approach significantly reduces forgetting compared with standard LoRA and other parameter-efficient methods. The method achieves state-of-the-art performance on target tasks while maintaining high accuracy on tasks learned before fine-tuning, which makes it particularly suitable for continual learning. The results show that the regularization term balances adaptation and retention, offering a practical way to deploy large models in dynamic environments.
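A one-parameter toy problem illustrates how a deviation penalty trades adaptation against retention. Assume, purely for illustration, a quadratic task loss (w - w_task)^2 and a penalty lam * (w - w0)^2. Setting the derivative 2(w - w_task) + 2 lam (w - w0) to zero gives w* = (w_task + lam * w0) / (1 + lam), so lam = 0 recovers unregularized fine-tuning and a large lam keeps w near the pre-trained value. The function names are hypothetical and this is not an experiment from the paper.

```python
def regularized_optimum(w_task: float, w0: float, lam: float) -> float:
    """Closed-form minimizer of (w - w_task)**2 + lam * (w - w0)**2 for lam >= 0."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return (w_task + lam * w0) / (1.0 + lam)


def gradient_descent(w_task: float, w0: float, lam: float,
                     lr: float = 0.01, steps: int = 2000) -> float:
    """Reach the same optimum iteratively, starting from the pre-trained value w0.

    Converges when lr < 1 / (1 + lam); the defaults satisfy this for lam < 99.
    """
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - w_task) + 2.0 * lam * (w - w0)  # task gradient + penalty gradient
        w -= lr * grad
    return w


for lam in (0.0, 1.0, 9.0):
    print(lam, regularized_optimum(w_task=1.0, w0=0.0, lam=lam))
# prints: 0.0 1.0 / 1.0 0.5 / 9.0 0.1
```

In this toy setting lam directly sets how far the solution moves from w0 toward the new-task optimum, which is the adaptation-retention balance the abstract describes.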
Highlights
- Proposes a novel method to mitigate catastrophic forgetting in Low-Rank Adaptation (LoRA) when fine-tuning large language models.
- Introduces a regularization term that penalizes deviations of the adapted weights from the pre-trained weights, preserving knowledge while adapting to new tasks.
- Achieves state-of-the-art performance on multiple NLP benchmarks with minimal computational overhead.
- Demonstrates effectiveness across various model sizes and tasks, including text classification and generation.
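If the penalty is the squared Frobenius norm of the update B A (an assumption; the summary does not state the exact form), it can be evaluated without materializing the full d x k matrix, because ||B A||_F^2 = tr((B^T B)(A A^T)) needs only two r x r Gram matrices. This is one way such a regularizer could keep overhead small; it is not necessarily how the authors implement it, and both function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def frob_sq_direct(b: Matrix, a: Matrix) -> float:
    """||B @ A||_F^2 by forming every entry of the d x k product: O(d * k * r) work."""
    d, r, k = len(b), len(a), len(a[0])
    total = 0.0
    for i in range(d):
        for j in range(k):
            entry = sum(b[i][t] * a[t][j] for t in range(r))
            total += entry * entry
    return total


def frob_sq_gram(b: Matrix, a: Matrix) -> float:
    """Same value via tr((B^T B)(A A^T)), using only r x r Gram matrices: O((d + k) * r^2) work."""
    r = len(a)
    btb = [[sum(row[p] * row[q] for row in b) for q in range(r)] for p in range(r)]
    aat = [[sum(x * y for x, y in zip(a[p], a[q])) for q in range(r)] for p in range(r)]
    # A A^T is symmetric, so tr(XY) = sum_{p,q} X[p][q] * Y[q][p] = sum_{p,q} X[p][q] * Y[p][q].
    return sum(btb[p][q] * aat[p][q] for p in range(r) for q in range(r))


b = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]               # d = 3, r = 2
a = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, -1.0, 3.0]]       # r = 2, k = 4
print(frob_sq_direct(b, a), frob_sq_gram(b, a))          # 124.0 124.0
```

For typical LoRA ranks (r much smaller than d and k), the Gram-matrix route costs far less than forming B A, which is consistent with a regularizer that adds little to training time.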
Methods
- Low-Rank Adaptation (LoRA) with a regularization term that mitigates forgetting.
- Gradient-based optimization of a combined loss: the task loss plus the regularization loss.
- Empirical evaluation on benchmark datasets (e.g., GLUE, SuperGLUE) and comparison with standard LoRA and full fine-tuning.
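Written out under an assumed penalty (the squared Frobenius norm of the LoRA update, with the alpha / r scaling absorbed into B for brevity; the summary does not give the exact form), the combined objective and the extra gradient terms the penalty adds to the task-loss gradients would be:

```latex
\begin{aligned}
\mathcal{L}(A,B) &= \mathcal{L}_{\text{task}}\big(W_0 + BA\big) + \lambda\,\lVert BA\rVert_F^2,
  \qquad B \in \mathbb{R}^{d\times r},\; A \in \mathbb{R}^{r\times k},\\
\nabla_B\big(\lambda\,\lVert BA\rVert_F^2\big) &= 2\lambda\, B\,(AA^{\top}),
  \qquad
\nabla_A\big(\lambda\,\lVert BA\rVert_F^2\big) = 2\lambda\,(B^{\top}B)\,A .
\end{aligned}
```

W_0 stays frozen, so only A and B receive gradients. Grouping the products as shown keeps the penalty's extra cost at O((d + k) r^2) per matrix.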
Results
- Reduces forgetting by up to 30% compared with standard LoRA on continual learning tasks.
- Achieves comparable or better performance than full fine-tuning on target tasks while training only 0.1% of the parameters.
- Maintains higher accuracy on previously learned tasks after fine-tuning, indicating reduced catastrophic forgetting.
- Outperforms other parameter-efficient fine-tuning methods, such as Adapters and Prefix Tuning, on forgetting metrics.
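The summary does not define how forgetting or the 30% figure is measured. One common convention, assumed here, measures forgetting as the average accuracy drop on previously learned tasks and reports the reduction relative to the baseline's forgetting. The parameter fraction follows from LoRA adding r(d + k) trainable values to each adapted d x k matrix. All function names and the numbers in the example are hypothetical and are not results from the paper.

```python
from statistics import mean
from typing import Sequence


def average_forgetting(acc_before: Sequence[float], acc_after: Sequence[float]) -> float:
    """Mean accuracy drop on old tasks: accuracy before fine-tuning minus accuracy after."""
    if len(acc_before) != len(acc_after) or not acc_before:
        raise ValueError("need matching, non-empty accuracy lists")
    return mean(b - a for b, a in zip(acc_before, acc_after))


def relative_reduction(baseline_forgetting: float, method_forgetting: float) -> float:
    """Fraction by which a method reduces forgetting relative to a baseline (0.3 means 30%)."""
    if baseline_forgetting <= 0:
        raise ValueError("baseline must show positive forgetting")
    return 1.0 - method_forgetting / baseline_forgetting


def lora_param_fraction(d: int, k: int, rank: int) -> float:
    """Trainable LoRA parameters r * (d + k) as a fraction of one frozen d x k matrix."""
    return rank * (d + k) / (d * k)


# Illustrative accuracies (%) on two old tasks; not data from the paper.
baseline = average_forgetting([90.0, 80.0], [80.0, 70.0])  # plain LoRA forgets 10 points
method = average_forgetting([90.0, 80.0], [85.0, 75.0])    # regularized LoRA forgets 5 points
print(relative_reduction(baseline, method))                # 0.5
print(lora_param_fraction(4096, 4096, 8))                  # 0.00390625
```

The whole-model fraction of trainable parameters depends on which weight matrices receive adapters, so a per-matrix figure like this one is only an upper-level guide to the 0.1% reported above.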