Mitigating Forgetting in Low Rank Adaptation

Joanna Sliwa
Frank Schneider
Philipp Hennig
Jose Miguel Hernandez-Lobato
Published on 12/19/2025
Cross-asset
AI
LLM
Machine learning

This paper addresses the problem of catastrophic forgetting in Low-Rank Adaptation (LoRA), a popular parameter-efficient fine-tuning method for large language models. The authors propose a novel regularization technique that penalizes significant deviations from the pre-trained weights during fine-tuning, thereby preserving previously learned knowledge while adapting to new tasks. The method is computationally efficient, adding minimal overhead to the standard LoRA training process.
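
The regularized objective described above can be sketched as follows. The summary does not state the exact form of the regularizer, so this sketch assumes the simplest choice: the squared Frobenius distance ||W - W0||_F^2 between the adapted weight W = W0 + (alpha / r) * B A and the frozen pre-trained weight W0. The function names (`lora_weight`, `deviation_penalty`, `combined_loss`), the weight `lam`, and all numbers are hypothetical, and plain Python lists stand in for tensors.

```python
# Minimal sketch, assuming the regularizer is the squared Frobenius distance
# between the adapted and pre-trained weights. Names are hypothetical.

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_weight(W0, B, A, alpha, r):
    """Effective weight W = W0 + (alpha / r) * B @ A used in the forward pass."""
    delta = matmul(B, A)
    s = alpha / r
    return [[w + s * d for w, d in zip(wr, dr)] for wr, dr in zip(W0, delta)]

def deviation_penalty(W0, W):
    """Squared Frobenius distance ||W - W0||_F^2 from the pre-trained weights."""
    return sum((w - w0) ** 2 for wr, w0r in zip(W, W0) for w, w0 in zip(wr, w0r))

def combined_loss(task_loss, W0, W, lam):
    """Task loss plus lam times the deviation penalty (hypothetical weighting)."""
    return task_loss + lam * deviation_penalty(W0, W)

W0 = [[1.0, 0.0], [0.0, 1.0]]   # frozen pre-trained weight (2 x 2)
B = [[1.0], [2.0]]              # trainable factor, shape (d_out, r) with r = 1
A = [[0.5, 0.5]]                # trainable factor, shape (r, d_in)
W = lora_weight(W0, B, A, alpha=1.0, r=1)
print(W)                          # [[1.5, 0.5], [1.0, 2.0]]
print(deviation_penalty(W0, W))   # 2.5
print(combined_loss(0.8, W0, W, lam=0.1))
```

Because W0 is frozen, the penalty depends only on the trainable factors B and A, so it introduces no additional trainable parameters.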

Extensive experiments on NLP benchmarks, including GLUE and SuperGLUE, demonstrate that the proposed approach substantially reduces forgetting compared with standard LoRA and other parameter-efficient methods. The method achieves state-of-the-art performance on target tasks while maintaining high accuracy on tasks learned during pre-training, making it particularly suitable for continual learning scenarios. The results show that the regularization term effectively balances adaptation and retention, offering a practical solution for deploying large models in dynamic environments.
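
The balance between adaptation and retention can be seen in a one-parameter toy problem. This is an illustrative simplification, not the paper's analysis: it assumes a quadratic task loss centered at a task optimum w_task and an L2 penalty of weight lambda pulling the weight toward its pre-trained value w_0.

```latex
\min_{w}\; \tfrac{1}{2}\,(w - w_{\text{task}})^2 + \tfrac{\lambda}{2}\,(w - w_0)^2
\;\Longrightarrow\;
(w^\star - w_{\text{task}}) + \lambda\,(w^\star - w_0) = 0
\;\Longrightarrow\;
w^\star = \frac{w_{\text{task}} + \lambda\, w_0}{1 + \lambda}
```

Setting lambda = 0 recovers unregularized fine-tuning (w* = w_task); as lambda grows, w* moves toward w_0, so lambda directly controls how much of the pre-trained solution is retained.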

Highlights

  • Proposes a novel method to mitigate catastrophic forgetting in Low-Rank Adaptation (LoRA) for fine-tuning large language models.
  • Introduces a regularization term that penalizes changes to pre-trained weights, preserving knowledge while adapting to new tasks.
  • Achieves state-of-the-art performance on multiple NLP benchmarks with minimal computational overhead.
  • Demonstrates effectiveness across various model sizes and tasks, including text classification and generation.
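
One way such a penalty can stay cheap: if it is the squared Frobenius norm of the scaled LoRA update (an assumption; the summary does not give the exact form), it can be computed from two r x r Gram matrices without forming the full d_out x d_in update, using ||BA||_F^2 = trace((B^T B)(A A^T)). The sketch below checks this identity against the naive computation on random factors; `penalty_naive` and `penalty_gram` are hypothetical names.

```python
# Sketch, assuming the penalty is (alpha / r)^2 * ||B A||_F^2, i.e. the squared
# distance of the adapted weight from W0. Function names are hypothetical.
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def penalty_naive(B, A, scale):
    """Materializes the d_out x d_in update B @ A: O(r * d_out * d_in) work."""
    delta = matmul(B, A)
    return scale ** 2 * sum(v * v for row in delta for v in row)

def penalty_gram(B, A, scale):
    """Uses ||B A||_F^2 = trace((B^T B)(A A^T)); only r x r matrices are built,
    so the cost is O(r^2 * (d_out + d_in))."""
    G_B = matmul(transpose(B), B)   # r x r
    G_A = matmul(A, transpose(A))   # r x r
    r = len(G_B)
    # trace(G_B @ G_A) = sum_ij G_B[i][j] * G_A[j][i]
    return scale ** 2 * sum(G_B[i][j] * G_A[j][i] for i in range(r) for j in range(r))

random.seed(0)
d_out, d_in, r, alpha = 64, 48, 4, 8.0
B = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d_out)]
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]
scale = alpha / r
print(penalty_naive(B, A, scale), penalty_gram(B, A, scale))  # agree up to rounding
```

When r is much smaller than d_out and d_in, the Gram-matrix route costs O(r^2 (d_out + d_in)) instead of O(r d_out d_in), which is negligible next to a forward pass through the layer.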

Methods

  • Low-Rank Adaptation (LoRA) with a forgetting-mitigation regularization term.
  • Gradient-based optimization of a combined loss consisting of the task loss and the regularization loss.
  • Empirical evaluation on benchmark datasets (e.g., GLUE, SuperGLUE) and comparison with baseline LoRA and full fine-tuning.
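
The optimization listed above, gradient descent on a task loss plus a regularization loss, can be illustrated on a toy problem. Everything here is an assumption for illustration only: the "task" is matching a hypothetical target matrix W_target, the regularizer is lam * ||BA||_F^2, B starts at zero and A at a small constant (mirroring LoRA's zero-initialized B), and the alpha / r scaling is omitted. `train_lora` is a hypothetical helper.

```python
# Toy sketch of gradient descent on a combined loss, assuming:
#   task loss = ||W0 + B A - W_target||_F^2   (hypothetical stand-in task)
#   penalty   = ||B A||_F^2                   (distance of the adapted weight from W0)
#   total     = task loss + lam * penalty
# Only B and A are updated; W0 stays frozen, as in LoRA.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def train_lora(W0, W_target, r, lam, lr=0.01, steps=2000):
    d_out, d_in = len(W0), len(W0[0])
    B = [[0.0] * r for _ in range(d_out)]   # zero init for B, as in LoRA
    A = [[0.1] * d_in for _ in range(r)]    # small constant init for A (assumption)
    for _ in range(steps):
        M = matmul(B, A)                    # current update Delta W = B A
        # Gradient of the total loss w.r.t. M: 2 (W0 + M - W_target) + 2 lam M
        G = [[2 * (w0 + m - t) + 2 * lam * m for w0, m, t in zip(w0r, mr, tr)]
             for w0r, mr, tr in zip(W0, M, W_target)]
        grad_B = matmul(G, transpose(A))    # chain rule: dL/dB = G A^T
        grad_A = matmul(transpose(B), G)    # chain rule: dL/dA = B^T G
        B = [[b - lr * g for b, g in zip(br, gr)] for br, gr in zip(B, grad_B)]
        A = [[a - lr * g for a, g in zip(ar, gr)] for ar, gr in zip(A, grad_A)]
    return matmul(B, A)

W0 = [[1.0, 0.0], [0.0, 1.0]]
W_target = [[2.0, 2.0], [2.0, 5.0]]   # W_target - W0 = [[1, 2], [2, 4]] has rank 1
for lam in (0.0, 1.0):
    print(lam, train_lora(W0, W_target, r=1, lam=lam))
```

With lam = 0 the learned update converges to W_target - W0 (rank 1 here, so a rank-1 adapter can represent it exactly); with lam = 1 it converges to half of that, showing how the penalty shrinks the update toward the pre-trained weights.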

Results

  • The proposed method reduces forgetting by up to 30% compared with standard LoRA on continual learning tasks.
  • Achieves comparable or better performance than full fine-tuning on target tasks while training only 0.1% of the parameters.
  • Maintains higher accuracy on tasks learned during pre-training after fine-tuning, indicating reduced catastrophic forgetting.
  • Outperforms other parameter-efficient fine-tuning methods, such as Adapters and Prefix Tuning, on forgetting metrics.
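
Two of the quantitative claims above depend on definitions the summary does not spell out. The helpers below show one common way to compute them, as a hedged illustration: the share of trainable parameters when rank-r adapters are attached to weight matrices, and forgetting measured as the accuracy drop on a previously learned task, with the reduction expressed relative to a baseline. The function names, matrix shape, and accuracy figures are hypothetical and do not come from the paper.

```python
# Hypothetical helpers for reading the reported numbers; not from the paper.

def lora_trainable_fraction(shapes, r):
    """Share of trainable parameters when a rank-r adapter (B: d_out x r,
    A: r x d_in) is attached to each (d_out, d_in) matrix in `shapes`,
    measured against the parameters of those matrices only."""
    lora = sum(r * (d_out + d_in) for d_out, d_in in shapes)
    full = sum(d_out * d_in for d_out, d_in in shapes)
    return lora / full

def forgetting(acc_before, acc_after):
    """Accuracy drop (in points) on a previously learned task after fine-tuning."""
    return acc_before - acc_after

def relative_reduction(baseline_forgetting, method_forgetting):
    """Fraction by which a method's forgetting is lower than a baseline's."""
    return 1 - method_forgetting / baseline_forgetting

print(lora_trainable_fraction([(4096, 4096)], r=2))   # 0.0009765625 (= 1/1024)

# Made-up accuracies (percent) on an original task, before and after fine-tuning.
baseline = forgetting(80.0, 70.0)      # hypothetical baseline LoRA: 10-point drop
regularized = forgetting(80.0, 73.0)   # hypothetical regularized variant: 7-point drop
print(relative_reduction(baseline, regularized))   # about 0.3
```

For a single 4096 x 4096 matrix, a rank-2 adapter trains 1/1024 (about 0.1%) of that matrix's parameters; for a whole model the fraction depends on which matrices are adapted and on r. Likewise, a 7-point drop against a 10-point baseline drop is a 30% relative reduction, which is one possible reading of "up to 30%".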