In-Context Learning Under Regime Change
This paper studies how transformers handle non-stationary sequences with abrupt regime changes, formalizing the problem as in-context change-point detection. The authors give a constructive theory showing that transformers can approximate the Bayesian model-averaged (BMA) predictor for piecewise-linear tasks. The required model complexity (depth, width, and number of attention heads) depends on how much side information about the change point is available. Knowing the exact change point shrinks the set of candidate change points, and with it the number of attention heads needed; partial information (e.g., knowing only the support of the change-point location) leaves a larger candidate set and requires more heads. Synthetic experiments on linear regression and linear dynamical systems confirm that trained transformers match the optimal baselines: oracle least squares when the change point is given, and BMA when it is not. Real-world experiments on infectious disease forecasting (around policy changes) and financial volatility forecasting (around FOMC announcements) show that encoding change-point information through the positional encoding improves pretrained foundation models without retraining, reducing MAE by up to 25% in disease forecasting. The work bridges classical change-point detection and modern in-context learning and offers practical methods for deploying transformers in non-stationary environments.
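To make the two baselines concrete, here is a minimal sketch of a BMA predictor for a single change point in linear regression, alongside the informed oracle least-squares baseline. It is not the paper's transformer construction. It assumes a uniform prior over a candidate set of change points, independent Gaussian priors on the pre- and post-change weights, and a known noise variance. The function names and the toy data are hypothetical. Each candidate τ is weighted by its posterior p(τ | data), and the per-candidate predictions for the current (post-change) regime are averaged.

```python
# Sketch under stated assumptions; all names are hypothetical, not the paper's code.
import numpy as np


def log_evidence(X, y, prior_var, noise_var):
    """Log marginal likelihood of one segment under Bayesian linear regression:
    w ~ N(0, prior_var I), y = X w + eps, eps ~ N(0, noise_var I),
    hence y ~ N(0, prior_var X X^T + noise_var I)."""
    n = len(y)
    if n == 0:
        return 0.0
    cov = prior_var * X @ X.T + noise_var * np.eye(n)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)


def posterior_mean(X, y, prior_var, noise_var):
    """Posterior mean of w for one segment (ridge with lambda = noise_var / prior_var)."""
    d = X.shape[1]
    lam = noise_var / prior_var
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def bma_predict(X, y, x_query, candidates, prior_var=1.0, noise_var=0.01):
    """Average predictions over candidate change points tau.
    Each tau splits the context into independent regimes [0, tau) and [tau, n);
    the query is assumed to follow the current (post-change) regime."""
    log_post, preds = [], []
    for tau in candidates:
        log_post.append(log_evidence(X[:tau], y[:tau], prior_var, noise_var)
                        + log_evidence(X[tau:], y[tau:], prior_var, noise_var))
        preds.append(x_query @ posterior_mean(X[tau:], y[tau:], prior_var, noise_var))
    log_post = np.array(log_post)            # uniform prior: posterior ∝ evidence
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()                 # p(tau | data)
    return float(weights @ np.array(preds)), weights


def oracle_ls_predict(X, y, x_query, tau):
    """Informed baseline: ordinary least squares on the post-change segment only."""
    w, *_ = np.linalg.lstsq(X[tau:], y[tau:], rcond=None)
    return float(x_query @ w)


# Toy context with one change point at tau_true (noise std 0.1, so noise_var 0.01).
rng = np.random.default_rng(0)
n, d, tau_true = 40, 3, 25
X = rng.normal(size=(n, d))
w1, w2 = rng.normal(size=d), rng.normal(size=d)
y = np.where(np.arange(n) < tau_true, X @ w1, X @ w2) + 0.1 * rng.normal(size=n)
x_query = rng.normal(size=d)

candidates = list(range(5, 36))
pred_bma, post = bma_predict(X, y, x_query, candidates)
pred_oracle = oracle_ls_predict(X, y, x_query, tau_true)
print(candidates[int(np.argmax(post))], pred_bma, pred_oracle)
```

Passing `candidates=[tau_true]` collapses the average to a single ridge fit on the post-change segment, which mirrors the informed setting. The loop over candidates is the part whose cost grows with the candidate set. This loosely echoes the summary's point that less change-point information calls for more attention heads.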
Highlights
- Formalizes in-context change-point detection for transformers under non-stationary sequences.
- Provides constructive theory showing transformer complexity depends on change-point information level.
- Validates theory with synthetic experiments where trained transformers match optimal baselines.
- Demonstrates real-world improvement by encoding change-point information into pretrained models.
Methods
- Bayesian model averaging (BMA) for change-point adaptation.
- Transformer construction with positional encoding to communicate change-point side information.
- Synthetic experiments on piecewise-linear regression and dynamical systems.
- Real-world experiments on infectious disease and financial volatility forecasting.
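The summary does not give the exact data-generating process for the synthetic tasks. The sketch below shows one plausible setup, assumed purely for illustration. Piecewise-linear regression uses Gaussian covariates and weights. The linear dynamical system has a transition matrix that switches once at τ, with both matrices rescaled to spectral radius 0.9 so that the noiseless dynamics stay stable. All function names and parameter values are hypothetical.

```python
# Hypothetical synthetic-task generators; distributions and parameters are assumptions.
import numpy as np


def piecewise_linear_regression(n, d, tau, noise_std=0.1, rng=None):
    """Pairs (x_t, y_t) with y_t = <w1, x_t> + eps for t < tau and
    y_t = <w2, x_t> + eps for t >= tau: a single abrupt regime change."""
    rng = np.random.default_rng(rng)
    X = rng.normal(size=(n, d))
    w1, w2 = rng.normal(size=d), rng.normal(size=d)
    y = np.where(np.arange(n) < tau, X @ w1, X @ w2) + noise_std * rng.normal(size=n)
    return X, y, (w1, w2)


def random_stable_matrix(d, spectral_radius, rng):
    """Gaussian matrix rescaled so its largest |eigenvalue| equals spectral_radius."""
    A = rng.normal(size=(d, d))
    return A * (spectral_radius / np.max(np.abs(np.linalg.eigvals(A))))


def switching_lds(n, d, tau, noise_std=0.05, spectral_radius=0.9, rng=None):
    """States z_{t+1} = A_t z_t + eps_t with A_t = A1 for t < tau and A2 for t >= tau."""
    rng = np.random.default_rng(rng)
    A1 = random_stable_matrix(d, spectral_radius, rng)
    A2 = random_stable_matrix(d, spectral_radius, rng)
    Z = np.zeros((n, d))
    Z[0] = rng.normal(size=d)
    for t in range(n - 1):
        A = A1 if t < tau else A2  # regime switches at tau
        Z[t + 1] = A @ Z[t] + noise_std * rng.normal(size=d)
    return Z, (A1, A2)


X, y, (w1, w2) = piecewise_linear_regression(n=50, d=4, tau=30, rng=0)
Z, (A1, A2) = switching_lds(n=50, d=3, tau=30, rng=1)
```

Drawing τ at random per sequence, instead of fixing it, would give the uninformed setting. Handing τ to the model alongside the sequence would give the informed one.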
Results
- Transformers can approximate the BMA predictor for piecewise-linear change-point problems.
- Model complexity (number of attention heads) scales with the size of the candidate change-point set.
- Trained transformers match the oracle least-squares (informed) and BMA (uninformed) baselines on synthetic tasks.
- Positional encoding of change-point information reduces MAE by up to 25% in disease forecasting.
- A linear positional encoding improves financial volatility forecasting around FOMC announcements.
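The summary does not define the "linear positional encoding" or explain how change-point information enters a pretrained model's positional encoding. As an assumption, the sketch below builds one such signal: a channel that is zero up to the change point and then grows linearly with the time elapsed since it. It also computes the relative MAE reduction, which is the kind of quantity behind a "25% MAE reduction" figure. Function names are hypothetical, and the forecasts are toy numbers, not results from the paper.

```python
# Hypothetical change-point channel and evaluation metric; not the paper's code.
import numpy as np


def linear_changepoint_encoding(n, tau, scale=1.0):
    """Assumed 'linear' change-point channel: 0 for t <= tau,
    then (t - tau) / scale, i.e. it ramps up after the regime change."""
    t = np.arange(n)
    return np.clip(t - tau, 0, None) / scale


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def relative_mae_reduction(y_true, pred_baseline, pred_with_info):
    """1 - MAE_with_info / MAE_baseline; a value of 0.25 is a 25% MAE reduction."""
    return 1.0 - mae(y_true, pred_with_info) / mae(y_true, pred_baseline)


pe = linear_changepoint_encoding(n=8, tau=5)  # values: 0, 0, 0, 0, 0, 0, 1, 2

# Toy forecasts for illustration only (not results from the paper).
y_true = [0.0, 0.0, 0.0, 0.0]
baseline = [1.0, -1.0, 1.0, -1.0]        # MAE 1.0
with_info = [0.75, -0.75, 0.75, -0.75]   # MAE 0.75
print(relative_mae_reduction(y_true, baseline, with_info))  # 0.25
```

A step indicator, 1 from τ onward, is an equally simple alternative. The linear ramp additionally tells the model how long the new regime has been in effect.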