Cognitive Alpha Mining via LLM-Driven Code-Based Evolution
The paper introduces CogAlpha, a framework for automated alpha mining that combines large language model (LLM) reasoning with evolutionary search. Unlike prior formulaic or neural approaches, whose search scope is limited, CogAlpha represents alphas as executable code and uses a seven-level agent hierarchy to explore diverse financial concepts, from market structure to geometric patterns. A multi-agent quality checker screens each candidate for code validity, logical consistency, and economic interpretability. The framework then applies thinking evolution, in which LLMs perform mutation and crossover operations on alpha codes, guided by fitness metrics (IC, ICIR, RankIC, RankICIR, and Mutual Information) and by adaptive generation that incorporates feedback from previous iterations.
Experiments on five datasets from three stock markets (China, US, Hong Kong) show that CogAlpha consistently discovers alphas with better predictive accuracy, robustness, and interpretability than 21 baselines, including machine learning models, deep learning models, existing alpha libraries, and other LLM-based methods. On the CSI300 dataset, CogAlpha achieves an IC of 0.0591 and an IR of 1.8999, outperforming all of the baselines. The framework also produces interpretable alphas with detailed comments explaining their economic rationale, and it generalizes well across different markets, training methods, and prediction horizons. The work opens a new direction for cognitive alpha mining by aligning evolutionary optimization with LLM-based reasoning.
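To make the idea of a code-level alpha with an economic rationale in its comments more concrete, here is a minimal sketch. It is not an alpha from the paper: the function name `alpha_volume_confirmed_reversal`, its inputs, and the reversal logic are all hypothetical and chosen only for illustration.

```python
def alpha_volume_confirmed_reversal(close, volume, window=5):
    """Hypothetical alpha written as plain executable code.

    close, volume: daily values for one stock, oldest first (assumed positive).
    Returns one signal per day; the first `window` days are None (warm-up).
    """
    signal = [None] * len(close)
    for t in range(window, len(close)):
        # Return over the look-back window.
        past_return = close[t] / close[t - window] - 1.0

        # Today's volume relative to the average of the preceding window.
        avg_volume = sum(volume[t - window:t]) / window
        relative_volume = volume[t] / avg_volume if avg_volume > 0 else 1.0

        # Illustrative rationale: a sharp move on unusually heavy volume may be
        # partly liquidity driven, so the signal bets on a partial reversal and
        # scales that bet by how unusual the volume is.
        signal[t] = -past_return * relative_volume
    return signal


if __name__ == "__main__":
    prices = [10, 10, 10, 10, 10, 11]
    volumes = [100, 100, 100, 100, 100, 200]
    print(alpha_volume_confirmed_reversal(prices, volumes))
```

Because the alpha is ordinary code, an LLM can read, comment, mutate, or recombine it directly, which is the property the framework relies on.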
Highlights
- Introduces the concept of Cognitive Alpha Mining for automated, robust, and explainable alpha discovery.
- Proposes the CogAlpha framework, which combines LLM-driven reasoning with evolutionary search via code-level alpha representation.
- Features a Seven-Level Agent Hierarchy and a Multi-Agent Quality Checker to ensure validity and diversity.
- Achieves superior predictive accuracy, robustness, and generalization across 5 datasets from 3 stock markets.
- Demonstrates interpretable alpha generation with detailed comments and evolutionary refinement.
Methods
- Seven-Level Agent Hierarchy: 21 agents organized into 7 levels exploring market structure, risk, price-volume dynamics, etc.
- Multi-Agent Quality Checker: Judge, Logic Improvement, Code Quality, and Code Repair agents to validate and fix alpha codes.
- Thinking Evolution: Mutation and crossover operations in natural language space to iteratively refine alpha candidates.
- Fitness Evaluation: Uses IC, ICIR, RankIC, RankICIR, and Mutual Information with percentile-based thresholds for selection.
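The correlation-based fitness metrics listed above can be sketched as follows. This uses the conventional definitions (IC as the per-day cross-sectional Pearson correlation between alpha values and forward returns, RankIC as the same correlation on ranks, and ICIR / RankICIR as mean divided by standard deviation of the daily series). The summary does not spell out the paper's exact formulas, so treat these definitions, the use of the population standard deviation, and all function names as assumptions. Mutual Information is omitted.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series is scored as no signal
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def information_ratio(series):
    """Mean of a daily series divided by its (population) standard deviation."""
    spread = pstdev(series)
    return mean(series) / spread if spread > 0 else 0.0


def fitness(alpha_by_day, return_by_day):
    """Each argument holds one cross-section (list of floats) per trading day."""
    ics = [pearson(a, r) for a, r in zip(alpha_by_day, return_by_day)]
    rank_ics = [
        pearson(average_ranks(a), average_ranks(r))
        for a, r in zip(alpha_by_day, return_by_day)
    ]
    return {
        "IC": mean(ics),
        "ICIR": information_ratio(ics),
        "RankIC": mean(rank_ics),
        "RankICIR": information_ratio(rank_ics),
    }
```

An alpha that ranks stocks correctly on one day and exactly backwards on the next averages out to an IC of zero, which is why the stability-oriented ICIR and RankICIR are reported alongside the raw means.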
Results
- CogAlpha outperforms 21 baseline methods on CSI300 with IC=0.0591, RankIC=0.0814, ICIR=0.3410, RankICIR=0.4350, AER=0.1639, IR=1.8999.
- Ablation study confirms each component (Adaptive Generation, Diversified Guidance, Agent Hierarchy, Thinking Evolution) contributes to performance.
- Generated alphas achieve absolute IC > 0.05 and RankIC > 0.07 after full evolution cycle.
- Generalizes well across CSI300, CSI500, S&P500, HSI, HSCI with different training methods and horizons.
- Threshold pair (65,80) yields best performance by balancing exploration and exploitation.
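The summary does not say how the two thresholds in the pair (65,80) are applied. One plausible reading, sketched below purely as an assumption, is that candidates are ranked by fitness within a generation, those at or above the 80th percentile are treated as an elite tier, those between the 65th and 80th percentiles are kept as a qualified tier, and the rest are dropped. The tier names and the helpers `percentile` and `split_population` are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) using linear interpolation on sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty collection is undefined")
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def split_population(scores, lower_q=65, upper_q=80):
    """Split candidates into tiers by fitness percentile.

    scores: dict mapping a candidate name to a single fitness value.
    The two-tier interpretation of the thresholds is an assumption.
    """
    low_cut = percentile(scores.values(), lower_q)
    high_cut = percentile(scores.values(), upper_q)
    tiers = {"elite": [], "qualified": [], "discarded": []}
    for name, score in scores.items():
        if score >= high_cut:
            tiers["elite"].append(name)
        elif score >= low_cut:
            tiers["qualified"].append(name)
        else:
            tiers["discarded"].append(name)
    return tiers


if __name__ == "__main__":
    demo = {f"alpha_{i}": float(i) for i in range(101)}
    result = split_population(demo)
    print({tier: len(names) for tier, names in result.items()})
```

Under this reading, lowering the first threshold keeps more varied candidates in play (exploration), while raising the second concentrates effort on the strongest ones (exploitation).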