Cognitive Alpha Mining via LLM-Driven Code-Based Evolution

Fengyuan Liu
Yi Huang
Sichun Luo
Yuqi Wang
Yazheng Yang
Xinye Li
Zefa Hu
Junlan Feng
Qi Liu
Published on 11/24/2025
Equities
Stocks
AI
LLM
Machine learning
Factor investing
Stock picking
Alternative data

The paper introduces CogAlpha, a framework for automated alpha mining that combines large language model (LLM) reasoning with evolutionary search. Unlike prior formulaic or neural methods, whose search scope is limited, CogAlpha represents alphas as executable code and uses a seven-level agent hierarchy to explore diverse financial concepts, from market structure to geometric patterns. A multi-agent quality checker verifies code validity, logical consistency, and economic interpretability. The framework then applies thinking evolution, in which LLMs perform mutation and crossover operations on alpha code guided by fitness metrics (IC, ICIR, RankIC, RankICIR, Mutual Information), together with adaptive generation that incorporates feedback from previous iterations.
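To make "alphas as executable code" concrete, here is a minimal sketch of what a code-level alpha with an explanatory comment might look like. It is not taken from the paper: the function name, the `bars` input layout, the `window` parameter, and the factor itself are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def alpha_volume_scaled_reversal(bars: Dict[str, List[float]], window: int = 5) -> List[float]:
    """Hypothetical alpha for one stock: short-term reversal scaled by relative volume.

    Illustrative rationale (an assumption, not a claim from the paper):
    a recent price run-up tends to partially revert, and the signal is
    trusted less when today's volume is unusually high versus its recent average.
    """
    close, volume = bars["close"], bars["volume"]
    out = [float("nan")] * len(close)  # NaN until enough history is available
    for t in range(window, len(close)):
        past_return = close[t] / close[t - window] - 1.0
        avg_volume = sum(volume[t - window:t]) / window
        if avg_volume <= 0 or volume[t] <= 0:
            continue  # leave NaN when volume data is unusable
        rel_volume = volume[t] / avg_volume
        out[t] = -past_return / rel_volume  # negative sign encodes reversal
    return out


# Toy input: flat prices, then a 10% jump on the last day with normal volume.
example = alpha_volume_scaled_reversal(
    {"close": [10, 10, 10, 10, 10, 11], "volume": [100] * 6}
)
```

Because the alpha is ordinary code, an LLM can in principle rewrite its logic or its docstring directly, which is the kind of edit a mutation or crossover step would operate on.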

Experiments on five datasets from three stock markets (China, US, Hong Kong) demonstrate that CogAlpha consistently discovers alphas with superior predictive accuracy, robustness, and interpretability compared to 21 baselines, including machine learning models, deep learning models, existing alpha libraries, and other LLM-based methods. On the CSI300 dataset, CogAlpha achieves an IC of 0.0591 and an IR of 1.8999, outperforming all 21 baselines. The framework also produces interpretable alphas with detailed comments explaining their economic rationale, and shows strong generalization across different markets, training methods, and prediction horizons. The work opens a new direction for cognitive alpha mining by aligning evolutionary optimization with LLM-based reasoning.

Highlights

  • Introduces the Cognitive Alpha Mining concept for automated, robust, and explainable alpha discovery.
  • Proposes the CogAlpha framework, combining LLM-driven reasoning with evolutionary search via code-level alpha representation.
  • Features a Seven-Level Agent Hierarchy and a Multi-Agent Quality Checker to ensure validity and diversity.
  • Achieves superior predictive accuracy, robustness, and generalization across 5 datasets from 3 stock markets.
  • Demonstrates interpretable alpha generation with detailed comments and evolutionary refinement.

Methods

  • Seven-Level Agent Hierarchy: 21 agents organized into 7 levels exploring market structure, risk, price-volume dynamics, etc.
  • Multi-Agent Quality Checker: Judge, Logic Improvement, Code Quality, and Code Repair agents to validate and fix alpha codes.
  • Thinking Evolution: Mutation and crossover operations in natural language space to iteratively refine alpha candidates.
  • Fitness Evaluation: Uses IC, ICIR, RankIC, RankICIR, and Mutual Information with percentile-based thresholds for selection.
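The correlation-based fitness metrics named above can be sketched as follows. This is a minimal illustration under common conventions, not the paper's implementation: IC is taken as the per-day cross-sectional Pearson correlation between alpha values and next-period returns, RankIC as the same correlation on ranks, and ICIR/RankICIR as the mean of the daily series divided by its standard deviation. Mutual Information is omitted, and all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def information_ratio(series: Sequence[float]) -> float:
    """Mean divided by population standard deviation (0.0 if no dispersion)."""
    sd = pstdev(series)
    return mean(series) / sd if sd > 0 else 0.0


def fitness(alpha_by_day: Sequence[Sequence[float]],
            ret_by_day: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Each element of the inputs holds one day's values across all stocks."""
    ics = [pearson(a, r) for a, r in zip(alpha_by_day, ret_by_day)]
    rank_ics = [pearson(ranks(a), ranks(r)) for a, r in zip(alpha_by_day, ret_by_day)]
    return {
        "IC": mean(ics),
        "ICIR": information_ratio(ics),
        "RankIC": mean(rank_ics),
        "RankICIR": information_ratio(rank_ics),
    }


# Toy data: two days, four stocks; the alpha orders returns perfectly on both days.
scores = fitness(
    alpha_by_day=[[1, 2, 3, 4], [1, 2, 3, 5]],
    ret_by_day=[[0.01, 0.02, 0.03, 0.04], [0.02, 0.04, 0.06, 0.20]],
)
```

Using ranks makes RankIC insensitive to outliers in either the alpha or the returns, which is one reason both the raw and the rank versions are typically reported side by side.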

Results

  • CogAlpha outperforms 21 baseline methods on CSI300 with IC=0.0591, RankIC=0.0814, ICIR=0.3410, RankICIR=0.4350, AER=0.1639, IR=1.8999.
  • Ablation study confirms each component (Adaptive Generation, Diversified Guidance, Agent Hierarchy, Thinking Evolution) contributes to performance.
  • Generated alphas achieve absolute IC > 0.05 and RankIC > 0.07 after full evolution cycle.
  • Generalizes well across CSI300, CSI500, S&P500, HSI, HSCI with different training methods and horizons.
  • Threshold pair (65,80) yields best performance by balancing exploration and exploitation.
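One way to read a percentile threshold pair such as (65, 80) is sketched below. The interpretation is an assumption, since the summary does not spell out the roles of the two values: here the lower percentile admits a candidate into the pool that continues to evolve, and the upper percentile marks it as elite. The function names and the three-way split are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile of `values`, with q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def split_by_thresholds(
    fitness: Dict[str, float], lower_q: float = 65.0, upper_q: float = 80.0
) -> Tuple[List[str], List[str], List[str]]:
    """Split candidates into (elite, parents, discarded) by fitness percentile.

    Assumed semantics: >= upper percentile is elite, between the two
    percentiles is kept as a parent for further evolution, below is dropped.
    """
    vals = list(fitness.values())
    lo, hi = percentile(vals, lower_q), percentile(vals, upper_q)
    elite = sorted(n for n, s in fitness.items() if s >= hi)
    parents = sorted(n for n, s in fitness.items() if lo <= s < hi)
    discarded = sorted(n for n, s in fitness.items() if s < lo)
    return elite, parents, discarded


# Toy population of 11 candidates with fitness 0.0 .. 10.0.
population = {f"alpha_{i:02d}": float(i) for i in range(11)}
elite, parents, discarded = split_by_thresholds(population, 65, 80)
```

Under this reading, lowering the first value keeps more candidates alive (more exploration), while raising the second makes the elite set more selective (more exploitation).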