Astock: A New Dataset and Automated Stock Trading based on Stock-specific News Analyzing Model

Jinan Zou
Haiyao Cao
Lingqiao Liu
Yuhao Lin
Ehsan Abbasnejad
Javen Qinfeng Shi
Published on 6/14/2022
Equities
Stocks
China
AI
Machine learning
Sentiment
Stock picking
Alternative data
Event Driven
Trading earnings

This paper introduces Astock, a platform and dataset for the systematic study of natural language processing (NLP)-aided automated stock trading algorithms. Unlike previous work, Astock provides stock-specific financial news together with various stock factors, which allows more realistic evaluation with financially relevant metrics such as annualized rate of return and maximum drawdown. By integrating textual and numerical data, the platform supports the development and assessment of trading strategies and addresses gaps in existing datasets, which often lack stock-level specificity or broader financial context.

Technically, the authors propose a method centered on semantic role labeling pooling (SRLP), which uses semantic role labeling to build compact representations of news paragraphs. These representations are combined with other stock factors to improve prediction accuracy for trading decisions. A self-supervised learning strategy based on SRLP is also introduced to improve the system's generalization to out-of-distribution data. Experimental results show that the proposed method outperforms all baselines on annualized rate of return and achieves a better maximum drawdown than the CSI300 and XIN9 indices in real trading. The Astock dataset and code are publicly available as a resource for future research on NLP-driven financial decision-making.
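The pooling idea described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each token already has an embedding vector and a semantic role label from some external labeler, it uses plain mean pooling, and the role names (`A0`, `V`, `A1`), the function names, and the concatenation with stock factors are illustrative choices made for this example.

```python
from typing import List, Sequence


def pool_by_role(
    token_vectors: Sequence[Sequence[float]],
    role_labels: Sequence[str],
    roles: Sequence[str] = ("A0", "V", "A1"),  # assumed role set, for illustration
) -> List[float]:
    """Mean-pool token vectors per semantic role and concatenate the results.

    Tokens whose label is not in `roles` are ignored. A role with no tokens
    contributes a zero vector, so the output length is always
    len(roles) * embedding_dim.
    """
    dim = len(token_vectors[0])
    pooled: List[float] = []
    for role in roles:
        members = [vec for vec, label in zip(token_vectors, role_labels) if label == role]
        if members:
            # zip(*members) iterates over embedding dimensions (columns).
            mean = [sum(column) / len(members) for column in zip(*members)]
        else:
            mean = [0.0] * dim
        pooled.extend(mean)
    return pooled


def combine_with_factors(pooled_news: Sequence[float], stock_factors: Sequence[float]) -> List[float]:
    """Concatenate the pooled news vector with numerical stock factors (hypothetical fusion step)."""
    return list(pooled_news) + list(stock_factors)


# Toy input: five tokens with 2-dimensional embeddings and made-up role labels.
tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0], [9.0, 9.0]]
labels = ["A0", "A0", "V", "A1", "O"]

news_vector = pool_by_role(tokens, labels)
features = combine_with_factors(news_vector, [0.5, -1.0])
print(news_vector)
print(len(features))
```

The resulting vector has a fixed length regardless of paragraph length, which is the sense in which a role-based pooling gives a compact representation; a downstream classifier would consume `features`.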

Highlights

  • Introduction of Astock dataset with stock-specific news and financial factors
  • Development of semantic role labeling pooling (SRLP) for compact news representation
  • Integration of SRLP with stock factors for enhanced prediction accuracy
  • Proposal of self-supervised learning to improve out-of-distribution generalization
  • Demonstration of superior performance in real trading metrics compared to baselines and indices

Methods

  • Semantic role labeling pooling (SRLP) for feature extraction from news text
  • Integration of SRLP with various stock factors for predictive modeling
  • Self-supervised learning strategy to enhance generalization
  • Evaluation using financially relevant metrics such as annualized rate of return and maximum drawdown
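The two evaluation metrics named above can be computed from a series of portfolio values. The sketch below uses common textbook definitions and may differ in detail from the ones used in the paper; the function names and the default of 252 trading periods per year are assumptions made for this example.

```python
from typing import Sequence


def annualized_return(values: Sequence[float], periods_per_year: int = 252) -> float:
    """Geometric annualized rate of return from a series of portfolio values.

    `values` holds one portfolio value per period (for example, per trading
    day), so it spans len(values) - 1 periods.
    """
    n_periods = len(values) - 1
    if n_periods < 1:
        raise ValueError("need at least two portfolio values")
    growth = values[-1] / values[0]
    return growth ** (periods_per_year / n_periods) - 1.0


def max_drawdown(values: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Toy portfolio curve: rises to 120, falls to 90, partially recovers.
curve = [100.0, 120.0, 90.0, 110.0, 105.0]
print(max_drawdown(curve))                           # decline from 120 to 90
print(annualized_return(curve, periods_per_year=4))  # four periods treated as one year
```

A smaller maximum drawdown is better, since it means the strategy lost less from its best point; annualized return is better when larger.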

Results

  • Proposed method achieves higher annualized rate of return than all baselines
  • Outperforms CSI300 and XIN9 indices in maximum drawdown on real trading
  • SRLP enables effective representation learning from news paragraphs
  • Self-supervised learning improves out-of-distribution performance
  • Astock dataset and platform support realistic evaluation of NLP-aided trading algorithms