Astock: A New Dataset and Automated Stock Trading based on Stock-specific News Analyzing Model
This paper introduces Astock, a platform and dataset for systematically studying natural language processing (NLP)-aided automated stock trading algorithms. Unlike previous work, Astock provides stock-specific financial news together with various stock factors, and it evaluates strategies with financially relevant metrics such as annualized rate of return and maximum drawdown. By integrating textual and numerical data, the platform addresses gaps in existing datasets, which often lack stock-level specificity or broader financial context.
Technically, the authors propose a method centered on semantic role labeling pooling (SRLP), which uses semantic role labeling to build compact representations of news paragraphs. These representations are combined with other stock factors to improve prediction accuracy for trading decisions. A self-supervised learning strategy based on SRLP is also introduced to improve the system's generalization to out-of-distribution data. In the reported experiments, the proposed method outperforms all baselines on annualized rate of return and achieves a better maximum drawdown than the CSI300 and XIN9 indices in real trading. The Astock dataset and code are publicly available as a resource for future research on NLP-driven financial decision-making.
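The idea of pooling token representations by semantic role can be illustrated with a minimal sketch. It is not the authors' implementation: the role labels (`ARG0`, `V`, `ARG1`), the use of mean pooling, the concatenation order, and the function names `mean_pool` and `srl_pool` are all assumptions made for illustration, and the token embeddings and role spans are assumed to come from an upstream encoder and semantic role labeler.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def srl_pool(
    token_embeddings: List[Vector],
    role_spans: Dict[str, Tuple[int, int]],
    roles: Tuple[str, ...] = ("ARG0", "V", "ARG1"),  # assumed role set
) -> Vector:
    """Pool token embeddings per semantic role and concatenate the results.

    role_spans maps a role label to a half-open token range [start, end).
    A role that is missing or has an empty span contributes a zero vector,
    so the output length is always len(roles) * embedding_dim.
    """
    dim = len(token_embeddings[0])
    pooled: Vector = []
    for role in roles:
        span = role_spans.get(role)
        if span is None or span[0] >= span[1]:
            pooled.extend([0.0] * dim)
        else:
            start, end = span
            pooled.extend(mean_pool(token_embeddings[start:end]))
    return pooled


# Toy example: four tokens with 2-dimensional embeddings (made-up values).
embeddings = [[1.0, 0.0], [0.0, 2.0], [2.0, 2.0], [4.0, 0.0]]
spans = {"ARG0": (0, 1), "V": (1, 2), "ARG1": (2, 4)}
compact = srl_pool(embeddings, spans)
print(compact)
```

The fixed-length output is what makes the representation compact: a paragraph of any length is reduced to one vector per role, which can then be joined with numerical stock factors.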
Highlights
- Introduction of Astock dataset with stock-specific news and financial factors
- Development of semantic role labeling pooling (SRLP) for compact news representation
- Integration of SRLP with stock factors for enhanced prediction accuracy
- Proposal of self-supervised learning to improve out-of-distribution generalization
- Demonstration of superior performance in real trading metrics compared to baselines and indices
Methods
- Semantic role labeling pooling (SRLP) for feature extraction from news text
- Integration of SRLP with various stock factors for predictive modeling
- Self-supervised learning strategy to enhance generalization
- Evaluation using financially relevant metrics such as annualized rate of return and maximum drawdown
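The two evaluation metrics named above can be computed from a series of portfolio values. The sketch below uses common textbook definitions, which may differ in detail from the paper's: it assumes geometric compounding, 252 trading periods per year by default, and drawdown expressed as a positive fraction of the running peak. The function names are illustrative.

```python
from typing import List


def annualized_return(values: List[float], periods_per_year: int = 252) -> float:
    """Geometric annualized return from a series of portfolio values.

    values[0] is the starting value; each later entry is the value at the
    end of one period (for example, one trading day).
    """
    if len(values) < 2 or values[0] <= 0:
        raise ValueError("need at least two values and a positive start")
    n_periods = len(values) - 1
    total_growth = values[-1] / values[0]
    return total_growth ** (periods_per_year / n_periods) - 1.0


def max_drawdown(values: List[float]) -> float:
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)                    # running maximum so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


# Toy equity curve (made-up values): the peak is 120 and the trough after it is 90.
curve = [100.0, 120.0, 90.0, 110.0]
print(max_drawdown(curve))  # 0.25
print(annualized_return(curve))
```

A higher annualized return is better, while a smaller maximum drawdown indicates less severe losses from a peak, so the two metrics together describe both the reward and the downside risk of a strategy.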
Results
- Proposed method achieves higher annualized rate of return than all baselines
- Outperforms CSI300 and XIN9 indices in maximum drawdown on real trading
- SRLP enables effective representation learning from news paragraphs
- Self-supervised learning improves out-of-distribution performance
- Astock dataset and platform support realistic evaluation of NLP-aided trading algorithms