Testing LLM Generated Factors for Pricing Cross Sectional Returns
Research Article
Open Access
CC BY

Yixuan Liu 1*, Fei Ge 2
1 The Australian National University, Canberra, Australia
2 Swansea University, Swansea, UK
*Corresponding author: rara481846778@gmail.com
Published on 28 October 2025
TNS Vol.148
ISSN (Print): 2753-8818
ISSN (Online): 2753-8826
ISBN (Print): 978-1-80590-499-1
ISBN (Online): 978-1-80590-500-4

Abstract

The emergence of large language models (LLMs) has introduced a novel methodology for constructing factors in asset pricing. Whereas conventional approaches emphasize financial ratios or price-based indicators, LLMs allow for the systematic conversion of unstructured financial text into economically interpretable constructs that may capture latent risk perceptions. This study evaluates the pricing ability of LLM-generated factors in explaining the cross-section of U.S. equity returns from 2000 to 2024. Using a dataset of 220,000 earnings call transcripts, 180,000 10-K filings, and 1.2 million analyst reports, we extract 68 candidate factors through GPT-4-prompted financial text analysis. These include tone consistency indices, ESG disclosure emphases, governance accountability markers, and forward-looking orientation metrics. Econometric testing employs Fama-MacBeth regressions, the generalized method of moments (GMM), and Bayesian shrinkage with horseshoe priors. The LLM-derived factors improve adjusted R² by 0.034 relative to the Fama-French five-factor (FF5) benchmark and reduce mean absolute pricing errors from 0.812 to 0.545. Out-of-sample Sharpe ratios of factor-mimicking portfolios rise from 0.42 (FF5) to 0.61 (LLM factors), and Hansen-Jagannathan distances fall by 0.052. Robustness checks through adversarial textual perturbations, rolling-window sub-sampling, and sectoral decomposition confirm stability, with persistent contributions from narrative consistency, forward-looking ratios, and ESG-litigation emphasis. Findings indicate that LLMs provide not only interpretable but also quantitatively robust innovations in factor design, marking a methodological shift for empirical asset pricing research.
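The two-pass Fama-MacBeth procedure named in the abstract can be sketched as follows. This is a minimal illustration on simulated data, not the paper's implementation: the text-derived factors, panel dimensions, and premia used here are all invented for the example. Pass one estimates each stock's factor loadings from time-series regressions; pass two runs a cross-sectional regression of returns on those loadings every month, and the time-series mean and standard error of the monthly slopes give the estimated factor premia and their t-statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy panel: T months, N stocks, K candidate factors (all simulated;
# the paper's actual text-derived factors are not reproduced here).
T, N, K = 120, 50, 3
factors = rng.normal(size=(T, K))           # monthly factor realizations
betas_true = rng.normal(size=(N, K))        # true stock loadings
lam_true = np.array([0.4, 0.2, -0.1])       # assumed monthly premia
returns = (betas_true @ (lam_true[:, None] + factors.T)).T \
          + rng.normal(scale=2.0, size=(T, N))

# Pass 1: time-series regression per stock recovers factor betas.
X = np.column_stack([np.ones(T), factors])              # (T, K+1)
betas_hat = np.linalg.lstsq(X, returns, rcond=None)[0][1:].T   # (N, K)

# Pass 2: each month, regress the cross-section of returns on the
# estimated betas; the slopes are that month's factor premia (lambdas).
Z = np.column_stack([np.ones(N), betas_hat])            # (N, K+1)
lams = np.array([np.linalg.lstsq(Z, returns[t], rcond=None)[0]
                 for t in range(T)])                    # (T, K+1)

# Fama-MacBeth estimates: time-series mean of the monthly lambdas,
# with standard errors from their time-series variation.
lam_mean = lams.mean(axis=0)
lam_se = lams.std(axis=0, ddof=1) / np.sqrt(T)
t_stats = lam_mean / lam_se
print("premia estimates:", lam_mean[1:])
print("t-statistics:   ", t_stats[1:])
```

In the paper's setting, the simulated `factors` would be replaced by the LLM-derived series (tone consistency, forward-looking orientation, and so on), with the same two-pass logic deciding which candidate factors carry significant premia.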

Keywords:

Large Language Models, Asset Pricing, Factor Construction, Cross-Sectional Returns, Bayesian Shrinkage



Cite this article

Liu, Y., & Ge, F. (2025). Testing LLM Generated Factors for Pricing Cross Sectional Returns. Theoretical and Natural Science, 148, 1-6.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of the 3rd International Conference on Applied Physics and Mathematical Modeling

ISBN: 978-1-80590-499-1 (Print) / 978-1-80590-500-4 (Online)
Editor: Marwan Omar
Conference website: https://www.confapmm.org/
Conference date: 31 October 2025
Series: Theoretical and Natural Science
Volume number: Vol.148
ISSN: 2753-8818 (Print) / 2753-8826 (Online)