References
[1]. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155. https://arxiv.org/abs/2203.02155
[2]. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347. https://arxiv.org/abs/1707.06347
[3]. Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C. D., & Finn, C. (2024). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv preprint arXiv:2305.18290. https://arxiv.org/abs/2305.18290
[4]. Du, Y., Li, Z., Cheng, P., Chen, Z., Xie, Y., Wan, X., & Gao, A. (2025). Simplify RLHF as Reward-Weighted SFT: A Variational Method. arXiv preprint arXiv:2502.11026. https://arxiv.org/abs/2502.11026
[5]. Iftikhar, Z., Xiao, A., Ransom, S., Huang, J., & Suresh, H. (2025). How LLM Counselors Violate Ethical Standards in Mental Health Practice: A Practitioner-Informed Framework. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(2), 1311-1323. https://doi.org/10.1609/aies.v8i2.36632
[6]. Ziebart, B. D., Maas, A. L., Bagnell, J. A., & Dey, A. K. (2008). Maximum Entropy Inverse Reinforcement Learning. Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, Illinois, USA, July 13-17, 2008.
[7]. Wang, X., Zhou, Y., & Zhou, G. (2025). The Application and Ethical Implication of Generative AI in Mental Health: Systematic Review. JMIR Mental Health, 12, e70610. https://doi.org/10.2196/70610
[8]. Xie, H., Chen, Y., Xing, X., Lin, J., & Xu, X. (2024). PsyDT: Using LLMs to Construct the Digital Twin of Psychological Counselor with Personalized Counseling Style for Psychological Counseling. arXiv preprint arXiv:2412.13660. https://arxiv.org/abs/2412.13660
[9]. Kim, Y., Choi, C. H., Cho, S., Sohn, J. Y., & Kim, B. H. (2025). Aligning large language models for cognitive behavioral therapy: a proof-of-concept study. Frontiers in Psychiatry, 16, 1583739. https://doi.org/10.3389/fpsyt.2025.1583739
[10]. Weidinger, L., Mellor, J. F., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S. M., Hawkins, W. T., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., Isaac, W. S., Legassick, S., Irving, G., & Gabriel, I. (2021). Ethical and social risks of harm from Language Models. arXiv preprint arXiv:2112.04359. https://arxiv.org/abs/2112.04359
[11]. Lin, S. C., Hilton, J., & Evans, O. (2022). TruthfulQA: Measuring How Models Mimic Human Falsehoods. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics.
[12]. Yang, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Li, C., Liu, D., Huang, F., Dong, G., Wei, H., Lin, H., Yang, J., Tu, J., Zhang, J., Yang, J., Yang, J., Zhou, J., Lin, J., Dang, K., Lu, K., Bao, K., Yang, K., Yu, L., Li, M., Xue, M., Zhang, P., Zhu, Q., Men, R., Lin, R., Li, T., Xia, T., Ren, X., Ren, X., Fan, Y., Su, Y., Zhang, Y., Wan, Y., Liu, Y., Cui, Z., Zhang, Z., Qiu, Z., Quan, S., & Wang, Z. (2024). Qwen2.5 Technical Report. arXiv preprint arXiv:2412.15115. https://arxiv.org/abs/2412.15115
[13]. Chen, Y., Xing, X., Lin, J., Zheng, H., Wang, Z., Liu, Q., & Xu, X. (2023). SoulChat: Improving LLMs' empathy, listening, and comfort abilities through fine-tuning with multi-turn empathy conversations. arXiv preprint arXiv:2311.00273. https://arxiv.org/abs/2311.00273