Artificial Intelligence in Music Generation: Models, Evaluation, Applications, and Future Prospects
Research Article
Open Access
CC BY

Tiffany Chiu 1*
1 Jericho Senior High School
*Corresponding author: tiffanychiu3000@gmail.com
Published on 26 November 2025
TNS Vol.151
ISSN (Print): 2753-8818
ISSN (Online): 2753-8826
ISBN (Print): 978-1-80590-559-2
ISBN (Online): 978-1-80590-560-8

Abstract

Music creation with artificial intelligence (AI) is an emerging and rapidly developing field. This paper reviews the current state of AI music generation. It traces the historical development of computer-assisted and AI-assisted music production from early analog and digital tools to modern neural network architectures, highlighting key developments such as MIDI, digital audio workstations (DAWs), plugins, and early algorithmic composition systems. It examines symbolic and audio-based music representations, including MIDI, sheet music, waveforms, and spectrograms, and evaluates generative models such as generative adversarial networks (GANs), long short-term memory networks (LSTMs), Transformers, variational autoencoders (VAEs), and diffusion models, analyzing their capabilities and limitations. Applications in content creation, gaming, healthcare, and marketing demonstrate AI’s growing global impact. The review also compares subjective, objective, and combined evaluation strategies used to assess AI music models and addresses challenges and shortcomings in current research. Finally, it discusses future research directions, including improved generative techniques, interdisciplinary integration, and real-time interactive systems, suggesting pathways for researchers to enhance creativity, expressiveness, and practical application in AI-assisted music production.

Keywords:

Artificial Intelligence, Music Generation, Music Evaluation, Generative Models, Music Representation

Chiu, T. (2025). Artificial Intelligence in Music Generation: Models, Evaluation, Applications, and Future Prospects. Theoretical and Natural Science, 151, 53-61.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-CIAP 2026 Symposium: Applied Mathematics and Statistics

ISBN: 978-1-80590-559-2 (Print) / 978-1-80590-560-8 (Online)
Editor: Marwan Omar
Conference date: 27 January 2026
Series: Theoretical and Natural Science
Volume number: Vol.151
ISSN: 2753-8818 (Print) / 2753-8826 (Online)