Research Article
Open Access
CC BY

Empowering LLM-based Agents: Methods and Challenges in Tool Use

Xinyue Du 1*
1 School of Information Science and Engineering, East China University of Science and Technology (ECUST), Shanghai, China, 200237
*Corresponding author: xiaoduxiaodu09@gmail.com
Published on 5 November 2025
ACE Vol. 203
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-515-8
ISBN (Online): 978-1-80590-516-5

Abstract

The emergence of Large Language Model (LLM)-based agents marks a significant step towards more capable artificial intelligence. However, the effectiveness of these agents is fundamentally constrained by the static nature of their internal knowledge. Tool use has become a critical paradigm for overcoming these limitations, enabling agents to interact with dynamic data, execute complex computations, and act upon the world. This paper provides a comprehensive survey of the methods, challenges, and future directions in empowering LLM-based agents with tool-use capabilities. Through a systematic literature review, we synthesize the current state of the art, charting the evolution from foundational agent architectures and core invocation mechanisms such as function calling to advanced strategies such as dynamic tool retrieval and autonomous tool creation. Our analysis reveals several critical challenges that impede the deployment of robust agents, including knowledge conflicts between internal priors and external evidence, significant performance degradation in long-context scenarios, non-monotonic scaling behavior in compound systems, and novel security vulnerabilities. By mapping the current research landscape and identifying these key obstacles, this survey proposes a research agenda to guide future efforts in building more capable, secure, and reliable AI agents.
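The function-calling mechanism named in the abstract can be made concrete with a short sketch. The example below is purely illustrative and is not taken from the paper or any specific vendor API: the tool registry, the get_weather tool, and the handle_model_output dispatcher are hypothetical names, and the model's reply is simulated as a JSON string rather than produced by a real LLM call. It shows the basic round trip: the host advertises tool schemas, the model emits a structured call, and the host executes it and returns the result as evidence.

# Minimal, illustrative sketch of a function-calling loop (hypothetical names throughout).
import json
from typing import Any, Callable, Dict

# 1. The host registers tools with JSON-schema-like descriptions the model can see.
def get_weather(city: str) -> Dict[str, Any]:
    """Hypothetical tool: a real agent would query an external service here."""
    return {"city": city, "temp_c": 21, "condition": "clear"}

TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}

TOOL_SPECS = [
    {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

# 2. The model (not shown) is prompted with TOOL_SPECS and replies either with
#    plain text or with a structured tool call like the one simulated below.
def handle_model_output(raw: str) -> str:
    msg = json.loads(raw)
    if msg.get("type") == "tool_call":
        fn = TOOLS[msg["name"]]                      # look up the requested tool
        result = fn(**msg["arguments"])              # execute it on the host side
        return json.dumps({"role": "tool", "name": msg["name"], "content": result})
    return msg.get("content", "")                    # ordinary text answer

if __name__ == "__main__":
    # Simulated model output requesting a tool invocation.
    model_reply = '{"type": "tool_call", "name": "get_weather", "arguments": {"city": "Shanghai"}}'
    print(handle_model_output(model_reply))          # fed back to the model as evidence

In a deployed agent this exchange repeats: the tool result is appended to the conversation, and the model either produces a final answer or requests another call.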

Keywords:

Large Language Models, AI Agents, Tool Use, Function Calling



Cite this article

Du, X. (2025). Empowering LLM-based Agents: Methods and Challenges in Tool Use. Applied and Computational Engineering, 203, 9-16.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of CONF-SPML 2026 Symposium: The 2nd Neural Computing and Applications Workshop 2025

ISBN: 978-1-80590-515-8 (Print) / 978-1-80590-516-5 (Online)
Editors: Marwan Omar, Guozheng Rao
Conference date: 21 December 2025
Series: Applied and Computational Engineering
Volume number: Vol. 203
ISSN: 2755-2721 (Print) / 2755-273X (Online)