gentic.news — AI News Intelligence Platform
Training & Inference

Function Calling: definition + examples

Function Calling is a technique used in training large language models (LLMs) to produce structured outputs that correspond to predefined function signatures, allowing the model to invoke external tools, APIs, or databases as part of its response. This capability is typically instilled during supervised fine-tuning (SFT) or instruction tuning, where the model is trained on examples that pair user requests with the appropriate function name, parameters, and arguments in a structured format like JSON. For instance, given a user query "What's the weather in Tokyo?", the model might output {"function": "get_weather", "parameters": {"location": "Tokyo"}}.
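The weather example above can be sketched concretely. This is a minimal illustration, not any particular vendor's API: the function definition follows the JSON Schema convention most providers use, and `model_output` stands in for what a function-calling model would emit.

```python
import json

# Hypothetical function definition, in the JSON Schema style most LLM APIs use.
get_weather_def = {
    "name": "get_weather",
    "description": "Get the current weather for a location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

# Given "What's the weather in Tokyo?" plus the definition above, a
# function-calling model would emit a structured call like this:
model_output = '{"function": "get_weather", "parameters": {"location": "Tokyo"}}'

# The runtime parses the call instead of treating it as free-form text.
call = json.loads(model_output)
assert call["function"] == get_weather_def["name"]
print(call["parameters"]["location"])  # Tokyo
```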

Technically, the training process involves curating a dataset of (instruction, conversation history, function definitions, expected output) tuples. The model learns to parse function definitions (often provided as part of the system prompt) and select the correct function, fill in required parameters, and format the call correctly. This is distinct from pure generation because the output must adhere to a strict schema — any deviation (e.g., missing a required field, incorrect data type) constitutes an error. During inference, the model's output is parsed by a runtime that executes the function and returns the result, which can then be fed back into the model's context for further reasoning.
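The inference-time loop described above can be sketched as follows. The registry and the `get_weather` stub are hypothetical stand-ins for real tool implementations; the point is the parse → execute → return-to-context cycle.

```python
import json

# Hypothetical stub standing in for a real weather API call.
def get_weather(location: str) -> dict:
    return {"location": location, "temp_c": 21}

# Registry mapping function names (as seen by the model) to implementations.
REGISTRY = {"get_weather": get_weather}

def run_tool_call(model_output: str) -> str:
    """Parse a structured call, execute it, and return a JSON string that
    can be appended to the model's context for further reasoning."""
    call = json.loads(model_output)           # ValueError -> malformed output
    fn = REGISTRY[call["function"]]           # KeyError -> hallucinated function
    result = fn(**call["parameters"])         # TypeError -> bad arguments
    return json.dumps(result)

print(run_tool_call(
    '{"function": "get_weather", "parameters": {"location": "Tokyo"}}'
))
```

Each failure mode surfaces as a distinct exception, which is exactly the "strict schema" property the paragraph describes: any deviation from the expected structure is an error the runtime can catch before execution.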

Function Calling matters because it bridges the gap between LLMs' natural language understanding and deterministic, reliable computation. Without it, models can only generate text, which is often insufficient for tasks like querying databases, performing calculations, or controlling software. It is a core enabler of agentic systems, where LLMs autonomously orchestrate multi-step workflows. As of 2026, Function Calling is a standard feature in most major LLM APIs (OpenAI, Anthropic, Google, Mistral, Llama) and is often combined with structured output constraints (like JSON mode or grammar-based sampling) to enforce correctness.

Common pitfalls include: (1) hallucinating function names or parameters not in the provided definitions, (2) failing to respect parameter types (e.g., passing a string where an integer is required), (3) generating incomplete calls when the model runs out of context, and (4) security risks from allowing arbitrary function execution without sandboxing. Mitigations include using constrained decoding (e.g., Outlines, LMQL, or JSON schema enforcement) and rigorous validation before executing calls.
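Pitfalls (1) and (2) can be caught with the kind of pre-execution validation mentioned above. This is a hand-rolled sketch rather than a full JSON Schema validator; the `get_weather` definition is the same hypothetical example used earlier.

```python
GET_WEATHER_DEF = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}, "days": {"type": "integer"}},
        "required": ["location"],
    },
}

def validate_call(call: dict, definition: dict) -> list:
    """Return a list of validation errors; empty means the call is safe to run."""
    errors = []
    if call.get("function") != definition["name"]:
        errors.append(f"unknown function {call.get('function')!r}")
    params = call.get("parameters", {})
    schema = definition["parameters"]
    # Note: bools pass the integer check here; tighten if that matters.
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"missing required parameter {name!r}")
    for name, value in params.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unexpected parameter {name!r}")   # pitfall (1)
        elif not isinstance(value, type_map[spec["type"]]):
            errors.append(f"{name!r} should be {spec['type']}")  # pitfall (2)
    return errors

# A hallucinated-argument call fails validation before anything executes:
bad = {"function": "get_weather", "parameters": {"days": "three"}}
print(validate_call(bad, GET_WEATHER_DEF))
```

In production one would typically reach for a real JSON Schema validator or constrained decoding instead, but the principle is the same: reject the call before it touches any external system.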

The current state of the art (2026) involves training models specifically on function-calling benchmarks like Berkeley Function-Calling Leaderboard (BFCL) and using reinforcement learning from human feedback (RLHF) to reduce hallucination rates. Specialized models like Salesforce's xLAM, Nexusflow's NexusRaven, and fine-tuned versions of Llama 3.1 405B achieve >90% accuracy on complex multi-function scenarios. Research also explores tool-augmented training, where the model learns from its own execution traces via self-play or rejection sampling.

Function Calling is most appropriate when the task requires deterministic, verifiable actions (e.g., booking a flight, retrieving a record) and less so for purely creative or open-ended generation. It is an alternative to chain-of-thought prompting for tool use, but the two can be combined (e.g., CoT to plan, then function call to execute).

Examples

  • OpenAI GPT-4 Turbo (Nov 2023) introduced native function calling, allowing developers to define custom functions in JSON Schema and have the model output a structured call.
  • Llama 3.1 405B (July 2024) was fine-tuned on a synthetic function-calling dataset and achieved 88% accuracy on the BFCL v2 leaderboard.
  • Google Gemini 1.5 Pro (Feb 2025) supports parallel function calling, enabling multiple independent calls in a single turn (e.g., get weather for three cities simultaneously).
  • Salesforce xLAM (Jan 2026) uses a mixture-of-experts architecture trained on 10,000+ function definitions, achieving 95% accuracy on the ToolBench benchmark.
  • Mistral Large 2 (July 2024) includes built-in function calling with support for nested JSON schemas and optional parameters, used in production by thousands of companies via Le Chat.
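The parallel function calling mentioned above (several independent calls in one turn) can be sketched with a thread pool. The `get_weather` stub and the list of calls are illustrative; a real runtime would dispatch each call to its registered implementation.

```python
import concurrent.futures
import json

# Hypothetical stub standing in for a real weather API call.
def get_weather(location: str) -> dict:
    return {"location": location, "temp_c": 20}

# A model with parallel function calling emits several independent calls
# in a single turn, e.g. weather for three cities at once:
calls = [
    {"function": "get_weather", "parameters": {"location": city}}
    for city in ("Tokyo", "Paris", "Lima")
]

# Execute the independent calls concurrently; map() preserves input order.
with concurrent.futures.ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: get_weather(**c["parameters"]), calls))

print(json.dumps(results))
```

All three results then return to the model's context together, saving the round trips that sequential calling would require.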

Related terms

  • Tool-Augmented Training
  • Structured Output Decoding
  • Agentic Workflows
  • Supervised Fine-Tuning
  • Reinforcement Learning from Human Feedback

FAQ

What is Function Calling?

Function Calling is a training paradigm where a language model learns to generate structured API calls or tool invocations as part of its output, enabling it to interact with external systems.

How does Function Calling work?

The capability is typically instilled during supervised fine-tuning (SFT) or instruction tuning, where the model is trained on examples that pair user requests with the appropriate function name and arguments in a structured format such as JSON. At inference time, the model reads the function definitions supplied in its context (usually the system prompt), selects the correct function, fills in the required parameters, and emits a structured call. A runtime then parses and executes the call and returns the result to the model's context for further reasoning.

Where is Function Calling used in 2026?

OpenAI GPT-4 Turbo (Nov 2023) introduced native function calling, allowing developers to define custom functions in JSON Schema and have the model output a structured call. Llama 3.1 405B (July 2024) was fine-tuned on a synthetic function-calling dataset and achieved 88% accuracy on the BFCL v2 leaderboard. Google Gemini 1.5 Pro (Feb 2025) supports parallel function calling, enabling multiple independent calls in a single turn (e.g., get weather for three cities simultaneously).