📚 Function Calling Agent using OpenAI Assistant

🕛 Zeit seit Veröffentlichung: 23 Tage, 22 Stunden 36 Minuten
📆 Veröffentlicht am: 23.05.2024 um 15:34 Uhr
💡 Newskategorie: Programmierung
🔗 Quelle: dev.to

What is “function calling” in LLMs?

“Function calling” in LLMs refer to its capability to accept a list of user defined functions (aka tools) and to intelligently choose which “tool” to use based on the prompt provided by the user.

For example, let’s say we have a function get_weather(location: str) which provides the current weather based on the location argument. If we pass the get_weather function to the LLM as a tool and ask “What’s the current weather in Budapest?”, the LLM can decide to use the get_weather tool instead of responding with incorrect data. In this case, the LLM will simply return the tool name, which is get_weather and the arguments to be passed which is location = "Budapest". Now, we can call the get_weather function using the arguments provided by the LLM (note that LLM cannot call the function for us, we should do that ourselves). After executing the function, pass the return value of the function back to the LLM, and the LLM will provide us with a sensible response.

So what is a function calling agent?

The general idea of a function calling agent is that, we pass the user query and a list of tools to the LLM, and then call the LLM in a loop until we get the desired response. I know its a vague description :) So, let’s consider an example for better understanding.

Let’s consider a mathematical query for example, since LLMs are generally bad at math.

Tools provided to the LLM

add(a: float, b: float)
subtract(a: float, b: float)
multiply(a: float, b: float)

Query:

calculate sum of 1 and 5 and multiply it with the difference of 6 and 3

The Agent Loop

The below image will do a better job in explaining the agent loop than writing a long explanation.

You can see that a “conversation” is going on between the app and the LLM inside the agent loop until the LLM can find the result for the user’s query.

Implementation using OpenAI assistant

Below is the complete code for the OpenAI function calling agent. The explanation for each step has been provided as comments in the code.

Before executing this script, install the required libraries using the below command.

pip install openai python-dotenv

Also, store your OpenAI API key in a .env file (OPENAI_API_KEY=your-api-key).

import os
import sys
from dotenv import load_dotenv
import openai
from openai.types.beta import Assistant, Thread
from openai.types.beta.threads import Run

import json
import logging

# Load env variables (OPENAI_API_KEY)
load_dotenv()

# set log handlers
log_handler = logging.StreamHandler(sys.stdout)
log = logging.getLogger(__name__)
log.addHandler(log_handler)
log.setLevel(logging.INFO)

# initialize OpenAI client
client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Define the functions which will be the part of the LLM toolkit
def add(a: float, b: float) -> float:
    return a + b

def subtract(a: float, b: float) -> float:
    return a - b

def multiply(a: float, b: float) -> float:
    return a * b

tool_callables = {
    "add": add,
    "subtract": subtract,
    "multiply": multiply
}

# declaration of tools (functions) to be passed into the OpenAI assistant
math_tools = [
    {
        "function": {
            "name": "add",
            "description": "Returns the sum of two numbers.",
            "parameters": {
                "type": "object",
                "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
                "required": ["a", "b"],
            },
        },
        "type": "function",
    },
    {
        "function": {
            "name": "subtract",
            "description": "Returns the difference of two numbers.",
            "parameters": {
                "type": "object",
                "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
                "required": ["a", "b"],
            },
        },
        "type": "function",
    },
    {
        "function": {
            "name": "multiply",
            "description": "Returns the product of two numbers.",
            "parameters": {
                "type": "object",
                "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
                "required": ["a", "b"],
            },
        },
        "type": "function",
    }
]

openai_assistant: Assistant = client.beta.assistants.create(
    model="gpt-4-turbo",
    instructions="you are a math tutor, who explains the solutions to math problems",
    name="math-tutor",
    tools=math_tools
)


def run_math_agent(query: str, max_turns: int = 5) -> str:
    # Initialize the OpenAI assistant. The assistant will have its own unique id.
    openai_assistant = client.beta.assistants.create(
        model="gpt-4-turbo",
        instructions="you are a math tutor, who explains the solutions to math problems",
        name="math-tutor",
        tools=math_tools
    )

    # Create a Thread. In OpenAI lingo, a `Thread` can be considered like a conversation thread (not the multithreading one)
    # The to and fro communication between the script and the LLM will be stored against this thread id.
    thread: Thread = client.beta.threads.create()

    # Send the user query as part of the newly created thread
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=query
    )

    # The user query is now part of the thread. Now call the LLM (or "run" the thread in OpenAI lingo).
    # `create_and_poll` is just a helper method which polls the LLM until a terminal state is reached.
    # 
    # The terminal states are given below:
    # "requires_action" - A function call is required. Execute the function and submit the response back.
    #                     The results should be submitted back before the `expires_at` timestamp.
    # "completed"       - The Run is completed successfully.
    # "cancelled"       - The run was cancelled (its possible to cancel an in-progress Run).
    # "failed"          - Failed due to some error.
    # "expired"         - Run can get expired if we fail to submit function call results before `expires_at` timestamp.
    run: Run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id,
        assistant_id=openai_assistant.id,
    )

    # The agent loop. `max_turns` will set a limit on the number of LLM calls made inside the agent loop.
    # Its better to set a limit since LLM calls are costly.
    for turn in range(max_turns):

        # Fetch the last message from the thread
        messages = client.beta.threads.messages.list(
            thread_id=thread.id,
            run_id=run.id,
            order="desc",
            limit=1,
        )

        # Check for the terminal state of the Run.
        # If state is "completed", exit agent loop and return the LLM response.
        if run.status == "completed":
            assistant_res: str = next(
                (
                    content.text.value
                    for content in messages.data[0].content
                    if content.type == "text"
                ),
                None,
            )

            return assistant_res

        # If state is "requires_action", function calls are required. Execute the functions and send their outputs to the LLM.
        if run.status == "requires_action":
            func_tool_outputs = []

            # LLM can ask for multiple functions to be executed. Execute all function calls in loop and
            # append the results into `func_tool_outputs` list.
            for tool in run.required_action.submit_tool_outputs.tool_calls:
                # parse the arguments required for the function call from the LLM response
                args = (
                    json.loads(tool.function.arguments)
                    if tool.function.arguments
                    else {}
                )
                func_output = tool_callables[tool.function.name](**args)

                # OpenAI needs the output of the function call against the tool_call_id
                func_tool_outputs.append(
                    {"tool_call_id": tool.id, "output": str(func_output)}
                )

            # Submit the function call outputs back to OpenAI
            run = client.beta.threads.runs.submit_tool_outputs_and_poll(
                thread_id=thread.id, run_id=run.id, tool_outputs=func_tool_outputs
            )

            # Continue the agent loop.
            # Agent will check the output of the function output submission as part of next iteration.
            continue

        # Handle errors if terminal state is "failed"
        else:
            if run.status == "failed":
                log.error(
                    f"OpenAIFunctionAgent turn-{turn+1} | Run failure reason: {run.last_error}"
                )

            raise Exception(
                f"Failed to generate text due to: {run.last_error}"
            )

    # Raise error if turn-limit is reached.
    raise MaxTurnsReachedException()


class MaxTurnsReachedException(Exception):
    def __init__(self):
        super().__init__("Reached maximum number of turns")

if __name__ == "__main__":
    log.info(run_math_agent("calculate sum of 1 and 5 and multiply it with difference of 6 and 3"))

On executing the above script the LLM will respond with the correct result, which is 18. The LLM will also explain (like a math tutor) each step it took to find the result.

Shameless plug :)

You can use function calling agents for your application in a much easier way by just adding a python library called LLMSmith to your list of dependencies. As you might have guessed, I’m the author of that library :).

You can install the library using the following command.

pip install "llmsmith[openai]"

Here’s the code for using a function calling agent using llmsmith.

import asyncio
import os

from dotenv import load_dotenv
import openai
from llmsmith.agent.function.openai import OpenAIFunctionAgent
from llmsmith.agent.function.options.openai import OpenAIAssistantOptions
from llmsmith.agent.tool.openai import OpenAIAssistantTool

from llmsmith.task.models import TaskInput


# load env vars for getting OPENAI_API_KEY
load_dotenv()

# Define the functions which will be the part of the LLM toolkit
def add(a: float, b: float) -> float:
    return a + b


async def run():
    # initialize OpenAI client
    llm = openai.AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    # declaration of tools (functions) to be passed into the OpenAIFunctionAgent
    add_tool = OpenAIAssistantTool(
        declaration={
            "function": {
                "name": "add",
                "description": "Returns the sum of two numbers.",
                "parameters": {
                    "type": "object",
                    "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
                    "required": ["a", "b"],
                },
            },
            "type": "function",
        },
        callable=add,
    )

    # create the agent
    task: OpenAIFunctionAgent = await OpenAIFunctionAgent.create(
        name="testfunc",
        llm=llm,
        assistant_options=OpenAIAssistantOptions(
            model="gpt-4-turbo",
            system_prompt="you are a math tutor, who explains the solutions to math problems"
        ),
        tools=[add_tool],
        max_turns=5,
    )

    # run the agent
    res = await task.execute(TaskInput("Add sum of 1 and 2 to the sum of 5 and 6"))

    print(f"\n\nAgent response: {res.content}")


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(run())

llmsmith can be used for building all sorts of LLM based functionalities (not just agents). Refer the documentation for getting a better idea about the library. The Examples section has sample codes for implementing RAG using llmsmith and more.

...