Generated by DALL-E

Three AI Design Patterns of Autonomous Agents

Alexander Sniffin
8 min read · Mar 14, 2024


One of the hottest topics in AI over the past year has been the use of Large Language Models (LLMs) as reasoning engines for autonomous agents.

The enhanced reasoning capabilities of models such as GPT-4, Claude 3, and Gemini Ultra have set the stage for the creation of increasingly sophisticated and powerful agents.

I’ll go over the core concepts of building agents, as well as three design patterns:

  • Finite State Machine Agent
  • Task-Planner Agent
  • Orchestration Agent

Each pattern is language- and framework-agnostic, and each addresses different concepts and design complexities.

Core Concepts

Before jumping into each pattern, let’s first understand the common elements utilized across these designs.

ReAct & CoT Prompting

The most important component of building agents is the use of the ReAct (reasoning and acting) and CoT (chain-of-thought) prompting techniques. Both rely on intermediate reasoning steps, which allow the model to work through complex problems one step at a time.

Source: promptingguide.ai

By introducing intermediate steps, the model can better answer questions than other techniques like zero-shot prompting.

Source: promptingguide.ai

In the above example, information is passed to the model through in-context learning to help solve the problem. If we attempted to solve this without factual context, the model would likely hallucinate an answer.
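
To make this concrete, here is a minimal sketch of what a ReAct-style prompt template might look like in Python. The exact format, field names, and tool list are illustrative assumptions, not a specific framework’s API:

```python
# A minimal ReAct-style prompt template. The format, field names, and
# tool list are illustrative assumptions, not a specific framework's API.
REACT_PROMPT = """Answer the question using the following format:

Thought: reason about what to do next
Action: the tool to use, one of [{tool_names}]
Action Input: the input to pass to the tool
Observation: the result returned by the tool
... (Thought/Action/Observation can repeat)
Thought: I now know the final answer
Final Answer: the answer to the original question

Question: {question}
{scratchpad}"""

def build_prompt(question: str, scratchpad: str, tools: list[str]) -> str:
    # The scratchpad carries the prior Thought/Action/Observation steps.
    return REACT_PROMPT.format(
        tool_names=", ".join(tools),
        question=question,
        scratchpad=scratchpad,
    )
```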

Tools

Agents need a way to interact with their environment. This can be done through tools: just as we (as humans) use tools to help solve our own problems, agents can use tools that we define.

For an agent, a tool is a part of our program that the model learns to use through in-context learning. Each tool exposes a schema of structured input and output, which the model can reason about to fill in potential parameters. By using the model as a natural-language reasoning engine, we attempt to generate actions from the semantics of the request.
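
As a rough sketch, a tool definition could look like the following, using Pydantic (v2 is assumed) to describe the input schema that gets shown to the model. The StockTicker tool mirrors the example used later in this post; the hard-coded price stands in for a real market-data API call:

```python
from pydantic import BaseModel, Field

class StockTickerInput(BaseModel):
    """Structured input the model must produce to call this tool."""
    ticker: str = Field(description="The stock ticker symbol, e.g. MSFT")

class StockTickerTool:
    name = "stock_ticker_price"
    description = "Look up the latest price for a stock ticker."

    def schema(self) -> dict:
        # The JSON schema is shown to the model via in-context learning.
        return StockTickerInput.model_json_schema()

    def run(self, raw_args: str) -> str:
        args = StockTickerInput.model_validate_json(raw_args)
        price = 123.45  # placeholder; a real tool would call a market-data API
        return f"{args.ticker} is trading at ${price:.2f}"
```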

Structured Input & Output

When prompting models we typically use natural language, but when building agents we need a way to “marshal” intent to and from our application’s memory. One common way to do this is with a structured format the model has been trained on. This is typically JSON, although other structured formats are also compatible with many models.
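
For example, marshalling the model’s JSON output back into application memory might look like this sketch; the AgentAction fields are an assumption about what the agent tracks:

```python
from pydantic import BaseModel, ValidationError

class AgentAction(BaseModel):
    """What we expect the model to emit: a thought plus a tool call."""
    thought: str
    tool: str
    tool_input: dict

def parse_action(completion: str) -> AgentAction | None:
    # Marshal the model's JSON output back into application memory.
    try:
        return AgentAction.model_validate_json(completion)
    except ValidationError:
        return None  # malformed output; the agent could re-prompt here
```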

Memory

The agent needs a way to track what has been done between each prompt. We can think of this like a “scratchpad” where the agent takes notes on what it’s done so far. The implementation of this can vary; for the examples below, I only track the history of each generated output.
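
A minimal scratchpad along those lines could be as simple as this sketch:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """A minimal scratchpad: just the history of each generated output."""
    history: list[str] = field(default_factory=list)

    def add(self, entry: str) -> None:
        self.history.append(entry)

    def render(self) -> str:
        # Joined into the next prompt so the model can see prior steps.
        return "\n".join(self.history)
```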

Finite State Machine Agent

The Finite State Machine (FSM) agent specializes in creating predictable states for defining the agent’s behavior. We can define our ReAct prompts as states. Each ReAct state can represent various properties, including:

  • a prompt for the model, and
  • a handler for mapping application logic to and from the model

Simple example of an FSM Agent

The benefits of the FSM agent are:

  • predictability
  • tasks are isolated from other states
  • easy to troubleshoot
  • easy to add new states

Potential problems include:

  • prone to getting stuck in loops
  • can get side tracked or lose focus from the original request

A simple FSM agent can be written in Python. For this example, let’s assume the LLM backend is implemented elsewhere and the agent has tools and memory.

Notebook example of an FSM agent

The example uses a StockTicker price tool to get factual information about ticker prices. The agent can recognize when the question asks about a particular ticker and use the tool, and additional tools can be added to answer more complex questions. Each handler contains a prompt and the corresponding structured input and output parsing, using Pydantic to create JSON schemas.
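
Since the full notebook is embedded above, here is a condensed sketch of the same idea. The `llm` callable, the JSON protocol, and the state names are simplified assumptions rather than the notebook’s exact code:

```python
import json
from enum import Enum, auto

class State(Enum):
    THINK = auto()  # decide on a tool call or a final answer
    ACT = auto()    # run the chosen tool and record the observation
    DONE = auto()

class FSMAgent:
    """Condensed FSM agent: each state maps to a prompt and a handler."""

    def __init__(self, llm, tools):
        self.llm = llm      # assumed callable: prompt -> JSON string
        self.tools = tools  # dict of tool name -> callable(args: dict) -> str

    def run(self, question: str, max_steps: int = 10) -> str:
        state, memory, pending, answer = State.THINK, [], None, ""
        for _ in range(max_steps):  # hard stop to avoid infinite loops
            if state is State.THINK:
                out = json.loads(self.llm(self._think_prompt(question, memory)))
                if "answer" in out:
                    answer, state = out["answer"], State.DONE
                else:
                    pending, state = out, State.ACT
            elif state is State.ACT:
                observation = self.tools[pending["tool"]](pending["args"])
                memory.append(f"{pending['tool']} -> {observation}")
                state = State.THINK
            else:
                break
        return answer or "No answer found within the step limit."

    def _think_prompt(self, question: str, memory: list[str]) -> str:
        return (
            f"Question: {question}\nNotes so far: {memory}\n"
            'Respond with JSON: {"tool": "...", "args": {...}} to use a tool, '
            'or {"answer": "..."} when done.'
        )
```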

This implementation isn’t complete, but it demonstrates a simple way to build this agent. One improvement would be to add context-window awareness to the memory: if the tool output or memory grows too large, you need to compress it. One way to do this is to tokenize everything first and check the length to see whether the memory exceeds the window; if it does, you can use summarization, truncation, or another technique to reduce the total tokens in the prompt. Another improvement is adding control flows for context cancellation or escaping from loops.
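
For example, the context-window check could be done with a tokenizer such as tiktoken. This sketch uses naive truncation, and the token budget and model name are assumptions:

```python
import tiktoken

def fits_context(text: str, budget: int = 8000, model: str = "gpt-4") -> bool:
    # Count tokens before prompting so we know when to compress memory.
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text)) <= budget

def compress(history: list[str], budget: int = 8000) -> list[str]:
    # Naive truncation: drop the oldest entries until the history fits.
    # Summarizing with another LLM call is a better but costlier option.
    while history and not fits_context("\n".join(history), budget):
        history = history[1:]
    return history
```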

Task-Planner Agent

The task planner is an agent that defines a concrete plan for what needs to be done and works through that plan using CoT prompting.

The plan consists of tasks, where each task is an isolated piece of work. In the example below, the task planner defines a queue of tasks and works through them one at a time.

Simple example of a Task-Planner Agent
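
A condensed sketch of this pattern might look like the following; the `llm` callable and the JSON plan format are assumptions:

```python
import json
from collections import deque

class TaskPlannerAgent:
    """Plan upfront, then work through the task queue one task at a time."""

    def __init__(self, llm):
        self.llm = llm  # assumed callable: prompt -> completion string

    def run(self, goal: str) -> list[str]:
        plan = json.loads(self.llm(
            f'Break this goal into small, ordered tasks: "{goal}". '
            'Respond with JSON: {"tasks": ["...", "..."]}'
        ))
        queue, results = deque(plan["tasks"]), []
        while queue:
            task = queue.popleft()  # FIFO: work the plan in order
            results.append(self.llm(
                f"Goal: {goal}\nCompleted so far: {results}\nCurrent task: {task}"
            ))
        return results
```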

We can test the planning prompt for generating tasks using the OpenAI Playground with GPT-4 Turbo.

OpenAI Playground

The model is able to create isolated tasks for what might need to be done. A neat aspect of this agent is that it could extend the FSM agent: the planning step would occur as a new state, and the action state would instead pop from the queue of tasks and observe the output from the tools.

The benefit of this pattern is that work is planned upfront rather than continuously, as with the state machine. Defining intermediate steps early can help reduce the chance of getting stuck in a loop, though it’s not guaranteed.

This also comes with some problems: the agent might make a mistake in the initial plan, causing errors throughout the tasks and requiring backtracking and generating new tasks. Some tasks may even be impossible to solve (for example, invalid tool usage), which requires a new plan and starting over.

Prompting the model to do task planning is similar in idea to the popular project AutoGPT, which uses a task queue to break down the problem into small pieces of work.

Orchestration Agent

The orchestration agent uses a delegation-like pattern. Rather than having a single agent that does everything, we can define agents that specialize in solving specific problems and that may have different implementations.

It’s also possible to use the previous agents in combination with an orchestrator. The orchestrator can both supervise and route between agents to get the best output for the question, as sketched below.
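
A minimal routing step could look like this sketch, where the orchestrator asks the model to pick a specialist agent by name; the catalog format and the fallback behavior are assumptions:

```python
class Orchestrator:
    """Route each question to the specialist agent best suited for it."""

    def __init__(self, llm, agents: dict):
        self.llm = llm        # assumed callable: prompt -> completion string
        self.agents = agents  # name -> (description, agent with .run(question))

    def route(self, question: str) -> str:
        catalog = "\n".join(
            f"- {name}: {desc}" for name, (desc, _) in self.agents.items()
        )
        choice = self.llm(
            f"Available agents:\n{catalog}\n\nQuestion: {question}\n"
            "Reply with only the name of the most suitable agent."
        ).strip()
        # Fall back to the first agent if the model names an unknown one.
        _, agent = self.agents.get(choice, next(iter(self.agents.values())))
        return agent.run(question)
```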

OpenAI introduced a similar feature in ChatGPT by allowing users to message multiple GPTs in one conversation. A GPT is a customized, agentic version of ChatGPT with reasoning capability and tool usage. This lets users combine GPTs with different prompts and tools to answer complex questions.

I created a simple conversation to demonstrate this idea. First I used a GPT for writing, then the official DALL-E GPT, and lastly a custom GPT I wrote. Each GPT has a different system prompt and tools, but they can be used together to solve different problems.

Multi-GPT example

This is an interesting way to dynamically combine different implementations of autonomous agents, even though they share the same underlying models. While I selected which GPT to use myself (there is no orchestration reasoning here), it would be simple to expand on this by adding a reasoning step between each message and testing it in the playground.

Distributed Agents?

Separating agent responsibilities also makes it possible to create a more complex architecture: a distributed agent network. Remote communication between agents would offer a few interesting benefits:

  • agents can run on different hardware and networks
  • agents can be written in different languages or frameworks, agnostic to underlying implementation details
  • workloads can be scaled independently
  • work can be broadcast across a pool of agents to fan it out
  • agents can be discovered dynamically to perform tasks

There are many ways to implement this, but one method is a distributed Actor model. The Actor model is typically a message-passing concurrency model, and some frameworks support remote communication as well.

An actor can make local decisions, create more actors, send messages, and determine how to respond to the next message it receives. In this case, an agent would be an actor. Some actor-model frameworks support both local and remote actors, known as location transparency, where the underlying networking for message passing is abstracted away. I experimented with an early project doing exactly this in 2023 for testing an autonomous system: alexsniffin/go-auto-gpt.
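
As a local-only illustration of the idea, here is a sketch of an actor with a mailbox built on asyncio queues; a distributed framework would swap the in-process queue for a network transport:

```python
import asyncio

class AgentActor:
    """A minimal actor: a mailbox plus a loop that handles one message at
    a time. A distributed framework would replace the in-process queue
    with a network transport (location transparency)."""

    def __init__(self, name: str, handle):
        self.name = name
        self.handle = handle  # callable: message -> reply (e.g. an agent run)
        self.mailbox = asyncio.Queue()

    async def send(self, message, reply_to: asyncio.Queue) -> None:
        await self.mailbox.put((message, reply_to))

    async def run(self) -> None:
        while True:
            message, reply_to = await self.mailbox.get()
            await reply_to.put(self.handle(message))

async def main():
    actor = AgentActor("echo", lambda m: f"echo: {m}")
    worker = asyncio.create_task(actor.run())
    replies = asyncio.Queue()
    await actor.send("hello", replies)
    print(await replies.get())  # -> "echo: hello"
    worker.cancel()

asyncio.run(main())
```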

Nevertheless, the added complexity of such a system isn’t always justified. The nondeterministic nature of how agents operate makes creating this type of system particularly challenging, given the numerous potential points of failure. Designing a simpler local orchestration agent that delegates work to a limited number of agents would be much easier to troubleshoot and maintain. This approach would also simplify integration with other systems, such as conversational RAG (Retrieval-Augmented Generation) pipelines.

Related Research

A lot of what has been reviewed so far overlaps with the established research area of automated planning and scheduling, a branch of AI that focuses on creating plans or strategies to achieve specific goals or complete tasks. It optimizes autonomous decision-making and resource allocation, adapting dynamically to achieve goals efficiently.

Conclusion

Autonomous agents are an exciting field in the AI space.

Building these systems will remain challenging, but as LLMs continue to improve, so will the possibilities for building more useful autonomous systems with natural language.

Thanks for reading!


Alexander Sniffin

Software Engineer solving the next big problem one coffee at a time @ alexsniffin.com