Multi-Agent LLMs: A Poetic Introduction

Large Language Models (LLMs) have caused a buzz in recent years due to their impressive performance across natural language processing tasks. However, if you’re anything like me and try to outsource all your life’s problems to ChatGPT, you have probably noticed some quirks. Common issues with LLMs include a lack of transparency in their decision-making, occasional inaccuracies in their output, and difficulty with more complex tasks.

A few weeks back, I came across a paper titled Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate, which sent me down a rabbit hole into a trending research topic: multi-agent LLMs.

What are Multi-Agent LLMs?

In essence, multi-agent LLMs consist of multiple LLM instances or agents working as a team, where each has a specific role and works with the other agents to accomplish a shared goal. Research into multi-agent LLMs has shown that they can outperform single-agent systems across a variety of tasks, especially those that require more complex problem-solving.

Some of the key benefits of using multi-agent LLMs include:

  1. Collaboration: Multi-agent systems shine when collaboration is key. They can leverage the strengths of multiple specialised agents, allowing them to tackle more complex tasks.
  2. Explainability: Multi-agent collaboration can make the reasoning behind answers more transparent, as the decisions made by each agent are conveyed in natural language.
  3. Accuracy: By having multiple agents communicate, they can identify and correct inaccuracies in each other’s output, leading to more reliable results.

Building a Multi-Modal Example

To demonstrate their potential, we will create a small multi-agent system with a simple goal: to generate poems based on given images.

Why choose this task? Besides needing an LLM that can generate poems, we also need an LLM that can generate text from image input—also known as a multi-modal LLM. It’s a great fit for a multi-agent system since we need both creative writing and multi-modal skills.

It’s also worth noting that research into using multi-modal LLMs in multi-agent systems is still very limited, so it should make for an interesting experiment!

Agent Architecture

Before we can build anything, we first need to figure out the architecture of our application. This will include how many agents there are as well as their individual responsibilities.

To keep things simple, we will have two connected agents:

  1. An Analyst Agent: Responsible for generating a detailed description of the initial image input. For this agent, we’ll be using Moondream 2, a 1.8 billion parameter multi-modal LLM.
  2. A Creator Agent: Responsible for generating the poem from the analyst’s description. For this agent, we’ll be using the smallest Gemma 2 LLM, which is only 2 billion parameters.
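
If you’d like to follow along, here’s a minimal sketch of how the two models might be loaded with Hugging Face transformers. The model IDs, and Moondream 2’s use of trust_remote_code, are assumptions based on the models’ Hugging Face pages rather than the exact setup from our notebook:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Moondream 2 ships its own inference code on the Hub, hence trust_remote_code.
analyst_model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2", trust_remote_code=True
)
analyst_tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream2")

# The instruction-tuned Gemma 2 2B (gated on the Hub; requires accepting
# Google's licence before downloading).
creator_model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
creator_tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
```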

We will also need to give each agent a system prompt to ensure that it performs its task reliably. For the analyst agent…

“You are the Analyst Agent, an expert in visual analysis. Your task is to examine images in detail and provide a comprehensive description of what you see. Describe the scene as thoroughly as possible.”

And for the creator agent…

“You are the Creator Agent, a talented poet and creative writer. Your task is to take detailed descriptions of images provided by the Analyst Agent and craft a funny poem based on the imagery and emotions conveyed.”
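
With the prompts in place, each agent can be a plain Python function that pairs its system prompt with its model. Here is a rough sketch: the encode_image and answer_question helpers come from Moondream 2’s custom inference code (their exact signatures may differ between model revisions), and since Gemma 2’s chat template has no system role, the creator’s prompt is simply prepended to the user message:

```python
from PIL import Image

ANALYST_PROMPT = (
    "You are the Analyst Agent, an expert in visual analysis. Your task is to "
    "examine images in detail and provide a comprehensive description of what "
    "you see. Describe the scene as thoroughly as possible."
)

CREATOR_PROMPT = (
    "You are the Creator Agent, a talented poet and creative writer. Your task "
    "is to take detailed descriptions of images provided by the Analyst Agent "
    "and craft a funny poem based on the imagery and emotions conveyed."
)

def analyst_agent(image_path: str) -> str:
    """Generate a detailed description of the image at image_path."""
    image = Image.open(image_path)
    encoded = analyst_model.encode_image(image)
    return analyst_model.answer_question(encoded, ANALYST_PROMPT, analyst_tokenizer)

def creator_agent(description: str) -> str:
    """Generate a poem from the analyst's description."""
    # Gemma 2 has no system role, so the prompt is folded into the user turn.
    messages = [{"role": "user", "content": f"{CREATOR_PROMPT}\n\n{description}"}]
    inputs = creator_tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = creator_model.generate(inputs, max_new_tokens=256)
    # Strip the prompt tokens so only the newly generated poem is returned.
    return creator_tokenizer.decode(
        outputs[0][inputs.shape[-1]:], skip_special_tokens=True
    )
```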

Implementation

There are a handful of LLM frameworks available that would allow us to set up a multi-agent application; for this example, we went with LangGraph. LangGraph offers an intuitive way to set up multi-agent apps by representing agents as interconnected nodes in a graph.
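
To give a flavour of what this looks like, here is a minimal sketch of our two-agent graph, assuming the analyst_agent and creator_agent functions from earlier (the input file name is just a placeholder):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

# The shared state that flows through the graph.
class PoemState(TypedDict):
    image_path: str
    description: str
    poem: str

def analyst_node(state: PoemState) -> dict:
    # The analyst writes its description of the image into the shared state.
    return {"description": analyst_agent(state["image_path"])}

def creator_node(state: PoemState) -> dict:
    # The creator reads the description and writes the poem.
    return {"poem": creator_agent(state["description"])}

graph = StateGraph(PoemState)
graph.add_node("analyst", analyst_node)
graph.add_node("creator", creator_node)
graph.add_edge(START, "analyst")
graph.add_edge("analyst", "creator")
graph.add_edge("creator", END)
app = graph.compile()

result = app.invoke({"image_path": "hedgehog.png"})  # placeholder file name
print(result["description"])
print(result["poem"])
```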

Although we won’t walk through the full implementation in this article, you can find a detailed Jupyter notebook with everything we did here.

The Results

At first glance, the multi-agent system appears to be doing a great job!

[Image: paintMeLikeOneOfYourFrenchHedgehogs.png]

Here we can see how even with extremely small LLMs, our multi-agent system can generate coherent, rhyming poems from images—something that neither agent could do without the help of the other.

Here’s another example…

[Image: iAintAfraidOfNoGhost.jpg]

These examples also show the extra layer of explainability. Instead of just a single output, we also have the analyst’s description in natural language, which provides further insight into how the final poem was generated, without any additional prompting or interaction with the models.

Conclusion and Future Work

In this article, we have built a simple multi-agent LLM system that demonstrates how multiple agents can collaborate to accomplish tasks that none could manage individually. We have also demonstrated this using small LLMs that run without needing any specialized hardware, showing the potential of multi-agent LLMs in resource-limited situations.

There are many ways in which we could expand on this example in the future:

  • Adding additional agents; for example, a ‘reviewer’ agent could provide feedback to the creator about the poems it generates.
  • Adding function calling to one or more of the agents to allow them to use external tools or systems. For example, the creator agent could be configured to search the internet for poems similar to the ones it generates.
  • Configuring the agents to engage in multiple rounds of conversation to refine the generated poem (a rough sketch combining this with the reviewer idea follows below).
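
As a rough illustration of the first and third ideas combined, a reviewer loop could be bolted onto the LangGraph sketch from the Implementation section using a conditional edge. Note that reviewer_agent, the “REVISE” convention, and the round cap are all hypothetical, and PoemState would need extra feedback and rounds fields:

```python
MAX_ROUNDS = 3  # hypothetical cap so revisions cannot loop forever

def reviewer_node(state: PoemState) -> dict:
    # reviewer_agent is a hypothetical third agent that critiques the poem,
    # returning feedback containing "REVISE" if another pass is needed.
    return {
        "feedback": reviewer_agent(state["poem"]),
        "rounds": state.get("rounds", 0) + 1,
    }

def should_revise(state: PoemState) -> str:
    # Send the poem back to the creator until the reviewer approves
    # or the round cap is reached.
    if "REVISE" in state["feedback"] and state["rounds"] < MAX_ROUNDS:
        return "creator"
    return END

# When building the graph, wire creator -> reviewer instead of creator -> END,
# and let the conditional edge decide where to go next.
graph.add_node("reviewer", reviewer_node)
graph.add_edge("creator", "reviewer")
graph.add_conditional_edges("reviewer", should_revise)
```

For the revision rounds to be useful, the creator node would also need to read the reviewer’s feedback from the state when regenerating the poem.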

If anyone is interested in learning more about multi-agent LLMs, I would strongly recommend the paper Large Language Model based Multi-Agents: A Survey of Progress and Challenges, which covers the research space for multi-agent LLMs in detail.
