On 25 September, at an event jointly organised by Frankfurt Data Science and the Frankfurt AI Meetup at Techquartier, Frankfurt am Main, our co-founder Gyula and Gergő, one of our data scientists, talked about how to build autonomous intelligent software agents based on Large Language Models (LLMs) such as ChatGPT. The title, inspired by the movie The Matrix, refers to the fact that this is a new research area: a lot of experimentation and further work is needed before LLM-based agents can be deployed to support enterprise processes.
In our presentation, we first discussed how to define intelligence. It is not a straightforward question. Can we say that it is the ability to predict “what comes next”? Is it the capability to understand the world around us, recognize its patterns and rules, and, based on this knowledge, make predictions about the future? Our colleague Gergő, being a physicist, likes this definition, because in physics we do exactly the same. But let’s look at how we measure intelligence with tests, i.e. what kind of tasks we use. See the following example taken from an IQ test.
What do these tasks ask for? Exactly: to predict what comes next.
As Jacob Hohwy, author of the book The Predictive Mind, put it: “Predictive coding is a cognitive and neural framework that suggests the brain continuously generates and updates predictions about sensory input to minimize the discrepancy between these predictions and actual sensory information.”
We are not claiming that this is the one correct definition of intelligence, but it is the one we work with here.
The mechanism of LLMs is exactly the same: to put it simply, they do nothing other than predict which word comes next. They are trained on the “entire” internet so that they understand the semantics of human language and acquire powerful text-processing skills: extracting information from text, classifying words semantically, summarizing long texts, or generating text. How can we use these new, incredibly powerful skills for our work tasks?
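To make the “predict what comes next” idea concrete, here is a deliberately tiny toy: a bigram model that counts which word follows which in a small corpus and predicts the most frequent successor. The corpus and the model are purely illustrative; real LLMs use neural networks trained on vastly more data, but the objective is the same in spirit.

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: count successors in a corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()

successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def predict_next(word):
    """Return the most frequently observed word after `word`, or None."""
    if word not in successors:
        return None
    return successors[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" twice, more than any other word
```

The gap between this counting trick and a modern LLM is enormous, but the training signal — minimize the error of the next-word prediction — is the common core.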
The easiest solution is called Retrieval Augmented Generation (RAG): we retrieve relevant pieces of information from our own knowledge base or database and feed them to an LLM. In this way we can apply the semantic-understanding skills of the LLM to our verified sources (instead of information coming from the blogs, Reddit posts, tweets and other sources the LLM was trained on). In practical terms, we can build a chatbot with 24/7 availability, nearly zero cost and always up-to-date information. If you are interested, you can try out our own solution integrated into our Messenger account: talk with Zoli, our helpful LLM-driven chatbot, on our Facebook page or on our website.
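The RAG idea can be sketched in a few lines. Everything below is illustrative: the knowledge base is made up, the retriever is naive word overlap (production systems use embeddings and vector search), and `call_llm` is a stand-in for a real model API call.

```python
# Minimal Retrieval Augmented Generation sketch (illustrative data and names).
knowledge_base = [
    "Our office is open Monday to Friday, 9:00 to 17:00.",
    "Support tickets are answered within 24 hours.",
    "The premium plan includes phone support.",
]

def retrieve(question, docs, k=2):
    """Rank documents by naive word overlap with the question, keep top k."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(question, docs):
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n".join(retrieve(question, docs))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {question}")

def call_llm(prompt):
    # Stand-in for a real LLM call, so the sketch runs without API keys.
    return prompt

print(call_llm(build_prompt("When is the office open?", knowledge_base)))
```

The point of the pattern: the model answers from the retrieved, verified context rather than from whatever it memorized during training.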
Diving deeper into the skills of LLMs, we find that they are also able to decompose a high-level task into subtasks (“Let’s think step by step”, chain-of-thought prompting). Building on this idea, an exciting scientific paper came out which changed the game entirely: ReAct. Its authors were the first to investigate the behaviour of an LLM that can REASON and ACT together, and it outperformed everything before it. From a high-level point of view, what we did was outsource the decision to an LLM agent, which can autonomously decide how to solve the problem, reason about which action to take, and also execute it. Quoting the famous Hungarian writer Imre Madách: “The great work is finished, yes, the machine is running, the creator rests.”
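The ReAct pattern interleaves reasoning steps (“Thought”), tool calls (“Action”), and tool results (“Observation”). The sketch below hard-codes the model’s steps so it runs without an LLM; in a real agent, each line would come from a fresh model completion that has seen the transcript so far.

```python
# Sketch of a ReAct-style loop with a scripted "model" (illustrative only).
def calculator(expression):
    return str(eval(expression))  # toy tool; never eval untrusted input

TOOLS = {"calculator": calculator}

scripted_steps = [
    "Thought: I need to compute the total price.",
    "Action: calculator: 3 * 19 + 5",
    "Thought: I have the result.",
    "Answer: 62",
]

def run_agent(steps):
    """Walk the Thought/Action/Observation loop until an Answer appears."""
    transcript = []
    for step in steps:
        transcript.append(step)
        if step.startswith("Action:"):
            tool_name, arg = step[len("Action:"):].split(":", 1)
            result = TOOLS[tool_name.strip()](arg.strip())
            transcript.append(f"Observation: {result}")
        elif step.startswith("Answer:"):
            return step.split(":", 1)[1].strip(), transcript
    return None, transcript

answer, trace = run_agent(scripted_steps)
print(answer)  # 62
```

Replacing `scripted_steps` with live model completions — and feeding each Observation back into the next prompt — is exactly what turns this loop into an agent.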
From a technical point of view, we gave our LLM agent a toolset from which it can choose which action to take (and it also executes that action). We added memory so it can remember what happened before: which actions it took and what the previous interactions were. We added a skill of reflection, of self-criticism, so the agent can improve its own behaviour. What is game-changing here is that from this point on, the LLM becomes autonomous.
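The memory and reflection components described above can be sketched as follows. The class and function names are our own invention for illustration, and `reflect` merely formats a critique prompt; in a real agent, the critique itself would be generated by an LLM call over the memory.

```python
# Sketch of agent memory and reflection (hypothetical names, not a real library).
class AgentMemory:
    """Bounded log of past steps, replayable as part of the next prompt."""
    def __init__(self, max_items=50):
        self.items = []
        self.max_items = max_items

    def add(self, role, content):
        self.items.append((role, content))
        self.items = self.items[-self.max_items:]  # keep a bounded window

    def as_prompt(self):
        return "\n".join(f"{role}: {content}" for role, content in self.items)

def reflect(memory):
    # Stand-in for an LLM call that critiques the trajectory so far.
    last_role, last_content = memory.items[-1]
    return f"Critique of last step ({last_role}): was '{last_content}' the best move?"

memory = AgentMemory()
memory.add("action", "searched product database")
memory.add("observation", "3 matching items found")
print(memory.as_prompt())
print(reflect(memory))
```

Bounding the memory window matters in practice because the whole history must fit into the model’s context, together with the task and tool descriptions.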
So, have we found Agent Smith? Despite the promising experimental work above, not yet. A lot of additional thinking and experimental projects will be needed before production-ready, LLM-agent-based enterprise solutions arrive. And, as with other IT solutions, we will have to make compromises between cost, execution time, and efficiency.
But this is where we stand today, in September 2023, only a couple of months after ChatGPT went public. We at NeuronSolutions believe that this field will develop quickly and change many functions and processes at enterprises and other organisations. Then the question will arise again: will autonomous intelligent agents replace entire jobs? Probably not, but they will change the nature of our work significantly. On the one hand, we will need to adapt to this; on the other hand, our work will definitely become quicker and easier, as we will no longer have to focus on the many tedious tasks that agents can do on our behalf.