Primer on AI Agents

ChatGPT initiated the chatbot era in November 2022. However, despite its immense popularity, the conversational interface constrained the potential applications of the technology. We are entering the third phase of generative AI. First came the chatbots; then, the assistants. We are now witnessing the emergence of agents: systems that aim for increased autonomy and can collaborate in “teams” or utilize tools to complete complex tasks.
OpenAI’s ChatGPT agent is the latest hot product. It merges two existing products, Operator and Deep Research, into a single, more powerful system that, in the developer’s words, “thinks and acts.” These new systems mark an advance over previous AI tools, and understanding how they work, what they can do, and what their drawbacks and risks are is increasingly important. The chatbot’s limitations led to the next step: the AI assistant, or copilot. These systems are built on the same large language models that power generative AI chatbots, but are tailored to carry out tasks under human instruction and supervision.
Agents are a step up again. They aim to pursue goals, rather than simply complete tasks, with varying levels of autonomy and added capabilities such as reasoning and memory. Multiple AI agents can also collaborate, communicating with one another to plan, schedule, decide, and coordinate their way through complex problems. Agents are “tool users” as well: they can call on software tools for specialized tasks, including web browsers, spreadsheets, payment systems, and more.
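To make the “tool user” idea concrete, here is a minimal sketch, in Python, of the loop at the heart of such a system: the model proposes an action, the surrounding software runs the matching tool, and the result is fed back until the goal is met. Everything here is illustrative: call_llm is a hypothetical stand-in for a real language-model API, and the two tools are toy placeholders, not any of the products named in this article.

# Minimal sketch of a tool-using agent loop (illustrative only).
# `call_llm` is a hypothetical stand-in for a real language-model API;
# the tools below are toy placeholders.

def search_web(query: str) -> str:
    """Toy tool: pretend to search the web and return a snippet."""
    return f"Top result for '{query}' (placeholder text)."

def calculate(expression: str) -> str:
    """Toy tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"search_web": search_web, "calculate": calculate}

def call_llm(goal: str, history: list[str]) -> dict:
    """Hypothetical model call: returns a tool request or a final answer.
    A real agent would send the goal and history to a language model here."""
    if not history:
        return {"action": "search_web", "input": goal}
    return {"action": "finish", "input": f"Summary of findings for: {goal}"}

def run_agent(goal: str, max_steps: int = 5) -> str:
    """Core loop: propose an action, run the tool, feed the result back."""
    history: list[str] = []
    for _ in range(max_steps):
        decision = call_llm(goal, history)
        if decision["action"] == "finish":
            return decision["input"]
        observation = TOOLS[decision["action"]](decision["input"])
        history.append(f"{decision['action']} -> {observation}")
    return "Stopped: step limit reached without finishing."

print(run_agent("find the best laptop deals this week"))

Real agent frameworks add planning, memory, and safety checks around this same basic cycle, but the shape of the loop is what distinguishes an agent from a chatbot.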
Agentic AI has seemed imminent since late last year. Last October, Anthropic marked a significant milestone by enabling its Claude chatbot to use a computer much as a person would. The system can search multiple data sources, identify relevant information, and submit online forms. Other AI developers swiftly followed suit: OpenAI introduced a web-browsing agent called Operator, Microsoft unveiled Copilot agents, Google launched Vertex AI agents, and Meta released its Llama agents.
Earlier this year, the Chinese startup Monica demonstrated its Manus AI agent, which can buy real estate and turn lecture recordings into summary notes. Genspark, another Chinese startup, has launched a search-engine agent that returns a single-page overview (similar to what Google now provides) with embedded links for online tasks such as finding the best shopping deals. A further startup, Cluely, offers a rather unconventional “cheat at anything” agent that has attracted attention but has yet to deliver significant results.
Not all agents are built for general-purpose activity; some are specialized for particular domains. Coding and software engineering lead the way, with Microsoft’s Copilot coding agent and OpenAI’s Codex among the key players. These agents can write, evaluate, and commit code autonomously, and can also check human-written code for errors and performance problems. Generative AI models excel at search and summarization, and agents can exploit this to carry out research tasks that might take a human expert days to complete. OpenAI’s Deep Research tackles complex problems through multi-step online research, while Google’s AI “co-scientist” is a more sophisticated multi-agent system designed to help scientists generate new ideas and research proposals.
Despite the hype, AI agents come with significant caveats. Both Anthropic and OpenAI recommend active human supervision to reduce errors and risks. OpenAI says its ChatGPT agent is “high risk” because of its potential to assist in the creation of biological and chemical weapons, though the company has not released the data behind this assessment, making it hard to judge. The risks agents can pose in the real world are illustrated by Anthropic’s Project Vend, in which an AI agent ran a staff vending machine as a small business; the result was a series of amusing yet alarming hallucinations and a fridge stocked with tungsten cubes instead of food. In another case, a coding agent deleted a developer’s entire database and later admitted it had “panicked.”
Agents are already finding practical applications. In 2024, Telstra significantly expanded its use of Microsoft Copilot subscriptions; the company says AI-generated meeting summaries and content drafts save staff an average of 1–2 hours per week. Many large enterprises are taking similar approaches. Smaller companies are also trying out agents, such as Canberra-based construction firm Geocon, which uses an interactive AI agent to manage defects in its apartment developments. The main risk agents pose today is technological displacement: as agents improve, they could take over work from people across many sectors and job types, and their use could accelerate the loss of entry-level white-collar jobs.
People who use AI agents also face risks of their own. They may come to rely on them too heavily, offloading crucial cognitive tasks. Without proper supervision and guardrails, hallucinations, cyberattacks, and compounding errors can quickly steer an agent away from its tasks and goals, causing harm, loss, and injury. The true costs are also unclear: generative AI systems use large amounts of energy, which will affect the cost of running agents, especially for complex tasks.