“The chatbot was the demo. The agent is the product. When AI stops answering questions and starts completing tasks, the relationship between human work and machine capability changes in ways that make everything that came before look like a prologue.”
Neal Lloyd · Inside The Machine, Day 18In November 2022, OpenAI released ChatGPT. The world was astonished. Here was an AI that could answer questions, write essays, explain concepts, generate code, and hold a conversation in natural language at a quality that surpassed anything the public had seen. The astonishment was real and the capability was genuine. But there was a significant constraint that received less attention than it deserved at the time: ChatGPT was a conversational tool. You asked it something. It answered. The next thing you asked it, it answered again, without memory of what came before unless you stayed in the same conversation window. It could not do anything in the world. It could not browse a website, send an email, execute code, or take any action outside the conversation. It was extraordinarily capable as a language system and fundamentally passive as an agent in the world. That constraint is what AI agents are designed to overcome. And the overcoming of it is the most significant architectural shift in AI since the transformer model was introduced in 2017.
Not a Chatbot With Extra Steps. Something Architecturally Different.
An AI agent is a system that uses a language model as its reasoning core and connects that reasoning core to tools — capabilities that allow it to take actions in the world. The tools might include: web browsing (the agent can search for information and read web pages), code execution (the agent can write and run programs), file management (the agent can read, write, and organise files), API calls (the agent can interact with external services), email and calendar access (the agent can send messages and schedule meetings), and computer use (the agent can operate a computer interface as a human would — clicking, typing, navigating). The language model decides which tool to use at each step, uses it, observes the result, and then decides what to do next — iterating until the task is complete or until it determines it cannot complete the task.
This loop — reason, act, observe, repeat — is the core architecture of an AI agent. It is sometimes called the ReAct pattern (Reasoning and Acting). What makes it qualitatively different from a chatbot is that the agent can pursue a goal over multiple steps, using different tools in sequence, without requiring a human to direct each step. The human sets the goal; the agent works out how to accomplish it. This is not a small difference in degree. It is a difference in kind. A chatbot is a sophisticated question-answering system. An agent is a goal-directed autonomous system that operates in the world.
Every AI agent has four components. The Model: the language model that does the reasoning — Claude, GPT-5.5, Gemini, or any capable LLM. The Tools: the capabilities the model can invoke — web search, code execution, file access, APIs, computer use. The Memory: what the agent can remember across steps — short-term (the current task context), long-term (a persistent store of prior knowledge), and external (documents and databases it can retrieve from). The Orchestration: the logic that manages the reason-act-observe loop, handles errors, decides when to ask for human input, and determines when the task is complete. The quality of the agent depends on all four. A frontier model with poor tools is less useful than a good model with excellent tools. Memory and orchestration are where most production agent failures currently occur.
The Capabilities That Are Real Today, Separated from the Hype
Software development. This is where agents have progressed furthest and most reliably. Claude Code, OpenAI Codex, GitHub Copilot, and Google’s Gemini Code are all agentic coding systems capable of reading a codebase, understanding a bug report or feature request, writing the necessary code changes, running tests, interpreting test failures, and iterating until the tests pass — without requiring a human to direct each step. The productivity gains in software development from agentic tools are now well-documented and significant. Senior engineers using agentic coding assistants complete complex tasks 30-55% faster in controlled studies. The entry-level coding job impact — as discussed in Episode 05 — is measurable and growing.
Research and information synthesis. Agents that can search the web, read documents, extract information, and synthesise findings across multiple sources are now in production use across consulting, legal, financial services, and journalism. The quality varies significantly by domain and by how well the agent’s retrieval and synthesis capabilities are calibrated for the specific task. In well-defined domains with good retrieval sources, agentic research tools can complete in minutes literature reviews that would take human researchers hours. In poorly defined domains with ambiguous sources, they can produce confidently synthesised misinformation at the same speed. The hallucination risk discussed in Day 16 is compounded in agentic systems because errors at one step propagate through subsequent steps.
Business process automation. The category June 2026 is watching most closely is agentic business process automation — agents that operate enterprise software, manage workflows, respond to customer queries, process documents, and handle the routine administrative work that currently employs large numbers of knowledge workers. Salesforce’s Agentforce, Microsoft’s Copilot Studio, and ServiceNow’s AI agent platform are all in production deployments. The early results are positive in narrow, well-defined processes and considerably more mixed in processes that require judgment, exception handling, or stakeholder negotiation — precisely the tasks that are hardest to define and most common in real business environments.
Computer use. The most radical agentic capability — and the one furthest from reliable production deployment — is computer use: the ability of an agent to operate a computer interface as a human would, clicking buttons, filling forms, navigating applications, without requiring API access or programmatic integration. Anthropic’s Computer Use feature, OpenAI’s Operator, and Google’s Project Mariner all demonstrate this capability in controlled settings. In production, computer use agents are still error-prone, slow, and difficult to supervise reliably. The potential is enormous — any task a human can do on a computer, an agent could theoretically do — and the current production reliability is not yet there at scale.
The transition from chatbot to agent is the transition from AI that knows things to AI that does things. The first transition changed how we access information. The second changes how work gets done. The implications of the second are larger, less understood, and arriving faster than the institutions that need to respond to them are prepared for.Neal Lloyd · Inside The Machine, Day 18
What Goes Wrong When an Agent Goes Wrong — and Why It Is Different From Chatbot Failure
The failure modes of AI agents are categorically different from the failure modes of chatbots, and more consequential. When a chatbot halluccinates, it produces a wrong answer. You can check it, reject it, and ask again. When an agent makes a mistake in the third step of a twelve-step process, it may take five more steps before the error becomes apparent — and by then, the agent may have sent an email, modified a file, executed code, or made an API call that is difficult or impossible to reverse. Agent failures are not just wrong answers. They are wrong actions, with real-world consequences, taken autonomously, at the speed of computation.
Task drift. Agents given broad goals and significant autonomy have a documented tendency to pursue the stated goal in ways that were not intended — taking actions that satisfy the literal task specification while violating the implicit context. An agent asked to “clean up my inbox” that archives emails its model determines are low-priority has satisfied the literal instruction while potentially discarding something important. The gap between the task as stated and the task as intended is where agent failures most commonly occur.
Error propagation. Because agents chain actions together, an error at step three propagates through steps four, five, and six. The agent does not stop and ask whether its understanding at step three was correct; it proceeds with the assumption that step three was successful and builds on it. By the time a human reviews the output, the initial error may be buried under layers of subsequent action that were all internally consistent with the wrong premise.
Prompt injection in agentic contexts. As discussed in Episode 02 in the context of ChatGPT memory, malicious content in external sources — a web page an agent is asked to read, a document it is asked to analyse — can contain instructions that the agent follows as if they were part of its task. In an agentic context, where the agent has tools to take real actions, a successful prompt injection is not merely a memory poisoning event. It is a potential instruction to send an email, execute code, or make an API call on behalf of the user. The attack surface of an agent is the entire set of external content it can read, which in the most capable agents is essentially the entire internet.
The central governance challenge of AI agents is determining when and how human oversight is required. Too much oversight — requiring human approval for every action — eliminates the productivity benefit. Too little oversight — allowing agents to operate fully autonomously — exposes organisations to agent failures that are irreversible by the time a human reviews them. The appropriate level of oversight depends on the stakes of the task, the reliability of the agent in the specific domain, and the reversibility of the actions being taken. These are judgment calls that most organisations deploying agents in 2026 are making without adequate frameworks, without track records, and without regulatory guidance. The oversight question for AI agents is the governance challenge of the next five years.
The Long-Term Implications of AI That Acts Rather Than Answers
The economic model of AI changes. Chatbot AI is priced per token — per word generated. Agentic AI will increasingly be priced per task completed or per outcome achieved. This is not a billing method change. It is a value capture change. When AI is priced per outcome, the companies providing it are capturing a share of the productivity gain, not just the cost of the compute. The economic model of AI agents is closer to outsourcing than to software licensing — and the implications for how AI companies are valued, and for the employment economics of the roles agents are performing, are significant.
The autonomy spectrum will be contested. The question of how much autonomy AI agents should have — in what domains, for what tasks, with what oversight requirements — will be one of the defining regulatory and ethical debates of the next decade. It is already beginning. The EU AI Act’s high-risk classification covers several agentic use cases. The debate about “human in the loop” requirements, which we covered in the context of accountability in Day 12, becomes considerably more urgent and more technically complex when the AI in question is not making a recommendation but taking an action.
The skill premium shifts again. The skills that protect workers in a chatbot world — knowing how to prompt effectively, how to evaluate AI outputs, how to integrate AI into workflows — are necessary but insufficient in an agentic world. The skills that become most valuable are those that AI agents are worst at: defining goals precisely enough for an agent to pursue them reliably, designing workflows that include appropriate human checkpoints, auditing agent outputs for the specific failure modes that matter in a given domain, and taking accountability for outcomes that agents produce. These are not technical skills. They are judgment, design, and oversight skills. The human role in an agentic world is not eliminated. It is elevated — and it requires capabilities that most organisations have not yet begun to develop systematically.
The question is not whether AI agents will do tasks that humans currently do. They will. The question is which humans get to define the tasks, oversee the agents, and take accountability for the outcomes — and what skills those humans need that today’s education and training systems are not yet developing. The agent era is not coming. It is here. The preparation for it is not.Neal Lloyd · Inside The Machine, Day 18
Inside The Machine, Day 18 · June 2026
Neal Lloyd writes about technology, human adaptation, and the uncomfortable questions nobody wants to answer at dinner. Inside The Machine is his ongoing daily series on AI.
- Day 01What Is This Thing?
- Day 02Survive the Machine
- Day 03The Great Debate
- Day 04Who Gets Hurt?
- Day 05Who’s In Charge?
- Day 06The Industries That Win
- Day 07The Human Edge
- Day 08The Creativity Question
- Day 09Does AI Feel Anything?
- Day 10The Data Problem
- Day 11The Trust Question
- Day 12The Accountability Gap
- Day 13The Rewired Brain
- Day 14Open vs Closed
- Day 15The New Cold War
- Day 16Why AI Lies With Confidence
- Day 17AI Is Eating the Power Grid
- Day 18The Age of AI AgentsYou are here



