Skip to content

What Are AI Agents? Why Do They Matter?

Author: Addy Osmani

Original: Read the full article

Summary

Understanding the evolution, architecture, and future of AI agents

An agent is a program that autonomously completes tasks or makes decisions based on data. It converses with AI models and uses tools and resources to perform goal-based actions.

Agents vs. Models: A Subtle but Profound Distinction

The difference between traditional AI models and agents is subtle but profound. When we interact with an AI model (e.g., Gemini, o1, Sonnet, or similar LLMs), we're essentially conducting a series of one-shot interactions: we provide input, the model processes it, and returns output. While these interactions can be sophisticated, they are fundamentally passive and stateless. Each response exists in isolation, with no real continuity or ability to take independent action.

By contrast, AI agents are autonomous systems designed to perceive their environment, make decisions, and take actions to achieve specific goals—while maintaining context and adapting their approach based on outcomes.

This may sound like a minor distinction, but it represents a fundamental shift in how AI systems operate and what they can achieve.

Consider how we use AI chat interfaces today. You might ask ChatGPT to write an article from scratch and receive a single response. You may need to do some work yourself to iterate on it. The agent version is far more nuanced—an agent might write an outline, decide whether research is needed, draft the article, evaluate whether it needs improvement, and self-revise. This iterative process mimics how an engineer approaches a problem.

Multi-Agent Workflows

This becomes even more powerful in agent-to-agent workflows. Consider a data analysis project where multiple agents collaborate: a data preparation agent might clean and normalize your raw data, passing it to an analysis agent that applies statistical methods and identifies patterns. This agent might then collaborate with a visualization agent to create compelling representations of findings, while a documentation agent records methodology and results. Finally, a review agent might validate the entire workflow and suggest improvements or additional analyses.

Each agent in this workflow maintains its own context and expertise while coordinating with other agents through structured protocols. They can handle edge cases dynamically—for example, if the analysis agent discovers data quality issues, it can request specific cleaning from the preparation agent, or if the visualization agent identifies interesting patterns, it can suggest additional analysis to explore further.

It's no longer just the difference between having an advisor who can offer suggestions and having a colleague who can help get the work done—it's like having an entire team of experts collaborating seamlessly on your behalf.

Interoperability Standards: MCP and A2A

The ability of AI agents to operate autonomously and collaboratively requires the development and adoption of standardized communication protocols. Two key developments are worth remembering:

  • Model Context Protocol (MCP): Introduced by Anthropic, MCP is an open standard designed to connect AI models to external tools and data sources (it's "the USB-C for AI integrations"). It enables AI assistants to directly access and interact with various datasets, enhancing their information retrieval and task execution capabilities. For example, MCP allows an AI assistant to directly connect to platforms like GitHub to create repositories and efficiently manage pull requests.

  • Agent2Agent Protocol (A2A): Recently announced by Google, A2A is an open standard designed to facilitate seamless communication and collaboration between AI agents from different vendors and frameworks. It allows agents to securely exchange information and coordinate actions across various enterprise platforms.

In simple terms: MCP connects agents to tools (think agent-to-API), while A2A enables agents to talk to other agents.

Beyond Simple Automation: Understanding Agent Capabilities

AI agents' capabilities go far beyond simple automation scripts or chatbots. They can:

  1. Maintain persistent context and memory across multiple interactions, enabling them to learn from experience and adapt their approach over time.

  2. Use tools and interact with external systems, whether through APIs, web browsers, or direct system interactions.

  3. Decompose complex goals into manageable subtasks and execute them in logical sequence.

  4. Monitor their own progress and adjust strategies based on outcomes.

  5. Collaborate with other agents, each specialized in different aspects of a larger task.

This combination of capabilities enables agents to handle complex, multi-step tasks that would be difficult or impossible for traditional AI systems.

Architecture Patterns: Building Blocks of Autonomous Systems

The emergence of AI agents has given rise to several distinct architectural patterns, each addressing different aspects of the challenges of autonomous decision-making and action.

Tool Use and Integration

The most fundamental pattern in agent architecture is tool use—the ability to interact with external systems and APIs to accomplish tasks. Modern agents can interact with everything from database systems to development environments, but what's particularly interesting is how they choose which tools to use and when.

For example, when an agent is asked to analyze sales data, it might need to:

  • Use file system APIs to access raw data
  • Use data processing libraries for analysis
  • Leverage visualization tools to create reports
  • Utilize communication APIs to share results

The sophistication lies not in individual tool interactions but in the agent's ability to coherently orchestrate these tools toward a goal.

Memory and Context Management

Perhaps the most critical architectural challenge in agent systems is managing memory and context. Unlike stateless LLM interactions, agents need to maintain understanding of their environment and prior actions over time. This has led to several innovative approaches:

  1. Episodic Memory: Agents maintain records of past interactions and their outcomes, enabling them to learn from experience and avoid repeating mistakes. This isn't just about storing conversation history; it's about extracting and organizing relevant information for future use.

  2. Working Memory: Similar to human short-term memory, this allows agents to maintain context about their current task and recent actions. This is critical for maintaining coherence during complex, multi-step operations.

  3. Semantic Memory: Long-term storage of facts, patterns, and relationships that the agent has learned over time. This helps inform future decisions and strategies.

Hierarchical Planning and Execution

One of the most sophisticated patterns emerging in agent architecture is hierarchical planning. This approach breaks down complex goals into manageable subtasks, creating what amounts to a cognitive hierarchy within the agent system.

The hierarchy typically includes:

  1. Strategic Planning: High-level goal setting and strategy development
  2. Tactical Planning: Breaking strategies into specific, actionable tasks
  3. Execution Planning: Determining the specific steps needed for each task
  4. Action Execution: Carrying out planned steps and monitoring results

For example, an agent asked to "improve our website performance" might:

  • At the strategic level: decide to focus on load times and user experience
  • At the tactical level: plan to optimize JavaScript/long tasks, compress images, and improve server response times
  • At the execution level: detail specific steps for each optimization
  • At the action level: actually implement changes and measure results

Multi-Agent Systems and Collaboration

Perhaps the most fascinating pattern emerging in agent architecture is the development of multi-agent systems. These distribute cognitive load across multiple specialized agents, each handling different aspects of a complex task.

Microsoft's AutoGen and the open-source CrewAI framework exemplify this approach, allowing developers to create teams of agents that work together on complex tasks. A typical multi-agent system might include:

  • A coordinator agent managing overall task flow and delegation
  • Specialist agents with deep knowledge in specific domains
  • Critic agents that review and validate work
  • Integration agents that handle communication between other agents and external systems

In a software development context, you might have requirements agents, design agents, multiple development agents, testing agents, and documentation agents—all collaborating through complex protocols for negotiation, consensus building, and conflict resolution.

Browser Agents: Bridging AI and the Web

The emergence of browser-based AI agents represents one of the most significant developments in the agent ecosystem. These agents can interact with web interfaces just as humans do—navigating pages, filling forms, extracting information, and executing complex workflows.

Current Landscape

The browser agent ecosystem currently divides into several approaches:

Proprietary Solutions:

  • Google's Project Mariner — a tightly integrated solution built on Chrome, powered by Gemini, demonstrating the potential for browser agents to become a native part of our web experience.
  • OpenAI's Operator — focuses on general-purpose web interaction through a model called Computer-Using Agent (CUA), with advanced visual and reasoning capabilities.

Open-Source Alternatives:

  • Browser Use — enables AI agents to directly interact with web browsers, automating complex web-based tasks. It can identify and interact with web elements without manual configuration.
  • Browserbase's Open Operator — automates web tasks through natural language commands, executing actions in a headless browser environment.
  • Skyvern — leverages LLMs and computer vision to automate browser-based workflows, adapting to various web pages.

These open-source solutions offer transparency, customization, cost control, and community-driven development.

Beyond Simple Automation

What makes browser agents particularly interesting is their ability to go beyond traditional web automation. As Aaron Levie noted:

"AI agents that can launch infinite cloud browsers aren't going to be used for things we already do well with APIs. They'll be used for the long tail of tasks we never got around to connecting with APIs before."

This insight gets at the heart of why browser agents matter. They enable complex research tasks, UI testing, content management, and much more—all without specific APIs or integration points.

Technical Challenges

Browser agent implementation presents unique challenges: visual understanding (interpreting complex layouts), state management (tracking complex web app state transitions), error handling (dealing with unpredictable web interfaces), and security considerations (credential management and sensitive data handling).

Economic and Strategic Implications

The rise of browser agents has significant implications for how we think about API development (companies may prioritize human-friendly interfaces over API development), service design (interfaces may evolve to be more agent-friendly), and cost structures (the economics of automation may shift dramatically).

Real-World Applications

Software Development and Engineering

AI agents in software development are shifting from simple code completion to "AI teammates" that participate in the entire development lifecycle: code generation and review, automated test generation, documentation maintenance, and performance analysis.

Enterprise Automation

In enterprise settings, AI agents are beginning to handle the "long tail" of automation tasks: document management across multiple systems, multi-step customer service query resolution, complex workflow automation, and compliance monitoring.

Challenges and Limitations

Reliability and Consistency

The most immediate challenge facing AI agents is reliability. Unlike traditional software that follows deterministic rules, agents can exhibit unpredictable behavior:

  • Hallucinations and false confidence: Agents may make incorrect assumptions or take action based on misunderstood context.
  • Error propagation: In multi-step tasks, small errors can compound. Long-running tasks are particularly vulnerable to cascading failures.
  • Context management: Maintaining accurate state over long sequences of operations remains challenging.

Safety and Privacy Concerns

The autonomous nature of agents raises significant security and privacy issues around access control, data privacy across multiple systems, and audit/accountability trails.

Economic and Practical Limitations

Running complex agents requires significant computational resources, and per-operation costs can be high compared to traditional automation. Integration overhead, training costs, and skill requirements remain substantial.

Conclusion: The Road Ahead

AI agents represent a fundamental shift in how we interact with technology. While current implementations have limitations, we are clearly heading toward a future where autonomous AI agents become an integral part of how we work, create, and solve problems.

As Aaron Levie noted:

"The next decade will be defined by the degree to which we collaborate with AI—not just by how smart it is."

This suggests that success depends not only on technical advances but also on developing deployment and integration frameworks that unlock entirely new categories of value in "non-consumption" areas where work simply wasn't getting done before.

The age of AI agents is unfolding. The question is not whether they will change how we use technology, but how we shape that transformation.

Alpha Preview:This is an early internal build. Some chapters are still incomplete and issues may exist. Feedback is very welcome on GitHub.