Internet of Agents Task Protocol (IoA Task Protocol) for Heterogeneous Agent Collaboration

With the rapid advancement of large language models (LLMs) and multimodal autonomous agents, modern intelligent systems are increasingly constructed as collaborative networks of multiple agents. These agents are expected to work together to solve complex, open-ended tasks. However, they often differ in capabilities, tools, runtime environments, and communication patterns, leading to significant challenges in interoperability, dynamic coordination, and cross-device deployment. As a result, current multi-agent frameworks fall short of the flexibility and generality required in real-world applications. In a typical collaborative setting shown in Figure 1, agents with specialized functions—including on-device AI Agents on Device A for conceptual planning, Device B for academic search, Device C for content generation, and Device D for document analysis—must work together to complete a research paper on “Internet of Agents.” These agents are distributed across devices (e.g., laptops, edge nodes, cloud services), and each relies on different execution frameworks or data formats.

[Figure 1: Multi-agent collaboration scenario, showing On-Device AI Agents on Devices A, B, C, and D and the IoA Server]

When Device B encounters a specialized PDF parsing task beyond its capability, existing frameworks often fail to dynamically recruit Device D due to rigid team formation rules. Likewise, when Device A and Device C attempt to synchronize intermediate results in real time, inflexible communication channels may result in delays or dropped information.

Existing solutions exhibit several key limitations:

*  Closed frameworks that restrict integration with third-party agents such as AutoGPT or Open Interpreter;
*  Single-device simulation that fails to reflect cross-device deployment scenarios typical in edge-cloud collaboration;
*  Hard-coded workflows that prevent agents from switching between synchronous and asynchronous task execution at runtime.

To address these challenges, this draft introduces the Internet of Agents Task Protocol (IoA Task Protocol), a layered, extensible collaboration standard designed for intelligent multi-agent systems. The core goal of the protocol is to enable seamless collaboration among heterogeneous agents across devices, tools, and execution environments.
It supports:

*  Agent integration via a standardized interface and registration mechanism;
*  Dynamic team formation across distributed environments;
*  Finite-state machine-based session control for flexible and autonomous dialogue management;
*  Structured message formats with group routing, task assignment, and response coordination.

The design of the IoA Task Protocol aligns naturally with the evolution of intelligent networked systems, including fixed networks and next-generation mobile networks such as 6G, which aim to support ubiquitous intelligence through large-scale, low-latency, and semantic-driven communication. By enabling agent collaboration across fixed-network infrastructures, edge devices, mobile terminals, and cloud nodes, IoA supports coordinated intelligence across these heterogeneous network environments. Its structured message design, dynamic team formation, and abstracted dialogue control provide a foundational protocol framework for orchestrating intelligent services on top of them.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in .

The following terms are defined in this draft:

*  IoA: Internet of Agents, a protocol enabling distributed collaboration among heterogeneous agents across devices and 6G networks, defined in
*  Agent Registry Block: A server-side module storing structured capability descriptions of all registered agents, supporting semantic search for team formation, defined in
*  Team Formation Block: A client-side module responsible for initiating, joining, or disbanding agent teams based on task requirements, including nested sub-teams, defined in
*  Session State Machine: A finite-state model governing collaboration states (Discussion, Synchronous Task Assignment, Asynchronous Task Assignment, Pause and Trigger, Conclusion) for adaptive dialogue management, defined in
*  HTTP: Hypertext Transfer Protocol, an application-layer protocol for distributed, collaborative, hypermedia information systems, referenced in IoA for interoperability with web-based agents, defined in
*  JSON-RPC: A remote procedure call protocol encoded in JSON, referenced in IoA for structured communication between web-based agents, defined in
*  QUIC: A transport layer protocol providing secure, low-latency communication over UDP, used in IoA for real-time agent messaging, defined in

The Internet of Agents Task Protocol (IoA Task Protocol) enables distributed collaboration among heterogeneous agents through a layered architecture and distributed communication protocol. It supports seamless integration across devices, toolchains, and runtime environments. The IoA system adopts a three-layer architecture implemented symmetrically on both the server side and the client side:

*  Server side: Handles global coordination, agent discovery, group management, and message routing.
*  Client side: Encapsulates individual agents and provides interfaces for team collaboration and local task execution.

An overview of the layered structure is shown in Figure 2.

IoA supports the integration of heterogeneous agents from diverse sources through a unified interface, including third-party agents such as AutoGPT, Open Interpreter, and embodied robotic agents. When a new agent joins the IoA, its client wrapper undergoes a registration process with the server. During this registration, the agent is expected to provide a comprehensive description of its capabilities, skills, and domains of expertise. For an agent c_i, its description is denoted as d_i, and is stored in the Agent Registry Block within the Data Layer of the server. The set of all registered agents is denoted as C = {c_1, c_2, ..., c_n}, where each c_i is associated with its capability description d_i. This mechanism enables future semantic matching and intelligent task allocation.
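The registration step described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the protocol's defined interface: the class names `AgentDescription` and `AgentRegistry`, the method names, and the agent identifiers are all hypothetical, and a real Agent Registry Block would persist descriptions on the server.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentDescription:
    """Capability description d_i that an agent supplies at registration."""
    capabilities: tuple[str, ...]
    skills: tuple[str, ...] = ()
    domains: tuple[str, ...] = ()


@dataclass
class AgentRegistry:
    """Hypothetical stand-in for the server-side Agent Registry Block."""
    _agents: dict[str, AgentDescription] = field(default_factory=dict)

    def register(self, agent_id: str, description: AgentDescription) -> None:
        # Assumption: an empty capability list is rejected, since the
        # description is what later matching relies on.
        if not description.capabilities:
            raise ValueError("a capability description is required")
        self._agents[agent_id] = description  # re-registration overwrites

    def describe(self, agent_id: str) -> AgentDescription:
        return self._agents[agent_id]

    def agent_ids(self) -> set[str]:
        # The set C of all registered agents.
        return set(self._agents)


registry = AgentRegistry()
registry.register("device-b", AgentDescription(capabilities=("academic search",)))
registry.register(
    "device-d",
    AgentDescription(capabilities=("document analysis", "PDF parsing")),
)
```

Keeping the description immutable (`frozen=True`) is a design choice of this sketch: an agent that changes its capabilities registers again rather than mutating a stored entry.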

Agents initiate the search process by submitting capability requirements to the Agent Query Block. The server performs semantic matching using vector similarity and returns candidate agents from the Agent Registry Block. IoA supports nested team structures. An initial group is formed for the main goal, and subgroups are recursively created if subtasks require new capabilities. This forms a hierarchical tree structure, reducing communication complexity and organizational overhead. The entire team formation process is autonomous, task-driven, device-agnostic, and self-organizing.
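As an illustration of semantic matching by vector similarity, the sketch below ranks registered agents by cosine similarity between a query vector and each agent's capability vector. The function names, the agent identifiers, and the three-dimensional toy vectors are assumptions made for this example; in practice the vectors would come from an embedding model applied to the capability descriptions, and the draft does not prescribe a particular similarity measure.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)


def find_candidates(
    query: list[float],
    capability_vectors: dict[str, list[float]],
    top_k: int = 2,
) -> list[str]:
    """Return the top_k agent ids ranked by similarity to the query."""
    scored = [
        (cosine_similarity(query, vector), agent_id)
        for agent_id, vector in capability_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [agent_id for _, agent_id in scored[:top_k]]


# Toy embeddings (hypothetical values, one vector per registered agent).
capability_vectors = {
    "device-b": [0.9, 0.1, 0.0],  # academic search
    "device-c": [0.1, 0.9, 0.1],  # content generation
    "device-d": [0.0, 0.2, 0.9],  # document analysis
}
pdf_parsing_query = [0.0, 0.1, 1.0]
candidates = find_candidates(pdf_parsing_query, capability_vectors, top_k=2)
```

With these toy values, `device-d` ranks first because its vector points in nearly the same direction as the query.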

IoA models group conversations and collaboration using a finite-state machine with five abstract states:

*  Discussion: Agents engage in general dialogue, exchange ideas, and clarify task requirements;
*  Synchronous task assignment: Tasks are assigned to specific agents, pausing the group chat until completion;
*  Asynchronous task assignment: Tasks are assigned without interrupting the ongoing discussion;
*  Pause and trigger: The group chat is paused, waiting for the completion of specified asynchronous tasks;
*  Conclusion: Marks the end of the collaboration, prompting a final summary.

State transitions are managed autonomously by a coordinator agent using the conversation history and session context to determine the next state and speaker.
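A minimal sketch of such a session state machine is shown below. The five states come from the text; the transition table is an assumption made purely for illustration, since the draft leaves the choice of next state to the coordinator agent and does not enumerate permitted transitions. The names `SessionState`, `ALLOWED_TRANSITIONS`, `can_transition`, and `Session` are hypothetical.

```python
from enum import Enum


class SessionState(Enum):
    DISCUSSION = "discussion"
    SYNC_TASK_ASSIGNMENT = "sync_task_assignment"
    ASYNC_TASK_ASSIGNMENT = "async_task_assignment"
    PAUSE_AND_TRIGGER = "pause_and_trigger"
    CONCLUSION = "conclusion"


S = SessionState

# ASSUMPTION: this table is illustrative only; the draft does not define it.
ALLOWED_TRANSITIONS = {
    S.DISCUSSION: set(S),                    # discussion may lead anywhere
    S.SYNC_TASK_ASSIGNMENT: {S.DISCUSSION},  # chat resumes after completion
    S.ASYNC_TASK_ASSIGNMENT: {
        S.DISCUSSION,
        S.SYNC_TASK_ASSIGNMENT,
        S.ASYNC_TASK_ASSIGNMENT,
        S.PAUSE_AND_TRIGGER,
    },
    S.PAUSE_AND_TRIGGER: {S.DISCUSSION},     # awaited tasks have finished
    S.CONCLUSION: set(),                     # terminal state
}


def can_transition(current: SessionState, nxt: SessionState) -> bool:
    return nxt in ALLOWED_TRANSITIONS[current]


class Session:
    """Tracks the current state and the sequence of states visited."""

    def __init__(self) -> None:
        self.state = S.DISCUSSION
        self.history = [self.state]

    def transition(self, nxt: SessionState) -> None:
        if not can_transition(self.state, nxt):
            raise ValueError(f"{self.state.name} -> {nxt.name} not allowed")
        self.state = nxt
        self.history.append(nxt)


session = Session()
for step in (S.ASYNC_TASK_ASSIGNMENT, S.PAUSE_AND_TRIGGER,
             S.DISCUSSION, S.CONCLUSION):
    session.transition(step)
```

In a real deployment the coordinator would pick `nxt` from the conversation history and session context; the table here only guards against transitions that this sketch assumes to be invalid.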

The agent message protocol in IoA is designed for extensibility and flexibility, enabling effective collaboration among heterogeneous agents. Each message consists of two main parts: a header and a payload. The header contains essential metadata to ensure proper routing and processing. Key fields include:

*  sender: The unique identifier of the agent sending the message.
*  state: The current collaboration state associated with the message.
*  group_id: The identifier of the group chat to which the message belongs.

The common header fields shared by all message types are illustrated in Figure 3.

The payload carries the main content of the message and varies depending on message type. Common fields include:

*  message_type: Indicates the purpose of the message (e.g., discussion, task assignment, pause and trigger).
*  next_speaker: The identifier(s) of the agent(s) expected to respond.

The full structure of the message format is illustrated in Figure 4.
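The header and payload fields named above could be modeled as in the following sketch. Only `sender`, `state`, `group_id`, `message_type`, and `next_speaker` come from the text; the JSON encoding, the `content` field, the class names, and the example values are assumptions for illustration, and the normative layout is the one given in Figures 3 and 4.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Header:
    sender: str    # unique identifier of the sending agent
    state: str     # collaboration state associated with the message
    group_id: str  # group chat the message belongs to


@dataclass
class Payload:
    message_type: str        # e.g. "discussion", "task assignment"
    next_speaker: list[str]  # agent(s) expected to respond
    content: str = ""        # hypothetical free-form body, not in the draft


@dataclass
class Message:
    header: Header
    payload: Payload

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "Message":
        data = json.loads(raw)
        return cls(Header(**data["header"]), Payload(**data["payload"]))


message = Message(
    header=Header(sender="device-b", state="discussion", group_id="group-1"),
    payload=Payload(
        message_type="discussion",
        next_speaker=["device-d"],
        content="Can you parse the attached PDF?",
    ),
)
wire = message.to_json()
decoded = Message.from_json(wire)
```

Separating `Header` from `Payload` mirrors the two-part structure in the text, so routing code can inspect the header without understanding every payload variant.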

From an architectural perspective, the IoA Task Protocol is positioned at the application layer, built on top of transport and session protocols such as TCP, UDP, WebSocket, and QUIC. This positioning allows IoA to remain independent of underlying network technologies and enables deployment across heterogeneous networking environments, including fixed networks, edge-cloud infrastructures, and mobile networks.

IoA's three-layer architecture maps onto the computer network layers as follows:

*  Interaction Layer: Maps to the application layer and is responsible for high-level logic such as message protocols, group collaboration, and session state transitions.
*  Data Layer: Spans the application layer and session layer, managing agent states, group metadata, and context tracking.
*  Foundation Layer: Corresponds to the transport layer and system infrastructure, including secure communication channels (e.g., WebSocket/QUIC), databases, and network service modules.

Since the IoA Task Protocol involves intelligent behaviors such as agent orchestration, semantic-driven interaction, and session control, an intelligence layer can be introduced above the traditional application layer. This layer encapsulates core intelligent collaboration logic (semantic-based agent matching, AI-driven session strategy optimization, dynamic task decomposition, and team reorganization) into standardized message formats. It shields upper-layer applications and lower-layer protocols from the complexity of intelligent decision-making, so that each can focus on its core function (e.g., scenario-specific task execution at the application layer, reliable data transmission at the transport layer) without depending on how the intelligence is implemented.

The advantages of this layer are:

*  standardizing the collaboration of heterogeneous agents, reducing integration costs across diverse deployment environments;
*  improving communication efficiency through semantic compression and adaptive feature optimization;
*  enabling modular extensibility to support new intelligent behaviors and emerging application scenarios.

The Agent-to-Agent (A2A) protocol is a communication standard designed to support standardized, secure, and modality-agnostic interaction between AI agents. Built upon existing web technologies such as HTTP, Server-Sent Events (SSE), and JSON-RPC, A2A emphasizes default security, support for long-running tasks, and cross-modality interoperability. It introduces the concept of an AgentCard to describe agent capabilities, enabling effective discovery and invocation.

The Internet of Agents (IoA) Task Protocol shares the same fundamental goal with A2A: to break down communication barriers among agents and improve the overall efficiency of multi-agent systems. Both protocols rely on network communication technologies and adopt similar approaches to message encoding, decoding, and task coordination. However, the two protocols diverge significantly in terms of design philosophy and core mechanisms:

*  Design philosophy: A2A focuses on enabling standardized communication through web-native technologies, effectively creating a "free trade zone" for agents where interoperability is built-in. In contrast, the IoA Task Protocol draws inspiration from Internet architecture and targets the problem of ecosystem fragmentation. It establishes a system-level collaboration platform where heterogeneous agents can freely register, discover one another, and collaborate across platforms and devices.
*  Core mechanisms: A2A is based on HTTP and JSON-RPC for communication, combined with task lifecycle management and capability discovery through AgentCard. IoA, on the other hand, offers a more comprehensive collaboration framework, including agent registration, autonomous nested team formation, finite-state-machine-driven session control, and trigger-based task coordination.
*  Session and task structure: While A2A is suitable for standardized task responses and streaming updates, it lacks native support for dynamic session management and nested subtask structures. IoA enables adaptive interaction flow via a session state machine, and its team_up_depth field supports recursive team formation and state transitions, making it more effective for handling complex and evolving task scenarios.

In summary, A2A is well-suited for lightweight, standardized task interfaces, whereas IoA provides a more flexible and system-oriented protocol for large-scale, heterogeneous, and dynamic multi-agent collaboration. The two protocols can complement each other at different layers, jointly advancing the development of agent communication technologies.

To fully realize the potential of intelligent systems operating across heterogeneous network environments—including fixed networks and next-generation mobile networks such as 6G—the Internet of Agents Task Protocol (IoA Task Protocol) requires continuous architectural evolution and standardization. This section outlines key directions for future enhancements to improve scalability, decentralization, interoperability, and network integration.

The current IoA design relies on a centralized server model, which may limit scalability and introduce single points of failure under large-scale deployment. A promising direction is to adopt a decentralized registration and discovery mechanism, in which agents publish their capabilities to a shared registry exposed through a web-based interface. Inspired by the Domain Name System (DNS) and search engines, agents could be made discoverable at scale through keyword-based or semantic search, enabling lightweight browser-based or API-based discovery across domains. This decentralized lookup layer would allow IoA to support scenarios where agents operate across multiple domains, owners, and physical networks, while still maintaining secure and authenticated interaction through digital signatures and trust mechanisms.

To scale beyond millions of agents, the IoA Task Protocol should adopt sharding and region-based message routing. Distributed registries and dynamic load balancing can reduce latency and avoid bottlenecks. Caching of frequent agent metadata at edge nodes is also critical for fast retrieval in latency-sensitive deployment scenarios.
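One simple way to picture sharding combined with region-based routing is a stable hash that maps each agent identifier to a registry shard inside its region. The sketch below is an assumption for illustration only: the function name `shard_for`, the `region/shard-N` naming, and the use of SHA-256 are not part of the draft, and a production design would more likely use consistent hashing so that changing the shard count moves few agents.

```python
import hashlib


def shard_for(agent_id: str, region: str, shards_per_region: int) -> str:
    """Map an agent to a registry shard within its region.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process and so not stable across servers.
    """
    if shards_per_region <= 0:
        raise ValueError("shards_per_region must be positive")
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % shards_per_region
    return f"{region}/shard-{index}"


# Hypothetical identifiers; every server computes the same shard for them.
shard = shard_for("device-d", "eu-west", 16)
```

Because the mapping depends only on the identifier, the region, and the shard count, any server or edge cache can compute where an agent's metadata lives without consulting a central index.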

In highly heterogeneous environments, agents may describe their capabilities using different terminologies. To address this, the IoA Task Protocol should support ontology mapping and alignment mechanisms. This allows agents with differing skill descriptors to still interoperate, using shared or translated task definitions during team formation and dialogue.

For mission-critical 6G scenarios (e.g., autonomous vehicles, medical AI), the protocol must incorporate stronger security primitives. These include:

*  End-to-end encryption with forward secrecy.
*  Support for zero-trust architectures with agent attestation and secure enclaves.
*  Fine-grained access control based on agent role and session context.

IoA servers and agents store sensitive data, including capability descriptors, session state metadata, and task execution logs, which consume memory and computational resources. To mitigate the risks of resource exhaustion and unauthorized access, IoA entities must authenticate peers via token-based validation (OAuth 2.0) before processing registration requests or collaboration messages. Additionally, all data transmission between entities must use TLS 1.3 as specified in to ensure confidentiality and integrity, preventing eavesdropping or tampering.
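On the client side, the TLS 1.3 requirement can be enforced with Python's standard `ssl` module, as in the minimal sketch below. The function name `make_tls13_client_context` is hypothetical; the draft does not mandate any particular library, and token-based peer authentication would be layered on top of the encrypted channel.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Build a client-side context that refuses anything below TLS 1.3."""
    # create_default_context() enables certificate verification and
    # hostname checking for client connections by default.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


tls_context = make_tls13_client_context()
```

The resulting context can be passed to `SSLContext.wrap_socket` or to any client library that accepts an `ssl.SSLContext`; a handshake with a peer that only offers TLS 1.2 or earlier then fails instead of silently downgrading.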

[TBD] This document defines a new protocol for heterogeneous agent collaboration: the Internet of Agents Task Protocol (IoA Task Protocol). The protocol's code point allocation will be determined in subsequent revisions as the standard matures, in accordance with IANA's relevant registration procedures.

Thanks to Weize Chen, Ziming You, Ran Li, Yitong Guan, Chen Qian, Chenyang Zhao, Ruobing Xie, Maosong Sun, and Yu Hao for their valuable comments on this draft.