February 2026
The operational challenges of multi-agent AI systems—failure handling, change coordination, knowledge capture, capacity planning, quality assurance across interdependent autonomous actors—are not novel. They are the defining challenges of enterprise service management, codified over three decades in frameworks such as ITIL 4 and COBIT. Yet the AI agent community has largely treated these challenges as unprecedented, building ad hoc operational mechanisms or leaving them to human operators entirely, even as multi-agent patterns move from research prototypes into production enterprise workflows. Recent work has begun systematically addressing multi-agent operational governance, arriving at structures that closely parallel established IT Service Management (ITSM) practices without recognizing the lineage. This convergence is not coincidental: the problem is the same. What differs is the substrate.
We argue that ITIL 4’s 34 management practices provide the complete specification for a multi-agent system’s autonomic self-governance layer—not as an external governance overlay imposed by humans, but as the internal runtime processes executed by dedicated process agents that manage the system itself. AI-native operational concerns such as hallucination detection, prompt drift, and model capability regression map naturally onto existing practices (Monitoring & Event Management, Problem Management, Supplier Management, Change Enablement) because ITIL specifies what must be managed, not how the managed entities work internally. We present HIVE (Highly-concurrent Intelligent Virtualized Engine), a framework that realizes this inversion by decomposing every ITIL practice into two components: persistent infrastructure and process agents that operate that infrastructure. We provide a governance-tiered mapping of all 34 practices—25 fully autonomous and 9 advisory—and demonstrate an end-to-end self-healing lifecycle in which process agents diagnose, remediate, and learn from a workflow failure without human intervention, while humans retain authority only over inherently strategic decisions.
Keywords: multi-agent systems, autonomic computing, ITIL 4, COBIT, self-governing agents, self-healing systems, CMDB, agent orchestration, process agents, knowledge management, LLM agents, AI-as-a-Service
Multi-agent AI systems have advanced rapidly in their ability to perform user-requested tasks: analyzing data, generating code, writing reports, orchestrating complex workflows. Early frameworks such as AutoGen [1], LangChain [2], CrewAI [3], and OpenAI Swarm [4] demonstrated sophisticated collaboration patterns. These patterns have since been adopted and productized by frontier model providers themselves—Anthropic’s Claude Code implements agentic subagent delegation natively; OpenAI’s Codex and operator products bring multi-agent collaboration into enterprise team workflows—moving multi-agent orchestration from research prototypes into production deployment. Yet across this entire progression, from research frameworks to first-party products, these systems share a fundamental limitation: they cannot manage themselves.
When a workflow fails, a human must investigate. When an agent’s performance degrades, a human must diagnose. When a tool’s API changes, a human must update all affected configurations. When a new agent capability is developed, a human must test, validate, and deploy it. The agents, for all their sophistication at performing business tasks, have no capacity for operational tasks.
This gap is not merely an inconvenience—it is the primary operational barrier to autonomous AI system operation at scale. A system that can process 1,000 concurrent workflows but requires human intervention for every operational issue does not scale. The bottleneck is not compute or intelligence—it is the absence of self-management.
But the operational complexity that multi-agent systems introduce is not unprecedented. As AI agents increasingly function as autonomous participants in enterprise workflows—performing knowledge work that was previously done by human teams—they create exactly the kind of complex, failure-prone, multi-actor operational environment that enterprises have managed for decades. Detecting failures, coordinating changes, preserving institutional knowledge, maintaining quality across distributed autonomous participants: these are the defining challenges of enterprise service management, addressed by mature disciplines refined through three decades of practice at global scale. The question is not whether multi-agent systems need operational management, but whether we recognize the problem as one that has already been solved—and build on that foundation rather than re-derive it from first principles.
We observe that the operational tasks currently performed by human operators of multi-agent systems—failure detection, root cause analysis, change management, knowledge capture, capacity planning, configuration tracking—are precisely the functions codified by one discipline in particular: IT Service Management (ITSM).
ITIL 4 [5]—the most widely adopted operational management framework in enterprise IT, implemented by the majority of large organizations globally—defines 34 management practices organized into three categories: General Management (14), Service Management (17), and Technical Management (3). Each practice specifies what operational function must be performed, how it interacts with other practices, what metrics govern its quality, and what governance structures ensure accountability. Critically, ITIL 4 is deliberately implementation-agnostic: it specifies operational functions without prescribing the methods or substrates by which they are performed. This agnosticism is what enables its application to AI agent systems—a substrate that did not exist when ITIL was designed. AI-specific operational concerns such as hallucination detection, prompt drift, model capability regression, and sycophancy monitoring map naturally to existing ITIL practices (Monitoring & Event Management, Problem Management, Supplier Management, Change Enablement) precisely because ITIL specifies what must be managed, not how the managed entity works internally (see Section 5 for the complete 34-practice mapping).
Our central thesis is:
ITIL 4’s 34 management practices, when implemented as agent-operated processes rather than human-operated processes, provide the complete specification for a multi-agent system’s autonomic self-governance layer.
In existing multi-agent frameworks, ITSM governance—to the extent it exists at all—is an external overlay managed by human operators. Agents are the objects being governed. Our thesis inverts this relationship: the agents are not merely the subject of ITSM governance—they are its executors. The multi-agent system runs ITSM as its own operating system, for itself, by itself.
This paper makes the following contributions:
The Self-Serving ITIL Architecture (conceptual model): The thesis that ITIL 4 practices can and should serve as the multi-agent system’s internal self-governance runtime—executed by dedicated process agents as an endogenous capability rather than imposed as an external human governance overlay.
The Two-Layer Decomposition (engineering methodology): A systematic method for making each ITIL 4 practice implementable by decomposing it into (a) infrastructure tooling that supports the practice and (b) process agents that execute it, establishing the complete requirements for a self-managing agent ecosystem.
Governance-Tiered 34-Practice Mapping: A comprehensive mapping of every ITIL 4 management practice, classified into 25 autonomous practices (fully agent-operated) and 9 advisory practices (agent-supported, human-governed), with infrastructure, process agents, and behaviors specified for each.
The HIVE Framework: A four-layer architecture (Presentation, Control Plane, State & Memory, Execution Plane) that provides the structural foundation for the self-serving ITIL model, with six Team Orchestration Patterns mapped to ITIL 4 practices.1
We validate these contributions through a detailed end-to-end Self-Healing Lifecycle Demonstration (Section 6), in which nine ITIL practices execute autonomously in sequence to diagnose, remediate, and learn from a system failure without human intervention.
The self-serving ITIL architecture has a direct intellectual lineage to IBM’s Autonomic Computing vision [6], which proposed self-managing systems with four properties: self-configuration, self-healing, self-optimization, and self-protection. The autonomic computing program organized these around the MAPE-K control loop (Monitor, Analyze, Plan, Execute, Knowledge)—five functions that map directly to HIVE’s process agents (EventMonitor monitors, ProblemInvestigator analyzes, ChangeAdvisory plans, ReleaseManager executes, KnowledgeCurator manages knowledge). But where autonomic computing specified these as abstract control loops, HIVE implements them as concrete ITIL practices executed by dedicated agents. The autonomic computing program identified the need but lacked both the AI capabilities and the operational framework to fully realize it. Two decades later, LLM-powered agents provide the intelligence, and ITIL 4 provides the operational specification. HIVE connects them.
HIVE draws on three bodies of work that have developed independently: multi-agent frameworks, which define the systems that need self-management (Section 2.1); operational management disciplines and autonomic computing, which define the management processes themselves and the vision for self-managing systems (Sections 2.2–2.3); and architectural patterns from data engineering, which inform HIVE’s approach to contract-first agent design (Section 2.4). We review each to identify what HIVE inherits and what it adds.
The multi-agent systems community has made significant advances in agent collaboration, tool use, and workflow orchestration. Our observation concerns a dimension these frameworks do not address: operational self-management.
AutoGen [1] introduced conversational multi-agent patterns and adopted an actor model runtime in v0.4. However, AutoGen provides no self-management layer—when agents fail, the runtime has no mechanism to diagnose or recover.
LangChain/LangGraph [2] provides graph-based workflow abstractions. Workflow errors produce exceptions; no agent investigates root causes or captures learnings.
CrewAI [3] offers role-based agent definition. It provides accessible abstractions but has no concept of agents managing other agents’ operational health.
OpenAI Swarm [4] provides lightweight handoff patterns. It is explicitly educational and does not address operational concerns.
To our knowledge, none of these frameworks implement any mechanism for agents to manage their own operational lifecycle. The gap is striking: there are no incident agents, no knowledge agents, no change management agents, no CMDB agents in any existing multi-agent framework.
Recent work has begun to address this gap. Google DeepMind’s framework for multi-agent delegation [10] systematically examines task decomposition, assignment, monitoring, trust calibration, and permission handling for agent ecosystems—arriving at structures that closely parallel established ITSM practices. Their “Dynamic Assessment” maps to ITIL’s Capacity & Performance Management; their “Adaptive Execution” recapitulates Incident Management escalation paths combined with Problem Management workarounds; their monitoring taxonomy mirrors ITIL’s event categories; their trust and reputation system parallels Supplier Management. Yet the paper cites zero operational management literature—no ITIL, COBIT, TOGAF, SIAM, or ISO/IEC 20000. This convergence underscores our central argument: when researchers think systematically about multi-agent operational governance, they arrive at ITIL-shaped conclusions. Recognizing the lineage enables building on three decades of refinement rather than re-deriving from first principles.
IBM’s Autonomic Computing initiative [6] proposed self-managing systems organized around a MAPE-K control loop: sensors feed a Monitor component that detects events, an Analyze component that correlates and diagnoses, a Plan component that determines corrective action, and an Execute component that applies changes—all sharing a persistent Knowledge base. The architecture was implemented in IBM’s autonomic computing toolkit through policy-based management: administrators defined condition-action rules (“if CPU exceeds 90%, provision additional capacity”), and autonomic managers executed them. The approach worked for well-characterized, bounded operational scenarios but could not handle novel failures, ambiguous diagnoses, or situations requiring judgment across multiple interacting concerns—precisely the open-ended reasoning that ITIL practices demand when applied to complex service ecosystems.
LLM-powered agents now provide the flexible reasoning capability that autonomic computing assumed but could not deliver. What remained missing was the operational framework specifying what processes these intelligent agents should run and how those processes coordinate. HIVE supplies this by implementing ITIL 4 practices as the operational specification that the MAPE-K architecture left abstract.
ITIL 4 [5] organizes operational management around the Service Value System (SVS), a model in which demand enters the system and is transformed into value through a chain of activities: Engage, Design & Transition, Obtain/Build, Deliver & Support. These activities are supported by 34 management practices—modular sets of organizational resources designed for performing work or accomplishing a specific objective. A “practice” in ITIL terms is not a procedure or a checklist; it is a coherent capability comprising people, tools, processes, and information that together accomplish an operational function.
The 34 practices are organized into three categories: General Management Practices (14) address functions that span the organization (strategy, risk, knowledge, continual improvement); Service Management Practices (17) address functions specific to service delivery (incident management, change enablement, service desk, configuration management); and Technical Management Practices (3) address infrastructure and platform concerns (deployment, infrastructure management, software development). Practices do not operate in isolation—they interact through value streams, sequences of activities triggered by demand that coordinate multiple practices to deliver a specific outcome. An incident, for example, may trigger Incident Management, which queries Configuration Management, escalates through Problem Management, and proposes changes through Change Enablement—a multi-practice value stream.
Critically, ITIL 4 is deliberately implementation-agnostic: it specifies operational functions and interactions without prescribing who performs the work or how. This agnosticism is what enables our key insight: the “who” can be agents.
COBIT 2019 [7] provides governance structures with explicit separation of governance (evaluate, direct, monitor) from management (plan, build, run, monitor). This maps to HIVE’s separation between human-supervised escalation thresholds (governance) and agent-operated routine processes (management).
Beyond operational management, HIVE draws on architectural principles from data engineering that inform its approach to agent contract design.
Netflix’s Unified Data Architecture (UDA) [8] introduced “model once, represent everywhere”—domain concepts defined once and projected into multiple downstream representations. HIVE applies an analogous principle through DSPy [9], a framework for optimizing language model pipelines through declarative signatures. In HIVE, agent I/O contracts are defined once as DSPy Signature type specifications and projected into appropriate serialization formats at runtime via Adapters. Both Netflix UDA and HIVE’s Adapter pattern decouple semantic definition from implementation format, enabling the same agent contract to be rendered as a structured prompt, a function signature, or a validation schema depending on context.
The DSPy ecosystem has continued to evolve in directions that are architecturally resonant with HIVE. Recent work on Recursive Language Models (RLMs) [11] enables models to programmatically decompose and recursively invoke themselves over inputs far exceeding their context windows. Early practitioner reception has noted that RLMs formalize patterns production teams have already been building: recursive self-delegation, mutable prompts, persistent state across tool calls, and subtask farming to sub-agents. This convergence is instructive—RLMs formalize at the model layer what HIVE formalizes at the governance layer, and the two are complementary rather than competing. RLMs provide a mechanism for individual agents to manage their own cognitive complexity; HIVE provides the organizational structure that coordinates those agents into a self-governing system. Process agents such as KnowledgeCurator and ProblemInvestigator, which must reason over large operational histories, are natural candidates for RLM-style recursive decomposition within HIVE’s governance envelope.
HIVE’s architecture must satisfy two requirements: the standard cloud-native properties needed for production deployment at scale, and the novel self-management capability that is the paper’s central contribution. We describe the architecture that satisfies both.
HIVE is founded on four architectural principles: (1) decouple control from execution, (2) externalize all state, (3) embrace ephemeral execution, and (4) agents manage agents. The first three are established cloud-native patterns that HIVE adopts; the fourth is the paper’s distinctive architectural contribution.
Principle 1: Decouple Control from Execution. Management logic (scheduling, policy enforcement, routing) is physically separated from execution logic (LLM inference, tool invocation). This mirrors COBIT’s governance/management distinction and enables independent scaling.
Principle 2: Externalize All State. Execution environments are stateless. All context resides in a persistent, shared storage layer, enabling the “Resume Anywhere” capability: workflows survive process failures.
Principle 3: Embrace Ephemeral Execution. Execution environments are disposable resources provisioned on-demand and terminated upon completion, enabling scale-to-zero cost efficiency.
Principle 4: Agents Manage Agents. Operational management functions are not performed by human operators or external systems—they are performed by dedicated process agents running ITIL practices as their procedures. This principle is what distinguishes HIVE from standard multi-agent architectures and is the structural expression of the self-serving ITIL model described in Section 1.
The critical addition over standard multi-agent architectures (such as those described in Section 2.1) is twofold. First, the State & Memory Layer contains not only business data stores (PostgreSQL, Qdrant) but also the ITSM infrastructure (Incident Database, CMDB, Knowledge Base, Change Request System). Second, the Execution Plane hosts not only business workflow agents but also the process agents that run ITIL practices.
HIVE distinguishes two classes of agents: Business Workflow Agents, which perform user-requested work, and Process Agents, which manage the operational health of the system itself.
Business Workflow Agents perform the tasks that users request: data analysis, code generation, requirements gathering, design review. These are the agents that existing frameworks focus on exclusively.
Process Agents execute ITIL 4 practices as their procedures: IncidentManager, CMDBManager, KnowledgeCurator, ChangeAdvisory, ProblemInvestigator. These agents exist not to serve users directly but to keep the system running.
The relationship between these two classes is that of service consumer to service provider—a mapping that directly reflects ITIL’s service relationship model. When something goes wrong, business agents do not diagnose themselves; dedicated process agents handle the investigation, much as a help desk handles tickets for the employees it supports. When a DataAnalyzer agent fails, the IncidentManager creates an incident record, classifies severity, and assigns remediation. When a CodeGenerator agent needs context from a previous project, it does not search memory directly—the KnowledgeCurator has already indexed and surfaced relevant knowledge.
Synchronous REST: Client-to-backend requests for workflow submission and status.
Asynchronous Messaging (RabbitMQ AMQP): Event dispatch between Control Plane and Execution Plane; also used for process agent event consumption (e.g., IncidentManager subscribes to WORKFLOW_EXECUTION_ERROR events).
Real-time WebSocket: Live streaming to the Presentation Layer for observability and HITL escalation.
The HIVE Bridge sits at the boundary between the Control Plane and the Execution Plane, translating between HIVE’s domain-specific event vocabulary and the underlying agent runtime’s message types. The current implementation targets AutoGen 0.4, mapping 11 HIVE event types to corresponding AutoGen messages. However, the Bridge is designed as a runtime-agnostic abstraction layer: a different Bridge implementation could target LangGraph, CrewAI, or any agent runtime that supports message-based communication. This is the point of the abstraction—HIVE’s orchestration logic is decoupled from any specific execution runtime. The HIVEAdapter connects the runtime to infrastructure: RabbitMQ for events, PostgreSQL for persistence, OpenTelemetry for tracing, and Prometheus for metrics.
Both business workflow agents and process agents must collaborate to accomplish their tasks. HIVE defines six Team Orchestration Patterns, each mapped to the ITIL 4 practice it most naturally supports. The mappings below are default recommendations—a given practice may use alternative patterns depending on context—but each mapping reflects a structural correspondence between the orchestration pattern’s communication topology and the practice’s operational dynamics. We describe each pattern, explain why it maps to its practice, and show how process agents use it. Section 6 demonstrates how patterns compose in a real operational scenario.
Pattern: Linear handoff (Agent A -> Agent B -> Agent C).
ITIL Mapping: Request Fulfillment procedures are by definition standard, repeatable, and deterministic—each step’s output is the next step’s input, with no branching or negotiation required. The Sequential pattern’s linear topology matches this directly.
Use by Process Agents: The FulfillmentAgent executes standard service requests via sequential pipeline: validate inputs, resolve dependencies, execute, confirm completion.
Pattern: Cyclic discussion among peers until consensus or termination condition.
ITIL Mapping: Root cause analysis benefits from cycling through diverse diagnostic perspectives without a predetermined hierarchy—each participant contributes a different vantage point (the affected agent, the configuration system, a generic diagnostician), and the cyclic structure ensures all perspectives are heard before convergence. This mirrors blameless postmortem practices in incident analysis, where round-table discussion among diverse roles is preferred over top-down investigation.
Use by Process Agents: The ProblemInvestigator spawns Round Robin diagnostic teams to investigate recurring failures, with convergence criteria based on evidence agreement across participants.
Pattern: Directed graph with conditional transitions and escalation paths.
ITIL Mapping: Incident Management is inherently a state-transition process: an incident moves through detection, classification, assignment, investigation, escalation (if needed), resolution, and closure. Each transition is conditional on the outcome of the previous state—severity determines the escalation path, diagnosis determines the remediation approach. The State Machine pattern’s conditional branching and explicit escalation paths map directly to this lifecycle.
Use by Process Agents: The IncidentManager uses state machine flows for severity classification, assignment, escalation, and resolution tracking. Each state transition is logged, enabling audit trails and SLA compliance monitoring.
Pattern: Manager agent decomposes and delegates to specialist sub-teams.
ITIL Mapping: Change Advisory Boards require structured delegation—a coordinating authority decomposes a change proposal into review dimensions (security, architecture, testing) and delegates each to the appropriate specialist, then synthesizes their assessments into an approval or rejection decision. The Hierarchical pattern’s manager-specialist topology maps directly to this delegation structure.
Use by Process Agents: The ChangeAdvisory agent convenes virtual CABs—delegating review to SecurityAuditor, ArchitectureGuardian, and ValidationTester, synthesizing their assessments, and issuing an approval or rejection with documented rationale.
Pattern: Multiple agents executing simultaneously for variety or throughput.
ITIL Mapping: Service Validation requires testing candidate configurations under identical conditions to compare outcomes—a fundamentally parallel operation where independence between test executions is essential to avoid contamination.
Use by Process Agents: The ValidationTester runs parallel A/B tests on candidate configurations, applies quality metrics to each result independently, and certifies the candidate that meets baseline thresholds.
Pattern: Doer/Critic pair iterating until quality threshold met.
ITIL Mapping: Continual Improvement follows a Plan-Do-Check-Act cycle—an inherently iterative process where each cycle’s output is evaluated against quality criteria and fed back as input to the next cycle. The Loop pattern’s Doer/Critic structure maps to this directly: the Doer proposes a change, the Critic evaluates it against metrics, and the loop continues until a quality threshold is met or a maximum iteration count is reached.
Use by Process Agents: The ImprovementCoach uses a pluggable optimizer architecture to iteratively refine agent prompts and pipeline configurations against quality metrics. Current implementations use GEPA [12], a reflective prompt evolution method that uses natural language reflection on execution trajectories rather than scalar reward signals, and GRPO for reinforcement-based optimization. The optimizer slot is designed to accommodate emerging approaches—including Recursive Language Models [11] and future DSPy-compatible optimizers—as the meta-optimization landscape evolves.
In practice, operational scenarios require multiple patterns working in concert. An Incident Management state machine (4.3) may spawn a Round Robin diagnostic team (4.2) at its “investigation” state; the diagnostic team’s root cause finding may trigger a Hierarchical CAB (4.4) to approve a proposed fix; the CAB’s approval may initiate a Parallel validation test (4.5) before deployment. This multi-pattern composition is not accidental—it reflects how ITIL practices interact through value streams, with the output of one practice triggering the next. Section 6 demonstrates this composition through a complete self-healing lifecycle.
Having established HIVE’s architectural layers (Section 3) and orchestration patterns (Section 4), we now turn to the paper’s primary contribution: the concrete decomposition of all 34 ITIL 4 management practices into agent-executable runtime processes. Each practice is realized through two components—persistent infrastructure and a dedicated process agent—classified into governance tiers that define the boundary between agent autonomy and human oversight. Sections 5.3–5.5 present the complete mapping; Section 6 then traces a worked example through the resulting machinery.
Each ITIL 4 practice is decomposed into two components:
Infrastructure Layer: The persistent tools, data stores, and APIs that support the practice. These are the “nouns”—the CMDB, the incident database, the knowledge base, the change request system, the service catalog, the risk register.
Process Agent Layer: The dedicated agents that execute the practice. These are the “verbs”—the agents that create incidents, query the CMDB, capture knowledge, convene change advisory boards, run optimization loops.
This decomposition mirrors ITIL 4’s own distinction between practice capabilities—the tools, information, and organizational resources that must exist for a practice to function—and practice activities—the procedures that practitioners execute using those capabilities. The infrastructure layer instantiates the former; the process agent layer automates the latter. Separating them also yields a standard engineering benefit: persistent state can be versioned, backed up, and audited independently of the agents that operate on it, and process agent implementations can be replaced or upgraded without disturbing the data they depend on.
Both layers must be designed and implemented for each practice. Infrastructure without process agents is a passive data store that no one consults. Process agents without infrastructure have nowhere to read from or write to.
Not all 34 practices translate equally to autonomous agent operation. We classify them into two governance tiers:
Tier 1 — Autonomous (25 practices): Practices where agents execute the full process lifecycle without human decision authority. These are inherently operational practices: detecting failures, diagnosing root causes, managing configurations, deploying changes, capturing knowledge, enforcing security, monitoring SLAs. The decisions involved are technical, bounded, and reversible.
Tier 2 — Advisory (9 practices): Practices where agents provide infrastructure, analytics, and recommendations, but humans retain decision authority. These are inherently strategic or organizational practices: setting capability investment priorities, retiring services, designing architectures, managing stakeholder relationships, governing budgets. The decisions involved are business-oriented, high-impact, and often irreversible.
This two-tier structure instantiates what COBIT 2019 formalizes as the separation of governance (evaluate–direct–monitor) from management (plan–build–deliver–support) [7]: the Advisory tier retains human governance authority over strategic direction, while the Autonomous tier delegates management execution to agents. The architecture specifies 25 practices for fully autonomous operation—meaning that, when realized, the operational burden on human operators reduces to the 9 practices whose decisions are inherently strategic, high-impact, or irreversible. For those 9, agents still eliminate the analytical overhead (data gathering, trend identification, impact assessment) and present humans with actionable recommendations rather than raw data. A StrategyAdvisor agent that surfaces “Agent X has 3% utilization and 40% failure rate—recommend retirement” is qualitatively different from a human manually querying dashboards.
Some practices contain both operational and strategic components. Service Financial Management, for example, includes operational token budget enforcement (autonomous) and strategic budget allocation decisions (advisory). The governance tier reflects the predominant decision mode—the component requiring human judgment.
The governance tier also represents a maturity gradient: as confidence in agent decision-making grows through operational track records, practices may migrate from Advisory to Autonomous (see Section 8).
The tables in Sections 5.3–5.5 present the complete mapping of all 34 ITIL 4 practices across the three standard practice categories (General Management, Service Management, Technical Management). For each practice, we identify the infrastructure required, the process agent responsible, and the self-serving behavior that agent performs. Advisory-tier practices are marked with ; unmarked practices are Autonomous. Practices shown in bold are those exercised in the worked example of Section 6, where we trace a complete operational lifecycle through the architecture.
These mappings are not arbitrary naming conventions. Each process agent inherits the operational semantics, inter-practice dependencies, and escalation paths that ITIL 4 defines for the corresponding practice. An IncidentManager agent that escalates to a ProblemInvestigator after detecting recurrence is not following a bespoke rule—it is executing the incident-to-problem escalation path that ITIL has codified across three decades of operational practice. The value of adopting ITIL’s vocabulary and process structure is precisely that these relationships do not need to be reinvented.
|
ITIL 4 Practice |
Infrastructure Required |
Process Agent |
Self-Serving Behavior |
|---|---|---|---|
| Strategy Management | Capability roadmap store, model benchmarks | StrategyAdvisor | Analyzes workflow performance history and surfaces capability investment recommendations to human operators; evaluates build-vs-buy tradeoffs for new tools |
| Portfolio Management | Service Catalog with lifecycle states and usage analytics | PortfolioManager | Identifies underperforming services (low usage, high failure rate) and recommends retirement to human operators; surfaces promotion candidates with supporting data |
| Architecture Management | Architecture Decision Records, dependency graph | ArchitectureGuardian | Validates new agent compositions against known constraints (circular dependencies, excessive coupling); flags violations for human architectural review |
| Service Financial Management | Token usage ledger, compute cost tracker, budget tables | CostController | Monitors real-time token spend per workflow; enforces pre-set budget limits (operational); recommends budget allocations and cost optimization strategies to human operators (strategic) |
| Workforce & Talent Management | Agent Registry with capability profiles and performance scores | TalentManager | Tracks agent utilization and performance metrics; surfaces capability gaps and retraining recommendations to human operators (“no agent handles legal compliance review”) |
| Continual Improvement | Improvement backlog, experiment results database | ImprovementCoach | Runs GEPA/GRPO optimization loops; proposes prompt refinements; creates improvement tickets from post-mortems |
| Measurement & Reporting | Prometheus metrics, Grafana dashboards, action_history | ReportingAnalyst | Generates periodic performance reports; detects SLA violation trends; auto-creates dashboards for new service categories |
| Risk Management | Risk register, hallucination detection models, policy rules | RiskAssessor | Evaluates every workflow submission for risk (sensitive data, high cost, novel agent combinations); flags high-risk executions for human review |
| Information Security | Access control lists, encryption configs, audit logs | SecurityAuditor | Scans for access policy violations; detects anomalous agent behavior (unexpected tool calls, data exfiltration patterns); enforces data classification |
| Knowledge Management | OpenMemory (Qdrant vector DB), knowledge graph, embedding pipeline | KnowledgeCurator | Captures learnings from every execution; merges duplicates; surfaces relevant context to business agents via RAG; prunes stale knowledge |
| Organizational Change Mgmt | Change impact assessments, adoption tracking | ChangeAdvisor | Monitors adoption rates and generates impact assessments when capabilities change; recommends communication and training plans to human operators |
| Project Management | Kanban board, sprint backlogs, milestone tracker | ProjectCoordinator | Tracks multi-phase improvement initiatives and surfaces status, risks, and dependency conflicts to human project owners |
| Relationship Management | Stakeholder registry, satisfaction signals, feedback pipeline | RelationshipManager | Monitors user satisfaction signals (repeated failures, workflow abandonment); surfaces relationship risk indicators to human account owners |
| Supplier Management | Provider health dashboards, API rate limit monitors, cost comparators | SupplierMonitor | Monitors LLM provider uptime and latency; triggers failover on degradation; benchmarks new model releases against current performance |
|
ITIL 4 Practice |
Infrastructure Required |
Process Agent |
Self-Serving Behavior |
|---|---|---|---|
| Business Analysis | Requirements templates, domain ontology | RequirementsAnalyst | Decomposes user requests into structured requirements; identifies ambiguity; maps to existing catalog before creating new services |
| Service Catalogue Management | Service Catalog database, input/output schema registry | CatalogManager | Maintains service definition accuracy; auto-generates catalog entries when new workflow patterns stabilize |
| Service Design | Design pattern library, team pattern templates | ServiceDesigner | Proposes agent team compositions and pattern selections based on requirements analysis; presents design options with tradeoffs to human architects for approval |
| Service Level Management | SLA definitions, compliance tracking | SLAMonitor | Monitors workflow completion against targets; creates incidents on SLA breach; recommends SLA adjustments from actual data |
| Availability Management | Health check endpoints, failover configs | AvailabilityManager | Monitors agent and infrastructure availability; triggers auto-scaling; performs failover for unhealthy nodes |
| Capacity & Performance | Queue depth monitors, resource utilization metrics | CapacityPlanner | Predicts demand spikes from history; pre-provisions capacity; identifies bottlenecks (“Qdrant latency up 40% due to collection growth”) |
| Service Continuity | State snapshots, backup configs, recovery procedures | ContinuityGuardian | Validates checkpoint-recovery capability; tests resume-from-state; maintains DR runbooks |
| Monitoring & Event Mgmt | Event bus, alert rules, correlation engine | EventMonitor | Correlates events across workflows to detect systemic issues; distinguishes false alarms; auto-creates incidents from critical patterns |
| Service Desk | Request intake forms, routing rules | ServiceDeskRouter | Receives user requests; routes to appropriate catalog entry; triages (“new request or follow-up?”) |
| Incident Management | Incident database, escalation matrix, notification channels | IncidentManager | Creates incident records when workflows fail; classifies severity; assigns to remediation teams (which may be other agents); tracks resolution; captures fix as knowledge article |
| Problem Management | Problem database, root cause analysis templates, Known Error DB | ProblemInvestigator | Identifies recurring incidents (e.g., “same tool failed 5 times this week”); creates Problem Record; spawns Round Robin diagnostic team; documents root cause in Known Error DB |
| Change Enablement | Change request database, approval workflow engine | ChangeAdvisory | Before any agent definition, tool, or workflow is modified in production, convenes virtual CAB (Hierarchical Team of SecurityAuditor, ArchitectureGuardian, ValidationTester); evaluates risk; approves or rejects |
| Release Management | Release pipeline, canary deployment configs | ReleaseManager | Orchestrates promotion (Sandbox -> Shadow -> Production); monitors canary metrics; triggers rollback on regression |
| Service Validation & Testing | Test harness, golden datasets, quality benchmarks | ValidationTester | Runs Parallel Team A/B tests on candidates; applies GEPA quality metrics; certifies new versions meet baseline |
| Configuration Management | CMDB (Configuration Management Database) | CMDBManager | Maintains live registry of every agent, tool, model, data source, and integration—their versions, dependencies, and relationships. Answers “what tools can do web scraping?” and “what agents are affected by this tool API change?” |
| IT Asset Management | Asset inventory, license tracker, deprecation calendar | AssetTracker | Tracks trained model versions, optimized prompt sets, tool libraries; flags end-of-life assets; ensures license compliance |
| Service Request Management | Request fulfillment workflows, standard procedures | FulfillmentAgent | Executes standard service requests via Sequential Team pattern; validates inputs; generates WorkflowID |
|
ITIL 4 Practice |
Infrastructure Required |
Process Agent |
Self-Serving Behavior |
|---|---|---|---|
| Deployment Management | Container registry, Helm charts, deployment pipelines | DeploymentAgent | Manages containerized agent deployments; executes blue-green/canary strategies; validates health post-deploy |
| Infrastructure & Platform | Kubernetes cluster, database instances, message broker | InfrastructureMonitor | Monitors database connections, queue depths, vector store capacity; triggers scaling; alerts on degradation |
| Software Dev & Management | Code repositories, CI/CD pipelines, test suites | DeveloperAgent | Generates and reviews agent code via synthesis pipeline; runs automated tests; maintains quality standards |
Taken together, the 34 mappings above specify 25 practices for fully autonomous agent operation and 9 for human-in-the-loop advisory mode. The autonomous practices cluster around operational concerns—detection, diagnosis, remediation, deployment, knowledge capture, and optimization—while the advisory practices cluster around strategic concerns: investment prioritization, architectural governance, organizational change, and stakeholder relationships. This distribution is not incidental; it reflects the nature of the decisions involved, and it defines a governance boundary that an organization can audit, adjust, and evolve as confidence in agent decision-making matures (Section 8).
To demonstrate how the architecture from Section 5 operates in practice, we trace a complete operational lifecycle triggered by a routine workflow failure. The scenario exercises nine ITIL practices in sequence, three team orchestration patterns, and the governance boundary between autonomous and advisory tiers—illustrating the self-serving property concretely rather than abstractly.
A business workflow agent (“DataAnalyzer”) fails while processing a user request due to a malformed API response from an external data tool (“MarketDataAPI”).
Figure 2 illustrates the practice-to-practice flow, including the decision forks that determine whether the lifecycle short-circuits (known error resolution), escalates to human oversight (severity-1 or CAB rejection), or proceeds through the full autonomous sequence.
Step 1 — Monitoring & Event Management The
EventMonitor agent detects the WORKFLOW_EXECUTION_ERROR event
on the message bus. It correlates this event with two similar failures
from the past 48 hours, all involving MarketDataAPI. It publishes a
correlated alert: “Cluster of 3 failures from MarketDataAPI in 48h.”
Step 2 — Incident Management The IncidentManager agent
consumes the correlated alert. It creates an incident record
(INC-2026-0847), classifies severity as 2 (service degraded,
not down), and checks the Known Error Database for existing resolutions.
None found.
Step 3 — CMDB Lookup The IncidentManager queries the CMDBManager: “What is MarketDataAPI? What version? What agents depend on it?” The CMDBManager returns: MarketDataAPI v2.3, used by DataAnalyzer, FinancialReporter, and TrendPredictor. Last configuration update: 6 days ago.
Step 4 — Problem Management The IncidentManager escalates
to the ProblemInvestigator because this is the third incident in 48 hours
for the same component. The ProblemInvestigator creates a Problem Record
(PRB-2026-0203) and spawns a
Round Robin diagnostic team: DataAnalyzer (as domain
expert), CMDBManager (for dependency context), and a generic
DiagnosticAgent (for API testing).
Step 5 — Root Cause Identification The diagnostic team
identifies that MarketDataAPI changed its response schema 2 days ago (a
new metadata wrapper around all responses). The adapter in
HIVE’s tool registry expects the old format. Root cause: schema drift in
external dependency. The ProblemInvestigator records this in the Known
Error Database: “MarketDataAPI v2.4 wraps responses in
metadata envelope. Tool adapter v2.3 does not handle this.”
Step 6 — Change Enablement The ProblemInvestigator proposes a fix: update the MarketDataAPI adapter to handle both the old and new response formats. The ChangeAdvisory agent convenes a virtual CAB (Hierarchical Team):
SecurityAuditor validates the adapter change introduces no security vulnerabilities.
ArchitectureGuardian confirms the change does not violate architectural constraints.
ValidationTester runs the updated adapter against a golden dataset of MarketDataAPI responses.
All three approve. The change is authorized.
Step 7 — Release Management The ReleaseManager agent deploys the fixed adapter via shadow deployment: the new adapter processes live traffic in parallel with the old one, but results from the new version are logged without being returned to users. After 30 minutes, shadow metrics show 100% parse success vs. 0% on the old adapter for new-format responses. The ReleaseManager promotes the fix to production.
Step 8 — Knowledge Management The KnowledgeCurator agent captures the full incident-to-resolution chain as a knowledge article in OpenMemory (Qdrant vector store): “MarketDataAPI schema migration: symptoms, root cause, fix, and prevention.” This article is semantically indexed. Next time a similar failure occurs, the IncidentManager’s first action (searching the Known Error Database) will find this resolution and apply it automatically, reducing resolution time from hours to seconds.
Step 9 — Continual Improvement The ImprovementCoach agent notes that external tool API schema changes have caused 12 incidents this quarter. It creates an improvement initiative: “Implement automated schema validation for all external tool integrations.” This enters the PortfolioManager’s backlog as a new capability investment.
Several properties of this scenario merit explicit discussion.
Autonomous governance in practice. No human was involved in Steps 1–9. The system detected, diagnosed, fixed, deployed, documented, and initiated prevention autonomously. This is the Tier 1 governance specification from Section 5.2 operating as designed: every decision in the lifecycle was operational, bounded, and reversible. The scenario exercised nine ITIL practices in sequence (Monitoring & Event Mgmt, Incident Mgmt, Configuration Mgmt, Problem Mgmt, Change Enablement, Release Mgmt, Knowledge Mgmt, Continual Improvement) and three team orchestration patterns (Round Robin for diagnosis, Hierarchical for CAB review, Sequential for the overall flow).
The governance boundary. The human escalation threshold was never triggered because severity was 2 (not 1) and all CAB reviewers approved. Had severity been assessed as 1 (service down), or had any CAB reviewer rejected the proposed change, the system would have escalated to the human-in-the-loop dashboard—exercising the Advisory-tier governance boundary rather than attempting to override it. Scenarios involving strategic decisions (e.g., a service retirement triggered by chronic underperformance, or a budget reallocation following sustained cost overruns) would engage Advisory-tier practices and human decision authority from the outset.
The recursive property. The knowledge article captured in Step 8 is what closes the recursive loop. The next occurrence of this failure class will be automatically resolved at Step 2—the IncidentManager’s Known Error Database lookup will match, and the system will apply the documented fix without traversing Steps 3–7. Over time, the system accumulates an expanding library of known resolutions, progressively shortening mean-time-to-recovery for recurring failure classes.
Maturity level correspondence. In terms of the maturity model defined in Section 8, this scenario operates at Level 4 (Proactive) transitioning to Level 5 (Autonomic). At Level 3, the Known Error DB resolution in Step 2 would function, but Steps 5–7 (novel root cause analysis, change authorization, and release management) would require human intervention. At Level 2, only Step 1 (monitoring and alerting) would be automated. The worked example thus illustrates the operational capability that the higher maturity levels are designed to enable.
Concurrency. The scenario above traces a single lifecycle instance for clarity of exposition, but the architecture is designed for concurrent operation. In production, multiple lifecycle instances execute simultaneously—independent incidents are detected, diagnosed, and remediated in parallel, as the asynchronous messaging infrastructure (RabbitMQ) and ephemeral execution model enable independent lifecycle processing without blocking. Each lifecycle instance operates on its own incident record, problem record, and change request, sharing only the persistent state layer (CMDB, Known Error Database, Knowledge Base). Contention is managed at the data layer through transactional isolation, not at the process layer—process agents are stateless consumers of queued events, and the system scales horizontally by adding execution capacity rather than serializing lifecycle instances. This is a direct consequence of Design Principles 1–3 (Section 3.1): decoupled control, externalized state, and ephemeral execution together ensure that the sequential presentation in this section does not imply sequential operation in deployment.
The process agents and orchestration patterns described in Sections 4–6 depend on several cross-cutting infrastructure components that do not belong to any single ITIL practice but are shared across many. This section describes the architectural requirements each component addresses and, where relevant, notes HIVE’s reference implementation choices. These choices are illustrative—the architectural requirements are general, and alternative implementations are viable for each.
A self-serving agent system requires a formal registry of its own capabilities—without one, agents cannot discover, compose, or lifecycle-manage services programmatically. In HIVE, every AI capability is registered as a formal Service with: a unique identifier, Input Schema (rigid JSON schema validated before execution), Output Contract (guaranteed result structure), Resource/SLA Profile (Team Pattern, Agent Personas, token budget, target SLA), and lifecycle state (draft, active, paused, deprecated). The CatalogManager process agent maintains this registry, validates definitions, auto-generates entries when new workflow patterns stabilize, and retires underperforming services.
Agent interoperability requires stable, typed contracts that are independent of any particular LLM provider or serialization format. Section 2.4 introduced the contract-first design principle; in HIVE’s reference implementation, these contracts are realized as DSPy Signatures, with adapters handling runtime translation to provider-optimal formats. The key property is that process agents and business agents communicate through stable interfaces even as underlying models change—a requirement for any system that expects to survive model upgrades and provider substitutions.
The Continual Improvement practice (Section 5.3) requires a pluggable optimization pipeline. HIVE’s reference implementation uses two optimizers: GEPA [12] for single-agent prompt optimization (mutating system prompts against programmatic quality metrics, achieving approximately 35x fewer rollouts than GRPO in our internal evaluations) and GRPO for workflow-level optimization (comparing candidate workflow configurations against baselines using trace memory). As noted in Section 4.6, these are the chosen optimizers in the current implementation; the architectural requirement is for any optimizer that can evaluate agent or workflow performance against defined metrics and propose improvements. Recent developments such as Recursive Language Models [11] suggest additional approaches to this optimization loop.
Both optimizers are operated by the ImprovementCoach process agent, with results validated by the ValidationTester and promoted by the ReleaseManager through the shadow deployment pipeline.
Multi-agent systems that execute operational processes with elevated permissions require defense-in-depth security. The architectural requirements are: federated identity with role-based access control, mutual TLS between services, tenant-isolated data stores, and continuous behavioral monitoring of process agents. HIVE’s reference implementation addresses these via OIDC/OAuth2 (Keycloak) with five role levels, mTLS via a Linkerd service mesh with Kubernetes NetworkPolicies, PostgreSQL Row-Level Security for tenant isolation, and Qdrant collection namespacing for vector memories. Process agents operate with elevated internal permissions but are subject to the SecurityAuditor’s continuous monitoring—the security architecture monitors the monitors.
The KnowledgeCurator practice (Section 5.3) requires a persistent semantic store that enables both storage and retrieval of operational knowledge across the agent population. The architectural requirement is for an embedding-indexed store with fast approximate nearest-neighbor search. HIVE’s reference implementation uses OpenMemory with ModernBERT 768-dimensional embeddings indexed in Qdrant (HNSW). The MemoryAugmentedActor class enriches any agent with automatic memory retrieval (pre-processing) and storage (post-processing), enabling both business and process agents to leverage accumulated organizational knowledge.
The governance tiers defined in Section 5.2 describe a static boundary; in practice, organizations will approach that boundary incrementally. We propose a maturity model that tracks how completely an AI system manages itself as a service—using that term deliberately: the progression mirrors ITIL’s own service-centric worldview, where maturity is measured by how fully the service management practices are realized and integrated. We call this the AI-as-a-Service (AIaaS) Maturity Model to distinguish it from models that measure AI capability (e.g., benchmark performance) rather than AI operational self-governance.
Existing maturity frameworks—CMMI for process maturity, ITIL’s own maturity assessments for service management organizations—measure human organizational capability. The AIaaS model measures the degree to which agent systems internalize those same capabilities, reducing dependence on human operational intervention while preserving human strategic authority.
|
Level |
Name |
Self-Management Capability |
|---|---|---|
| Scripted | No self-management. Human operators handle all operational tasks. Most current multi-agent frameworks operate at this level. | |
| Monitored | Monitoring & Event Management is automated: the system detects failures, correlates events, and alerts humans. No autonomous remediation. Some current frameworks that include health-check and retry logic approximate this level. | |
| Reactive | Incident Management, Problem Management, Knowledge Management, and Configuration Management are automated. The system can diagnose failures, apply known fixes from the Known Error Database, and track dependencies. Novel failures still escalate to humans. | |
| Proactive | Capacity & Performance, Service Level Management, Continual Improvement, and Release Management are automated. The system predicts issues before they occur, optimizes preemptively, and manages its own deployment pipeline. Most failures are prevented rather than remediated. | |
| Autonomic | All 25 Tier 1 practices operate autonomously. Advisory-tier practices provide full analytical support with humans retaining decision authority. The system manages routine operations end-to-end; humans govern strategic direction and organizational priorities. |
Each level subsumes the capabilities of the levels below it. The practice-to-level mapping above is indicative, not prescriptive: organizations may activate practices in a different order depending on their operational priorities and risk tolerance. The key assessment criterion at each level is whether the specified practices can execute their full lifecycle—detection through resolution through knowledge capture—without human intervention for the class of decisions they are designed to handle.
Achieving Level 5 represents the practical realization of the Autonomic Computing vision described in Section 2.2, now enabled by LLM-powered process agents executing ITIL 4 as their operational procedure set. The governance tier migration described in Section 5.2—practices moving from Advisory to Autonomous as confidence in agent decision-making grows—is the mechanism by which a system advances through this maturity model over time.
If the major technology companies are correct that the majority of knowledge work will be substantially automated within the next several years, that claim smuggles with it a requirement that is rarely stated: parity with the existing operational landscape. Enterprises do not operate in greenfield conditions. They have roles, responsibilities, service level agreements, compliance obligations, and organizational structures that an incoming agent workforce must integrate with—not replace overnight. History offers no precedent for wholesale operational replacement at this scale and pace; what it does offer, repeatedly, is the pattern of integration: new capabilities absorbed into existing frameworks, extended to accommodate new participants, and hardened through iterative deployment. The question, then, is not whether multi-agent systems need operational governance—they do—but whether that governance should be invented from scratch or drawn from the discipline that has spent three decades codifying precisely how organizations manage complex, interdependent, failure-prone digital services. This section examines the implications of choosing the latter.
A natural objection is that AI systems require novel self-management approaches rather than frameworks designed for human IT operations. We disagree for three reasons:
First, the operational functions are identical. Detecting failures, diagnosing root causes, managing changes, capturing knowledge, planning capacity—these functions do not change because the executor is an agent rather than a human. The agent still needs to classify incident severity, still needs to track configuration dependencies, still needs to version-control changes.
Second, ITIL 4 is implementation-agnostic by design. It specifies what practices must exist and how they interact, but deliberately avoids prescribing who performs them or what tools they use. This abstraction is precisely what enables agent implementation.
Third, ITIL 4 is comprehensive. Ad-hoc approaches to agent self-management invariably reinvent subsets of ITIL—a “health check” is Monitoring & Event Management; an “error log” is a primitive Incident Database; a “model registry” is a partial CMDB. As the analysis of independent work in Section 2.1 illustrates, researchers arriving at multi-agent governance consistently rediscover ITIL-shaped structures without recognizing the lineage—confirming that the operational requirements are not artifacts of the ITSM discipline but inherent properties of the problem domain. By adopting ITIL wholesale, we avoid the gaps and inconsistencies of piecemeal reinvention.
Given that the operational functions are identical, the critical design question becomes where to draw the boundary between agent autonomy and human authority.
The two-tier classification reveals a natural governance boundary that aligns with established IT governance theory. The 25 autonomous practices are operational—they concern how the system runs. The 9 advisory practices are strategic—they concern what the system should become.
This maps directly to COBIT’s separation of governance (evaluate, direct, monitor) from management (plan, build, run, monitor) [7]. In HIVE, process agents handle management; humans handle governance. The agents keep the system running, healing, and learning. The humans decide where the system should go.
This boundary also addresses a predictable reviewer concern: “are you claiming agents do strategy management?” We are not. We are claiming that agents automate the analytical substrate of strategy management—the data gathering, trend identification, impact assessment, and option generation that currently consume the majority of a human strategist’s time. The strategic decision remains human. The difference is that the human now receives “Agent X: 3% utilization, 40% failure rate, zero unique capabilities vs. Agent Y—recommend retirement” rather than spending hours assembling that analysis from raw dashboards.
Critically, this boundary is not fixed. As agent capabilities mature and trust is established through operational track records, advisory practices may shift toward autonomy. A PortfolioManager agent that has successfully surfaced 50 correct retirement recommendations with zero false positives earns progressively more decision authority. The boundary is a dial, not a wall.
A natural follow-up concern is whether the governance apparatus itself is trustworthy—particularly when the agents responsible for detecting and remediating failures are themselves subject to the same failure modes they are meant to address.
A self-governing system faces a recursive question: who manages the managers? If the IncidentManager agent fails, who creates an incident for that failure?
HIVE addresses this pragmatically:
Process agents are monitored by the same EventMonitor that monitors business agents. The EventMonitor is the one process agent with enhanced resilience guarantees (redundant deployment, automatic restart).
Critical process agent failures trigger human escalation to the HITL Dashboard. The system is self-governing for routine operations; it escalates to humans for meta-operational failures.
Process agents are simpler than business agents. The IncidentManager’s task (create a record, classify severity, assign, track) is far more constrained than a DataAnalyzer’s task. Simpler tasks mean fewer failure modes.
The EventMonitor warrants special architectural treatment. Redundant deployment protects against crash failures, but not against systematic logic failures: if the EventMonitor has a blind spot—say, misclassifying a particular event type—every redundant copy shares the same blind spot. This is qualitatively different from hardware redundancy. For this reason, the EventMonitor is the one process agent where formal verification of classification rules, rule-based fallbacks, or ensemble consensus (multiple independent monitors that must agree) may be warranted over purely LLM-based decision-making.
The Change Advisory Board acts as a natural circuit breaker against cascading hallucination. The deeper bootstrapping risk is not that a process agent fails but that it hallucinates: an IncidentManager that fabricates spurious incidents could trigger spurious problem investigations that propose spurious changes. The governance boundary at Change Enablement (Section 5.2) halts this cascade—a hallucinated change proposal must survive review by the SecurityAuditor and ArchitectureGuardian before reaching production. The CAB is specifically designed as a deliberative gate, not a rubber stamp.
These architectural choices also position HIVE relative to the emerging landscape of agent operations tooling. The “LLMOps” and “AgentOps” disciplines provide further evidence for our central thesis: practitioners building operational tooling for AI systems are converging on individual ITIL practices independently, validating the mapping even when the lineage is not recognized:
Arize and LangSmith provide agent health monitoring and tracing—a partial implementation of Monitoring & Event Management.
PromptLayer and Humanloop offer prompt version control—a form of Configuration Management.
Weights & Biases model registries serve as a partial CMDB.
Braintrust enables A/B testing and evaluation for prompts—analogous to Service Validation & Testing.
Platform-specific error dashboards (e.g., LangSmith’s run traces) partially implement Incident Management.
Each of these tools addresses one or two practices in isolation. None provides the integrative layer that connects monitoring to incident management to problem investigation to change enablement to knowledge curation. HIVE’s contribution is precisely this synthesis: adopting ITIL 4’s 34 practices as an integrated, agent-operated system where practices share infrastructure, escalation paths, and accumulated knowledge.
Despite this integrative advantage, several limitations and open questions remain.
This paper presents the architectural specification for 34 process agents. The current HIVE implementation includes Incident Management, Knowledge Management, Configuration Management, and Continual Improvement agents, with others in active development. Full instantiation of all 34 remains ongoing work, and systematic measurement of operational outcomes—MTTR, incident prevention rates, cost savings relative to human-operated baselines—awaits that broader deployment.
Two practical concerns deserve acknowledgment. First, the governance apparatus itself imposes latency and token cost. Running a Virtual CAB with three agent personas to approve a change implies nontrivial overhead compared to a traditional CI/CD gate. Mitigation strategies include lightweight models for routine approvals, pre-authorized fast-paths for known-safe change categories (e.g., documentation updates, non-breaking configuration changes), and batched approvals for low-severity modifications. The appropriate tradeoff between governance thoroughness and operational velocity is an empirical question that will vary by deployment context.
Second, implementing 34 distinct agent personas with specialized infrastructure is a substantial engineering undertaking. If the management layer’s complexity rivals that of the business layer it supports, the framework’s value proposition is undermined. The AIaaS Maturity Model (Section 8) is specifically designed to address this concern: organizations begin with a minimal viable governance layer (Level 2, monitoring and incident response) and add practices incrementally as operational needs justify the investment. The framework is a menu, not a mandate.
Third, multi-model and multi-provider heterogeneity introduces failure modes that are qualitatively different across LLM backends. A process agent powered by one model family may exhibit different hallucination patterns, context window limitations, and reasoning failures than the same agent powered by another. HIVE treats this as analogous to human skill heterogeneity absorbed by standardized processes: the ITIL practices define what must happen (classify severity, query the CMDB, convene a CAB), and the process agent’s underlying model is an implementation detail managed through Supplier Management and Configuration Management. However, systematic characterization of how model-specific failure modes propagate through the governance apparatus—and whether certain process agent roles require model-specific hardening—remains an open empirical question.
Finally, several research directions emerge from this architecture. Whether process agents should themselves be subject to GEPA/GRPO optimization—optimizing the IncidentManager’s triage accuracy or the KnowledgeCurator’s retrieval relevance, for example—opens a question of meta-optimization. The EventMonitor’s privileged position (Section 9.3) warrants particular attention: as the foundational detection layer upon which all other process agents depend, it is the one component where formal verification of classification rules, ensemble consensus across independently trained monitors, or hybrid rule-based/LLM architectures may be justified over purely model-based decision-making—ensuring that the system’s ability to detect its own failures does not itself silently fail. And the governance boundary’s position along the autonomy dial (Section 9.2) is likely to shift as the field develops better methods for calibrating trust in LLM-based decision-making.
This paper has argued that the IT Service Management discipline—specifically ITIL 4’s 34 management practices—provides the architectural blueprint for a multi-agent system’s self-governing operating system, with 25 practices operating autonomously and 9 providing agent-powered analytics under human decision authority.
The key insight is not that multi-agent systems resemble ITSM-governed organizations. It is that they should operate as ITSM-governed organizations, with dedicated process agents executing each ITIL practice: an IncidentManager that creates and triages incidents, a CMDBManager that tracks all tools and dependencies, a KnowledgeCurator that captures and disseminates learnings, a ChangeAdvisory that convenes review boards before production modifications, and 30 more agents spanning strategy, portfolio, capacity, security, finance, and continual improvement.
The result is a recursive, self-serving architecture. The multi-agent system manages its own operations through the same ITSM framework it was built upon. It detects its own failures. It diagnoses its own root causes. It applies its own fixes. It documents its own learnings. It optimizes its own performance. It convenes its own change advisory boards. It manages its own configuration database.
Three decades of ITSM practice have already specified the 34 management processes—25 fully automatable, 9 requiring human strategic judgment—their interactions, their metrics, and their governance structures. Two decades of autonomic computing research have articulated the vision. The emergence of LLM-powered agents provides the missing capability. HIVE connects them into a working architecture.
The multi-agent system self-management problem has a solution specification. Three decades of ITIL practice provide the blueprint; this paper shows how to instantiate it as agent-executable architecture—25 practices operated by agents for agents, 9 where agents inform and humans decide. Whether the specification fully realizes its promise at scale is the work ahead. The contribution is not novel engineering but a recognition: the engineering was done decades ago, in a different context, for a different class of system. The context has now converged.
We invite the multi-agent systems community to evaluate, extend, and challenge this mapping—and to consider whether the operational management discipline, like the agent systems it now governs, has been waiting for this convergence all along.
The revision of this paper from its initial draft to its current form was conducted as a structured collaboration between the human authors and multiple AI systems, principally Claude (Opus 4.6, Anthropic) operating through Anthropic’s Cowork collaborative interface and Claude Code (Anthropic’s agentic coding environment) across several independent sessions. Gemini 3 Pro (Google) and Claude Sonnet 4.5 (Anthropic) were additionally used for grounding verification and citation cross-checking. Perplexity was used for structured critical review, providing detailed feedback on framing, claims calibration, and structural coherence that informed the v1.2 revision. We disclose this not as a caveat but as a methodological choice that we believe warrants the same epistemic transparency we advocate for in the governance of agent systems themselves.
The Cowork sessions were conducted without custom instructions or
specialized prompting. The working environment consisted of a shared
directory containing the draft .tex and typeset PDF,
reference papers, and a working Markdown file that served dual purposes:
as a continuity mechanism across sessions (each new model instance
inheriting the accumulated editorial context) and as a changelog for the
co-author to review independently. The collaboration spanned multiple
independent Opus 4.6 instances; no single instance persisted across the
full revision process. Continuity was maintained through the editorial
record itself and the human author’s stewardship of the revision process
across sessions—a practical instance of the structured handoff governance
this paper advocates.
Claude served as an editorial collaborator across the full revision lifecycle: conducting section-by-section comparative analysis between drafts, identifying structural and argumentative gaps, proposing revisions, and performing the mechanical LaTeX preparation for arXiv submission. The collaboration was dialogic—Claude raised substantive questions (“Does the architecture handle concurrent lifecycle instances?”, “Does the ‘complete specification’ claim survive agent-native failure modes like context window exhaustion?”), and the human authors made the disposition decisions: what to claim, what to defer to future work, what to flag for co-author review, and where to draw the line between architecture and implementation. All such decisions are documented in a 53-note editorial record that traces the reasoning, alternatives considered, and trade-offs evaluated at each decision point.
We describe this process because we believe it matters for the epistemology of AI-assisted scholarship. A paper about the governance of AI agent systems ought to be transparent about its own use of AI agents. The editorial notes exist precisely so that a reader—or reviewer—can distinguish collaborative reasoning from generated text, and can trace any claim in the final paper back to the deliberative process that produced it. Accountability for the paper’s claims, framing, and scope rests with its human authors. The contribution of the AI collaborators was substantive, documented, and governed—which is, in the end, what this paper argues all agent contributions should be.
[1] Wu, Qingyun, Bansal, Gagan, Zhang, Jieyu, Wu, Yiran, Li, Beibin, Zhu, Erkang, Jiang, Li, Zhang, Xiaoyun, Zhang, Shaokun, Liu, Jiale, Awadallah, Ahmed Hassan, White, Ryen W, Burger, Doug, and Wang, Chi. “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation.” arXiv preprint arXiv:2308.08155, 2023. https://arxiv.org/abs/2308.08155
[2] Chase, H. “LangChain.” GitHub repository, 2022. https://github.com/langchain-ai/langchain
[3] Moura, J. “CrewAI: Framework for orchestrating role-playing, autonomous AI agents.” 2023. https://github.com/crewAIInc/crewAI
[4] OpenAI. “Swarm: Educational framework for lightweight multi-agent orchestration.” 2024. https://github.com/openai/swarm
[5] AXELOS. ITIL Foundation: ITIL 4 Edition. TSO (The Stationery Office), 2019. https://www.axelos.com/certifications/itil-4-foundation
[6] Kephart, J.O. and Chess, D.M. “The Vision of Autonomic Computing.” IEEE Computer, 36(1):41-50, 2003. https://doi.org/10.1109/MC.2003.1160055
[7] ISACA. COBIT 2019 Framework: Introduction and Methodology. ISACA, 2019. https://www.isaca.org/resources/cobit
[8] Netflix Technology Blog. “Unified Data Architecture at Netflix.” 2024. https://netflixtechblog.medium.com/
[9] Khattab, Omar, Singhvi, Arnav, Maheshwari, Paridhi, Zhang, Zhiyuan, Santhanam, Keshav, Vardhamanan, Sri, Haq, Saiful, Sharma, Ashutosh, Joshi, Thomas T., Moazam, Hanna, Miller, Heather, Zaharia, Matei, and Potts, Christopher. “DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines.” arXiv preprint arXiv:2310.03714, 2023. https://arxiv.org/abs/2310.03714
[10] Tomašev, N., Franklin, M., and Osindero, S. “Intelligent AI Delegation.” arXiv preprint arXiv:2602.11865, 2026. https://arxiv.org/abs/2602.11865
[11] Zhang, A.L., Kraska, T., and Khattab, O. “Recursive Language Models.” arXiv preprint arXiv:2512.24601, 2026. https://arxiv.org/abs/2512.24601
[12] Agrawal, Lakshya A., Tan, Shangyin, Soylu, Dilara, Ziems, Noah, Khare, Rishi, Opsahl-Ong, Krista, Singhvi, Arnav, Shandilya, Herumb, Ryan, Michael J., Jiang, Meng, Potts, Christopher, Sen, Koushik, Dimakis, Alexandros G., Stoica, Ion, Klein, Dan, Zaharia, Matei, and Khattab, Omar. “Reflective Prompt Evolution via Genetic-Pareto Optimization (GEPA).” arXiv preprint arXiv:2507.19457, 2025. https://arxiv.org/abs/2507.19457
[13] Abdula, Moe, Averdunk, Ingo, Barcia, Roland, Brown, Kyle, and Emuchay, Ndu. The Cloud Adoption Playbook. Wiley, 2018. ISBN: 9781119491811. https://www.wiley.com/en-us/The+Cloud+Adoption+Playbook:+Proven+Strategies+for+Transforming+Your+Organization+with+the+Cloud-p-9781119491811
[14] Zitzer, Nick and Singer, Eric. “HIVE Framework v1.1.” Internal technical document, 2025.
[15] Zitzer, Nick and Singer, Eric. “HappyHive Technical Design Document v2.0.” Internal technical document, January 2026.
|
# |
ITIL 4 Practice |
Process Agent |
Primary Responsibility |
Tier |
|---|---|---|---|---|
| Strategy Management | StrategyAdvisor | Recommends capability investments from usage patterns | Advisory | |
| Portfolio Management | PortfolioManager | Surfaces underperforming services; recommends retirement | Advisory | |
| Architecture Management | ArchitectureGuardian | Validates compositions; flags constraint violations | Advisory | |
| Service Financial Management | CostController | Enforces budgets (auto); recommends allocations (advisory) | Advisory | |
| Workforce & Talent Mgmt | TalentManager | Tracks agent utilization; surfaces capability gaps | Advisory | |
| Continual Improvement | ImprovementCoach | Runs optimization loops; creates improvement initiatives | Autonomous | |
| Measurement & Reporting | ReportingAnalyst | Generates reports; detects SLA trends | Autonomous | |
| Risk Management | RiskAssessor | Evaluates workflow risk; flags for human review | Autonomous | |
| Information Security | SecurityAuditor | Detects violations and anomalous behavior | Autonomous | |
| Knowledge Management | KnowledgeCurator | Captures learnings; manages RAG context | Autonomous | |
| Org Change Management | ChangeAdvisor | Monitors adoption; recommends communication plans | Advisory | |
| Project Management | ProjectCoordinator | Tracks initiatives; surfaces status to human owners | Advisory | |
| Relationship Management | RelationshipManager | Monitors satisfaction signals; surfaces risk indicators | Advisory | |
| Supplier Management | SupplierMonitor | Monitors provider health; triggers failover | Autonomous | |
| Business Analysis | RequirementsAnalyst | Decomposes requests; maps to existing catalog | Autonomous | |
| Service Catalogue Mgmt | CatalogManager | Maintains service definitions | Autonomous | |
| Service Design | ServiceDesigner | Proposes team compositions; presents options for approval | Advisory | |
| Service Level Management | SLAMonitor | Monitors SLA compliance; creates incidents on breach | Autonomous | |
| Availability Management | AvailabilityManager | Auto-scaling; failover | Autonomous | |
| Capacity & Performance | CapacityPlanner | Predicts demand; identifies bottlenecks | Autonomous | |
| Service Continuity | ContinuityGuardian | Validates recovery; maintains DR runbooks | Autonomous | |
| Monitoring & Event Mgmt | EventMonitor | Correlates events; auto-creates incidents | Autonomous | |
| Service Desk | ServiceDeskRouter | Routes requests to catalog entries | Autonomous | |
| Incident Management | IncidentManager | Creates incidents; classifies; assigns; tracks | Autonomous | |
| Problem Management | ProblemInvestigator | Investigates recurring failures; maintains Known Error DB | Autonomous | |
| Change Enablement | ChangeAdvisory | Convenes virtual CAB; approves/rejects changes | Autonomous | |
| Release Management | ReleaseManager | Orchestrates promotion; rollback on regression | Autonomous | |
| Service Validation | ValidationTester | A/B tests; certifies quality | Autonomous | |
| Configuration Management | CMDBManager | Maintains registry of agents, tools, dependencies | Autonomous | |
| IT Asset Management | AssetTracker | Tracks model versions; license compliance | Autonomous | |
| Service Request Mgmt | FulfillmentAgent | Executes standard requests | Autonomous | |
| Deployment Management | DeploymentAgent | Containerized deployments; health validation | Autonomous | |
| Infrastructure & Platform | InfrastructureMonitor | Database/queue/vector store health and scaling | Autonomous | |
| Software Dev & Management | DeveloperAgent | Code generation, review, and testing | Autonomous |
|
Scenario |
Recommended Pattern |
ITIL 4 Practice |
|---|---|---|
| Standard pipeline with deterministic handoffs | Sequential | Request Fulfillment |
| Multi-perspective collaborative analysis | Round Robin | Problem Management |
| Conditional branching with escalation | State Machine | Incident Management |
| Expert delegation with coordination | Hierarchical | Change Enablement (CAB) |
| Concurrent evaluation or variant generation | Parallel | Service Validation & Testing |
| Iterative refinement until quality threshold | Loop | Continual Improvement |
HIVE was first formalized in v1.1 documentation in mid-2025, with the initial implementation committed on May 27, 2025. The v2.0 specification was completed in January 2026.↩︎