Engineering Drawing · Consulting Works Dept.

The Anatomy of an AI‑First Tech Consultancy

A blueprint for the technology consultancy in 2026, drawn from the essay of the same name.

DWG NO. VIBE‑004

SHEETS 3 OF 3

SCALE NOT TO SCALE

DATE 2026‑07

DRAWN BY A. KHAN

REV A

§0

General Notes

Read before construction. The existing structure has two load-bearing assumptions, and both are cracking.

Clients pay for time. Days booked, rates applied, hours billed. But stronger models keep compressing delivery: when a build estimated at ten days takes six, the billable hours fall — the efficiency gain shows up as less revenue, not better margin.
Capacity is a pyramid. A wide base of juniors doing the volume work, funnelling experience upward into a narrow tier of seniors who carry the judgment. LLMs now do the volume work the base existed to absorb — and the training ground that produced the next generation of seniors thins out at the same time.
Construction order matters. This drawing has three sheets: the agents to build, the platform they run on, and the commercial model that captures the value. The choice is not between having agents and not having them. It is between building them deliberately and watching a competitor build them first.

SHEET 1

The Agent Catalogue

37 components across 8 practice areas. Grounded in a Microsoft-focused practice, transferable to any comparable platform.

This is not a complete list. These are the essentials — the must-haves that lead to more specialised agents and, eventually, to the full agent ecosystem the consultancy runs on.

SHEET 2

The Enabling Architecture

Cross-section, foundation upward. An agent catalogue without the infrastructure underneath it is a wish list. The governance rail spans the full section.

L4 UX Layer: where agents meet people

Copilot Studio, Microsoft Teams, web and custom apps. Power Platform connectors surface agents directly in Teams or Power Apps without custom development — the consultant meets the agent inside the tools they already work in.

▲

L3 Orchestration Layer: runtime · coordination · dev surface

Three responsibilities. The agent runtime handles model binding, tool execution, state, and the conversation loop (Foundry Agent Service on the Microsoft stack). A multi-agent framework handles fan-out/fan-in and multi-step workflow logic (Microsoft Agent Framework here; LangGraph elsewhere). A development surface provides the model catalogue, prompt management, evaluation, and the pipeline to production (Azure AI Foundry). Building agent infrastructure outside a governed development surface is technically possible and practically expensive.

▲

L2 Integration & MCP Layer: reaching live systems

Agents that reason only over a static corpus are useful. Agents that can also read and write live systems are a different category of useful. MCP is the standard connection layer for external systems — client CRMs, ERPs, partner platforms. Native integrations take priority inside the ecosystem: Microsoft 365, Azure DevOps, Dynamics 365, Power Platform. Azure API Management governs every agent-to-system call: rate limiting, authentication, audit logging.
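The three controls named above (authentication, rate limiting, audit logging) can be illustrated with a toy gateway in plain Python. This is not how Azure API Management is configured, which is done through its own policies; the class name, credential format, and limits here are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class GovernedGateway:
    """Toy gateway: authenticates, rate-limits, and audits every call."""
    api_keys: dict[str, str]          # credential -> agent name (hypothetical)
    max_calls: int                    # calls allowed per agent per window
    window_seconds: float
    clock: Callable[[], float] = time.monotonic
    audit_log: list[tuple] = field(default_factory=list)
    _recent: dict[str, list[float]] = field(default_factory=dict)

    def call(self, api_key: str, system: str, operation: Callable[[], object]):
        now = self.clock()
        agent = self.api_keys.get(api_key)
        if agent is None:
            self.audit_log.append((now, "unknown", system, "denied: authentication"))
            raise PermissionError("unknown agent credential")
        # Sliding window: keep only this agent's calls still inside the window.
        recent = [t for t in self._recent.get(agent, []) if now - t < self.window_seconds]
        if len(recent) >= self.max_calls:
            self._recent[agent] = recent
            self.audit_log.append((now, agent, system, "denied: rate limit"))
            raise RuntimeError(f"rate limit exceeded for {agent}")
        self._recent[agent] = recent + [now]
        result = operation()
        self.audit_log.append((now, agent, system, "allowed"))
        return result
```

Every outcome, allowed or denied, lands in the audit log, which is the property that matters when agent calls touch client systems.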

▲

L1 Knowledge Layer: the governed lakehouse

Every past proposal, architecture document, delivery artefact, methodology asset, and lessons-learned record — indexed and queryable. OneLake (or any comparable governed lakehouse) consolidates CRM, timesheets, repositories, DevOps, SharePoint, Teams, and Outlook into one foundation; AI Search / RAG sits above it with vector search, hybrid retrieval, and metadata filtering. The ingestion pipeline — chunking, tagging, indexing — is the unglamorous prerequisite most consultancies underestimate. Without this layer, agents produce generic output that reflects nobody's actual practice.
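The ingestion steps (chunking, tagging, indexing) and a metadata-filtered query can be sketched in a few lines of plain Python. This is a toy keyword index standing in for the vector and hybrid retrieval a real search service provides; the function names, tag format, and chunk sizes are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    tags: frozenset  # metadata such as practice area or artefact type

def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows. Assumes size > overlap."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]

def ingest(corpus: dict, size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Chunk every document and carry its tags onto each chunk."""
    return [Chunk(doc_id, piece, frozenset(tags))
            for doc_id, (text, tags) in corpus.items()
            for piece in chunk_words(text, size, overlap)]

def build_index(chunks: list[Chunk]) -> dict:
    """Inverted index: lower-cased token -> positions of chunks containing it."""
    index = defaultdict(set)
    for n, chunk in enumerate(chunks):
        for token in chunk.text.lower().split():
            index[token].add(n)
    return index

def search(index: dict, chunks: list[Chunk], term: str, required_tag=None) -> list[Chunk]:
    """Keyword lookup with an optional metadata filter."""
    hits = sorted(index.get(term.lower(), ()))
    return [chunks[n] for n in hits
            if required_tag is None or required_tag in chunks[n].tags]
```

The metadata filter is what lets an agent ask for lessons learned from one practice rather than every document that mentions a term.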

▲

L0 Source Systems: existing ground

Dynamics 365 (CRM/ERP), Azure DevOps, SharePoint and Teams, Outlook, and external client systems connected through MCP. This is the ground the whole structure is founded on — the raw record of how the consultancy actually works.

Governance · spans all layers

The control plane is not optional when agent output reaches clients or touches sensitive commercial data. Microsoft Agent 365 (or whatever fills the role) provides three non-negotiables at scale:
Observability. A unified registry, performance analytics, and activity mapping for thirty-odd agents across eight practices.
Governance. Onboarding workflows, lifecycle rules, and least-privilege access: an agent that reads commercial proposal data must not be reachable from a competing bid.
Security. Entra for identity, Defender for threats, Purview for compliance.
Human-in-the-loop is configured per agent: the agent produces the output, the consultant is accountable for it.
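Two of these rules, least-privilege access and per-agent human-in-the-loop, reduce to simple checks. The sketch below is a minimal illustration in plain Python, not the Agent 365 API; the policy fields, agent names, and engagement names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AgentPolicy:
    name: str
    engagements: frozenset        # engagements this agent may be invoked from
    needs_sign_off: bool = True   # human-in-the-loop, configured per agent

def authorise(policy: AgentPolicy, engagement: str) -> None:
    """Least privilege: refuse calls from engagements outside the allow-list."""
    if engagement not in policy.engagements:
        raise PermissionError(f"{policy.name} is not reachable from {engagement}")

def release(policy: AgentPolicy, output: str, signed_by: Optional[str] = None) -> dict:
    """The agent produces the output; a named consultant is accountable for it."""
    if policy.needs_sign_off and not signed_by:
        raise ValueError(f"{policy.name} output needs a consultant sign-off")
    return {"agent": policy.name, "output": output, "accountable": signed_by}
```

An internal standards checker might run with `needs_sign_off=False`, while anything client-facing keeps the default.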

Setup sequence: the layers are interdependent, so build them in order. But agents that don't depend on the corpus (quality reviewers, standards checkers, test-case generators) can deploy while the ingestion pipeline is still being built. The corpus-dependent agents won't perform until the lakehouse is indexed and searchable.
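The setup sequence amounts to a readiness check: each agent declares what it needs, and it deploys once those prerequisites exist. In this minimal sketch the capability names and the `proposal_drafter` agent are hypothetical, and it assumes the corpus-independent agents need only a runtime.

```python
# What each agent needs before it can deploy (assumed for illustration).
AGENT_REQUIREMENTS = {
    "quality_reviewer": {"agent_runtime"},
    "standards_checker": {"agent_runtime"},
    "test_case_generator": {"agent_runtime"},
    "proposal_drafter": {"agent_runtime", "indexed_corpus"},  # corpus-dependent
}

def deployable(requirements: dict, ready: set) -> list:
    """Agents whose every prerequisite is already in place."""
    return sorted(name for name, needs in requirements.items() if needs <= ready)

def blocked(requirements: dict, ready: set) -> dict:
    """Agents still waiting, mapped to the prerequisites they lack."""
    return {name: sorted(needs - ready)
            for name, needs in requirements.items() if not needs <= ready}

if __name__ == "__main__":
    ready_now = {"agent_runtime"}  # ingestion pipeline still being built
    print(deployable(AGENT_REQUIREMENTS, ready_now))
    print(blocked(AGENT_REQUIREMENTS, ready_now))
```

With only the runtime in place, the three corpus-independent agents deploy and the drafter waits on the indexed corpus.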

SHEET 3

The Delivery Model

The commercial structure, load-tested. A consultancy running every project on T&M was once ideally placed. That same profile is now the biggest structural risk in the business.

Stress Test · Delivery Compression vs. Commercial Model

Take a build estimated at 10 days under the old model. Compress delivery with AI and compare the same engagement priced two different ways.

[Interactive chart: AI delivery compression, shown at 40%. Panels: Time & materials (you bill the days); Fixed price / outcome (you price the result). Readouts: revenue, delivery cost, margin.]
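The stress test can be reproduced as arithmetic. The sketch below assumes the fixed price is set at the old estimate multiplied by the day rate, and the day rate and day cost are illustrative figures, not numbers from the essay.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Engagement:
    estimated_days: int   # effort under the old model
    day_rate: int         # billed per day (illustrative figure)
    day_cost: int         # internal cost per delivery day (illustrative figure)

def stress_test(e: Engagement, compression_pct: int) -> dict:
    """Price one engagement two ways after AI compresses delivery."""
    actual_days = e.estimated_days * (100 - compression_pct) / 100
    delivery_cost = actual_days * e.day_cost
    tm_revenue = actual_days * e.day_rate            # T&M bills the days worked
    fixed_revenue = e.estimated_days * e.day_rate    # fixed price bills the result
    return {
        "actual_days": actual_days,
        "delivery_cost": delivery_cost,
        "tm": {"revenue": tm_revenue, "margin": tm_revenue - delivery_cost},
        "fixed": {"revenue": fixed_revenue, "margin": fixed_revenue - delivery_cost},
    }

if __name__ == "__main__":
    build = Engagement(estimated_days=10, day_rate=1000, day_cost=600)
    for pct in (0, 40):
        print(pct, stress_test(build, pct))
```

With these figures, 40% compression cuts T&M revenue from 10,000 to 6,000 and T&M margin from 4,000 to 2,400, while the fixed-price engagement keeps its 10,000 revenue and its margin rises from 4,000 to 6,400.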

Fixed-price was long dismissed as too risky when delivery effort was the primary unknown. AI-augmented delivery changes that calculus, because the consultancy now controls a far larger share of the delivery variables. The portfolio has three lanes:

LANE 01

Fixed Price

For defined, repeatable implementations. A differentiated offer rather than a risk transfer — priced from a leverage ratio you have measured on your own delivery, not one a vendor quoted.

LANE 02

Outcome-Based

For engagements where value is measurable at go-live or adoption milestones. Fixed component tied to go-live criteria, value share tied to adoption at ninety days, managed service for ongoing operation.
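One way to read this three-part structure is as a fee formula. The sketch below is an assumption throughout: the essay does not say how the value share is calculated, so this version pays it pro rata against an adoption target and caps it, and every figure is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutcomeContract:
    fixed_fee: int             # paid only if go-live criteria are met
    value_share_cap: int       # maximum value share, earned at the adoption target
    adoption_target_pct: int   # adoption measured ninety days after go-live
    monthly_service_fee: int   # managed service for ongoing operation

def total_fee(c: OutcomeContract, go_live_met: bool,
              adoption_pct_day_90: int, months_operated: int) -> dict:
    fixed = c.fixed_fee if go_live_met else 0
    # Assumed rule: value share scales with adoption and stops at the target.
    attained = min(adoption_pct_day_90, c.adoption_target_pct)
    share = c.value_share_cap * attained // c.adoption_target_pct if go_live_met else 0
    service = c.monthly_service_fee * months_operated
    return {"fixed": fixed, "value_share": share,
            "managed_service": service, "total": fixed + share + service}

if __name__ == "__main__":
    contract = OutcomeContract(60_000, 30_000, 80, 4_000)
    print(total_fee(contract, go_live_met=True, adoption_pct_day_90=60, months_operated=6))
```

A real contract would define the adoption metric and its measurement source in the same detail as the fee itself.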

LANE 03

T&M, Retained

For advisory and genuinely exploratory work where scope cannot support a fixed commitment. The failure case is not T&M itself — it is the book that is one hundred per cent T&M.

The winning unit is T-shaped and built by dogfooding. A two-person team, augmented by agents and operating across the full stack, will out-deliver a five-person team of vertical specialists whose work does not connect. The narrow specialist loses to the AI-augmented generalist; the shallow generalist loses to the AI itself. And the capability grows only internally: a consultancy that has not run AI across its own proposals, designs, and delivery cannot credibly sell AI-augmented delivery, because it does not yet know what it is selling.

DETAIL A

The Premium: Work Agents Cannot Do

If volume work and artefacts are commoditised, the premium comes from three places agents cannot reach.

Vertical Expertise

A decade of Dynamics 365 in manufacturing means knowing production planning constraints, shop-floor integration patterns, and which configuration decisions cause problems at go-live. None of it sits in public documentation, so no general-purpose model can reproduce it. It lives in implementation artefacts, lessons-learned records, and the judgment calls made across dozens of engagements, and it translates into measurable delivery compression that is chargeable at a premium.

Judgment Under Ambiguity

Agents optimise within a defined solution space; the premium work is defining that space when requirements contradict each other, constraints are unstated, and the right architecture depends on organisational factors that appear in no brief. The agent surfaces the options. The architect makes the call. Reading the politics between a CFO and a technology sponsor is a skill that takes years to develop and cannot be derived from a document corpus.

Accountability

The agent produces the output. The consultant signs it. When the architecture recommendation is wrong, the architect is accountable. A consultancy that ships AI-generated output without clear human accountability is not reducing risk — it is transferring it to the client without their knowledge. Clients who understand AI-augmented delivery will pay for the human judgment layer explicitly.

Apply AI to the consultancy before selling it to the client. That is the whole argument, in three moves: build the agent infrastructure and the IP corpus that make fixed-price and outcome-based engagements credible; reprice the go-to-market from a negotiation over days to a conversation about outcomes; and let each practice build AI capability inside its own delivery, with a single AI lead to cross-pollinate between them. The alternative, a lead airdropped in to sprinkle optimisation across the firm, fails consistently.

The reason to move now is the corpus, because it compounds. A consultancy that begins indexing its proposals, designs, and delivery artefacts today holds an advantage within twelve months that a late mover cannot buy back. T&M-only was the safe choice for two decades. It is now the exposure. There is no shortcut, and the window closes.

Approved for Construction