Presentation narrative

ClickUp AI Control Opportunity

A strategy for building the infrastructure for AI agent management: cost control, risk control, agent memory, and AI-ready data.

Not just productivity. A multimillion-dollar revenue and savings opportunity.

Save enterprises millions by controlling AI waste and risk, then turn that control layer into premium enterprise revenue for ClickUp.

Enterprises are adopting AI agents faster than they are building the infrastructure to manage them.

The AI adoption gap is becoming an agent-management gap.

Every company is under pressure to adopt AI fast. The problem is not model access. The problem is the missing infrastructure for AI agent management once agents start doing real work.

40%+ of agentic AI projects are forecast to be canceled by the end of 2027 due to escalating costs, unclear value, or inadequate risk controls. Source: Gartner press release, June 2025.
95% of enterprise GenAI pilots were reported as failing to deliver measurable business value. Source: MIT/NANDA “GenAI Divide” coverage and report, 2025.
85% of organizations misestimate AI costs by more than 10%; nearly a quarter are off by 50% or more. Source: CIO coverage of AI cost-overrun survey.
89% expected increase in average compute cost from 2023 to 2025 as GenAI workloads scale. Source: IBM, “CEO’s guide to generative AI: Cost of compute.”

The opportunity is agent management infrastructure.

Everyone is releasing agents and agent builders. The revenue opportunity is the layer that manages them: budgets, approvals, context, data definitions, access, alerts, and replay.

ClickUp already owns the work surface where these agents will operate.

AI creates new hidden costs.

The expensive part is not one model call. It is what happens when thousands of agent runs become invisible, ungoverned, repetitive, and grounded in messy business data.

Where spend leaks

Runaway usage, expensive workflows nobody monitors, wrong outputs, risky customer responses, repeated context gathering, messy data, and no visibility into what AI actually did.

Runaway AI usage

Agents loop, retry, call tools repeatedly, and return HTTP 200s while burning budget.

Expensive workflows nobody monitors

Teams cannot attribute cost by agent, workflow, customer, team, model, or tool call.

Risky customer-facing responses

AI can make promises, policy claims, refund guidance, or regulated statements the company owns.

Repeated context gathering

Agents keep paying to rediscover project history, customer preferences, decisions, and procedures.

Messy data causing bad results

Ambiguous business definitions make AI confidently answer with the wrong metric, join, filter, or denominator.

No visibility into what AI actually did

No replay, no audit trail, no failure classification, no savings estimate, and no ROI evidence.

The market is already showing real failure stories.

These are the concrete stories the harnesses are designed around: runaway usage, actions that create liability, and data questions that look correct but are semantically wrong.

40K/min: Token usage spikes during loops

Developer reports described token usage jumping from roughly 200/min to 40,000/min when agents looped. The app can look technically healthy while budget burns in the background.

Source: r/LangChain-style developer discussions captured in Blackbox PRD research.

$47K: Agent loop ran for 11 days

One documented LangChain-style loop reportedly ran for 11 days and generated a $47,000 bill. That is the nightmare Blackbox is aimed at: cost anomaly, loop detection, and hard budget stops.

Source: Zylos-style AI cost research cited in Blackbox PRD research.

$812: Air Canada chatbot liability

The bot told Jake Moffatt he could apply retroactively for a bereavement fare. The actual policy said no. Air Canada was ordered to pay $812.02, but the larger cost was the precedent and the press coverage.

Source: Moffatt v. Air Canada; ABA/BBC coverage.

$50K+: Small AI runs can cause large downstream damage

Incident research included examples where low-cost runs caused outsized damage: deleted data, unauthorized purchases, and pipeline failures. The run cost is tiny; the operational blast radius is not.

Source: Cycles-style AI agent incident research cited in Blackbox PRD.

$1.5M: Cheap conversations become enterprise spend

What looks like $0.14 per conversation can become roughly $1.5M annually across thousands of employees when context, retries, and model choice are unmanaged.

Source: enterprise LLM TCO framework cited in Blackbox PRD research.
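The $1.5M card is volume arithmetic, which a back-of-the-envelope sketch makes visible. Every input except the $0.14 figure is a hypothetical placeholder (employee count, conversations per day, workdays, and the context/retry/model multipliers), not data from the cited TCO framework.

```python
# Hypothetical inputs for illustration; only the $0.14 figure comes from the card above.
COST_PER_CONVERSATION_USD = 0.14
EMPLOYEES = 5_000                         # assumption
CONVERSATIONS_PER_EMPLOYEE_PER_DAY = 8    # assumption
WORKDAYS_PER_YEAR = 250                   # assumption


def annual_spend(cost_per_conversation: float, employees: int,
                 per_employee_per_day: int, workdays: int) -> float:
    """Annual spend = per-conversation cost x total conversations in a year."""
    return cost_per_conversation * employees * per_employee_per_day * workdays


def effective_cost(base: float, context_growth: float, retry_rate: float,
                   model_multiplier: float) -> float:
    """Per-conversation cost after unmanaged context growth, retries, and a
    pricier model than the task needs (all three factors are assumptions)."""
    return base * context_growth * (1 + retry_rate) * model_multiplier


baseline = annual_spend(COST_PER_CONVERSATION_USD, EMPLOYEES,
                        CONVERSATIONS_PER_EMPLOYEE_PER_DAY, WORKDAYS_PER_YEAR)
unmanaged = annual_spend(effective_cost(COST_PER_CONVERSATION_USD, 1.5, 0.2, 2.0),
                         EMPLOYEES, CONVERSATIONS_PER_EMPLOYEE_PER_DAY, WORKDAYS_PER_YEAR)
print(f"baseline:  ${baseline:,.0f}/year")   # baseline:  $1,400,000/year
print(f"unmanaged: ${unmanaged:,.0f}/year")  # unmanaged: $5,040,000/year
```

Volume alone puts this hypothetical baseline near $1.4M a year; the multipliers show why unmanaged context, retries, and model choice matter more than the headline per-call price.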

Wrong: Data answers can execute cleanly and still be wrong

SQL can run successfully while using the wrong definition: active customers via login flags instead of subscription status, churn including trial users, or revenue including canceled orders.

Sources: Data Dragon semantic-contract examples; Databricks/Atlan/Cube AI-readiness positioning.

The opportunity: AI agent management inside the work surface.

ClickUp already owns the work surface: tasks, docs, projects, workflows, comments, automations, goals, and team context. If AI work happens inside ClickUp, ClickUp can become the infrastructure where agent work is monitored, governed, learned from, and optimized.

Work surface

Tasks, docs, projects, workflows, comments, goals, automations, permissions, and team context.

Agent activity

Model calls, tool calls, decisions, customer responses, data questions, workflow actions, and escalations.

Enterprise pressure

Adopt AI quickly, but prove cost control, risk control, and ROI.

Infrastructure for AI Agent Management

Not more models. Not another demo agent. A premium control layer that makes agent work measurable, governable, adaptive, and data-ready.

Monitor what AI did and what it cost
Govern what AI is allowed to say or do
Help agents learn and evolve over time
Validate data before AI acts on it
Customer outcome

AI work is safer, cheaper, more explainable, and easier to scale.

ClickUp outcome

Enterprise AI management add-on, governance package, readiness product, and retention moat.

Strategic position

ClickUp owns the control infrastructure around the place where work already happens.

Where the industry is moving.

The market is already moving away from “just build agents” toward context, governance, security, identity, access, and operational control.

Context: Databricks / Bloomberg

“We already have artificial general intelligence. We don't need AI to get smarter. It is just lacking context.”

Source: Databricks Community post quoting Ali Ghodsi’s Bloomberg interview, June 2026.

Control: Microsoft agent governance

“Every agent must be observable, governed, and secure.” Microsoft also says leaders must “identify what agents exist,” “observe what they do,” and “stop what they should not do.”

Source: Microsoft Learn, Governance and security for AI agents across the organization.

Trust: Salesforce Agentforce

“Deploying trusted AI agents requires more than just powerful models.” Salesforce also says agents are “only as reliable and trustworthy as the data they access and act upon.”

Source: Salesforce, Enterprise AI Agent Era: Trust, Security, and Governance.

48%: Data foundation concern

Nearly half of IT leaders worry their data foundation is not ready for AI, and 55% lack confidence implementing AI with appropriate guardrails.

Source: Salesforce research cited in Agentforce trust/governance article.

45%: AI-generated code security risk

Cloud Security Alliance cites Veracode research finding AI-generated code introduces security vulnerabilities in 45% of development tasks.

Source: Cloud Security Alliance, Vibe Coding Security Crisis, 2026.

2.74x: Agentic builder debt

CSA cites research that AI-authored pull requests generated 2.74x more security issues than human-authored pull requests.

Source: Cloud Security Alliance, citing CodeRabbit AI vs human code report.

The four harnesses that become the management layer.

Blackbox

Observability and FinOps for AI work. Captures traces, model calls, tool calls, retries, cost, latency, failures, and guardrail trips.

Monitor

Govo

Runtime policy gate. Verifies generated answers and risky actions against source-of-truth policies before delivery.

Govern

Memory OS

Self-learning agent memory. Turns experience into typed memories, salience, utility, lifecycle, procedures, and action gates.

Evolve

Data Dragon

AI data-readiness harness. Builds semantic contracts, lineage contracts, access policies, and SQL validation before agents answer.

Optimize

Blackbox: AI operations and FinOps

Records AI runs, prices each step, classifies prompt/model/tool behavior, detects loops and anomalies, warns teams, and can stop or throttle processes before spend runs away.

Concrete example

A run shows repeated tool calls, token spikes, model overkill, failed retries, and cost drift; Blackbox can alert, cap the run, recommend a cheaper model, or stop the workflow.

Revenue angle

Enterprise AI control center for cost attribution, model savings, workflow replay, guardrail compliance, and AI ROI reporting.


Govo: policy verification and action control

Intercepts AI answers/actions, retrieves approved source-of-truth policy, checks contradictions and unsafe claims, then allows, corrects, blocks, or escalates with an audit ledger.

Concrete example

A travel/support/health bot can be forced to verify refund, coverage, or eligibility claims before customer delivery instead of trusting freeform generation.

Revenue angle

Compliance, support-risk, customer-trust, and regulated-workflow package for enterprise AI teams.


Memory OS: reusable team and customer context

Builds agent learning loops: typed memories, temporal validity, salience, utility reinforcement, lifecycle states, procedural memory, action gates, and audit history.

Concrete example

After a mistake, an agent stores the lesson as a typed mistake/procedure. Future actions retrieve that lesson, raise a warning, or block the same failure from repeating.

Revenue angle

Long-term moat: ClickUp becomes the place where agents evolve with team, project, customer, and workflow experience.


Data Dragon: AI-readiness and semantic validation

Profiles business data, detects ambiguity, builds semantic contracts, generates lineage contracts, applies governance/access policy, and validates generated SQL before the AI answer is trusted.

Concrete example

Raw AI may answer “active customers” using auth flags; gated AI uses the approved subscription-status definition and can be checked against gold SQL.

Revenue angle

AI readiness product for enterprise onboarding, analytics trust, data-governance workflows, and executive reporting accuracy.


What “harness” means here.

The harness is the control procedure around an AI workflow. The important thing is not the integration mechanism; it is the repeatable set of checks that makes agent work manageable.

Blackbox harness

  1. Wrap the agent run.
  2. Record model calls, tool calls, prompts, costs, latency, retries, and outputs.
  3. Classify prompt/model/tool behavior and detect loops, context overflow, retries, and cost anomalies.
  4. Warn, cap, throttle, downgrade, or stop the workflow before overspend continues.
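Steps 1, 2, and 4 of the Blackbox harness can be sketched as a recorder that wraps a run, prices each step, raises a soft warning, and enforces a hard cap. The model names, per-token prices, the 80% warning threshold, and the names `RunRecorder` and `BudgetExceeded` are assumptions for illustration; classification (step 3) is left out, and latency here only measures the simulated step.

```python
import time
from dataclasses import dataclass, field
from typing import List

# Hypothetical prices per 1K tokens; real prices depend on provider and model.
PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.01}


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses its hard budget cap, stopping the workflow."""


@dataclass
class Step:
    kind: str          # "model" or "tool"
    name: str          # model or tool name
    tokens: int
    cost_usd: float
    latency_s: float


@dataclass
class RunRecorder:
    """Wraps one agent run: records and prices each step, warns, then caps."""
    budget_usd: float
    warn_fraction: float = 0.8
    steps: List[Step] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def total_cost(self) -> float:
        return sum(s.cost_usd for s in self.steps)

    def record(self, kind: str, name: str, tokens: int, started: float) -> Step:
        cost = tokens / 1000 * PRICE_PER_1K_TOKENS.get(name, 0.0)
        step = Step(kind, name, tokens, cost, time.monotonic() - started)
        self.steps.append(step)
        spent = self.total_cost
        if spent > self.budget_usd:
            # Hard stop: interrupts the caller's run loop.
            raise BudgetExceeded(f"run cost ${spent:.3f} exceeds cap ${self.budget_usd:.3f}")
        if spent > self.warn_fraction * self.budget_usd and not self.warnings:
            # Soft signal: the point to alert, throttle, or downgrade the model.
            self.warnings.append(f"run at {spent / self.budget_usd:.0%} of budget")
        return step


# Usage: simulate an agent that keeps calling the large model in a loop.
recorder = RunRecorder(budget_usd=0.045)
stopped_reason = None
try:
    for _ in range(10):
        start = time.monotonic()
        # ... a real model call would go here; token usage is simulated ...
        recorder.record("model", "large-model", tokens=1_000, started=start)
except BudgetExceeded as exc:
    stopped_reason = str(exc)
```

The run warns at the fourth call and is stopped on the fifth instead of completing all ten; where the warning hook throttles or downgrades is a policy choice left open here.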

Govo harness

  1. Read policy/source documents.
  2. Extract enforceable rules and detect contradictions.
  3. Intercept AI response or action before delivery.
  4. Allow, correct, block, or escalate with audit evidence.
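The Govo steps can be sketched as a small gate with an audit ledger. This is a simplified illustration, not the V0: the rules are hand-written stand-ins for the output of step 2, regular expressions substitute for real claim verification (a production verifier would need something much stronger), and `Rule`, `gate`, the rule IDs, and the policy wording are hypothetical, loosely modeled on the Moffatt v. Air Canada facts.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    ALLOW = "allow"
    CORRECT = "correct"
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Rule:
    rule_id: str
    source: str                        # citation back to the policy document
    contradicts: re.Pattern            # claim the policy rules out
    correction: Optional[str] = None   # approved replacement wording, if any
    high_risk: bool = False            # refunds, eligibility, regulated statements


@dataclass
class Verdict:
    decision: Decision
    text: str                          # what may actually be delivered
    evidence: List[str] = field(default_factory=list)


def gate(answer: str, rules: List[Rule], ledger: List[dict]) -> Verdict:
    """Check an answer against extracted rules before delivery; log the outcome."""
    hits = [r for r in rules if r.contradicts.search(answer)]
    evidence = [f"{r.rule_id} ({r.source})" for r in hits]
    uncorrectable = [r for r in hits if r.correction is None]
    if not hits:
        verdict = Verdict(Decision.ALLOW, answer)
    elif any(r.high_risk for r in uncorrectable):
        verdict = Verdict(Decision.ESCALATE, "", evidence)   # a human decides
    elif uncorrectable:
        verdict = Verdict(Decision.BLOCK, "", evidence)      # nothing safe to send
    else:
        corrected = " ".join(r.correction or "" for r in hits)
        verdict = Verdict(Decision.CORRECT, corrected, evidence)
    ledger.append({"decision": verdict.decision.value, "answer": answer,
                   "delivered": verdict.text, "evidence": evidence})
    return verdict


# Illustrative rules only; not real Air Canada policy text.
RULES = [
    Rule("bereavement-01", "Bereavement travel policy",
         re.compile(r"bereavement.*retroactiv|retroactiv.*bereavement", re.I | re.S),
         correction="Bereavement fares must be requested before travel; "
                    "they cannot be claimed retroactively."),
    Rule("refund-02", "Refund authority policy",
         re.compile(r"\b(?:I|we)(?: will| have)? (?:issue|issued|approve|approved) "
                    r"(?:a|your) refund", re.I),
         high_risk=True),
    Rule("data-03", "Data handling policy", re.compile(r"\binternal[- ]only\b", re.I)),
]

audit_ledger: List[dict] = []
print(gate("You can apply for the bereavement fare retroactively within 90 days.",
           RULES, audit_ledger).decision.value)   # correct
```

Precedence runs escalate, then block, then correct, so an answer that trips both a correctable rule and an unauthorized-refund rule still goes to a human.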

Memory OS harness

  1. Store typed memories: decision, procedure, mistake, preference, fact.
  2. Score by scope, salience, utility, lifecycle, recency, confidence, and repeated usefulness.
  3. Consolidate useful experience and suppress stale or superseded memories.
  4. Warn or block when learned memory reveals a repeated or risky action.
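The typing, scoring, and suppression steps can be sketched as a weighted score with exponential recency decay and reinforcement from use. The weights, the 30-day half-life, the Laplace-smoothed utility, and the names `Memory`, `score`, `recall`, and `reinforce` are assumptions for illustration, not the Memory OS scoring architecture; merging during consolidation and the step 4 gate are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

MEMORY_TYPES = {"decision", "procedure", "mistake", "preference", "fact"}

# Hypothetical weights and half-life; a real system would tune these.
WEIGHTS = {"salience": 0.3, "utility": 0.3, "confidence": 0.2, "recency": 0.2}
HALF_LIFE_DAYS = 30.0


@dataclass
class Memory:
    kind: str                 # one of MEMORY_TYPES
    scope: str                # "workspace", "project:alpha", "customer:acme", ...
    text: str
    salience: float           # 0..1, importance when written
    confidence: float         # 0..1
    created: datetime
    uses: int = 0             # times recalled
    helpful: int = 0          # times the recall was marked useful
    superseded: bool = False  # replaced by a newer memory

    def __post_init__(self) -> None:
        if self.kind not in MEMORY_TYPES:
            raise ValueError(f"unknown memory type: {self.kind}")


def score(m: Memory, scope: str, now: datetime) -> float:
    """Weighted score; out-of-scope or superseded memories score zero."""
    if m.superseded or m.scope not in ("workspace", scope):
        return 0.0
    utility = (m.helpful + 1) / (m.uses + 2)         # Laplace-smoothed usefulness
    age_days = (now - m.created).total_seconds() / 86_400
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)     # exponential decay
    return (WEIGHTS["salience"] * m.salience + WEIGHTS["utility"] * utility
            + WEIGHTS["confidence"] * m.confidence + WEIGHTS["recency"] * recency)


def recall(memories: List[Memory], scope: str, now: datetime,
           k: int = 3, min_score: float = 0.2) -> List[Memory]:
    """Top-k memories above a floor, instead of dumping everything into the prompt."""
    scored: List[Tuple[float, Memory]] = [(score(m, scope, now), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for s, m in scored[:k] if s >= min_score]


def reinforce(m: Memory, was_helpful: bool) -> None:
    """Utility reinforcement: memories that keep helping rank higher next time."""
    m.uses += 1
    if was_helpful:
        m.helpful += 1


now = datetime(2025, 1, 31)
memories = [
    Memory("procedure", "project:alpha", "Run migrations before deploying.",
           0.9, 0.9, now - timedelta(days=1)),
    Memory("fact", "project:alpha", "The v1 API endpoint is current.",
           0.8, 0.9, now - timedelta(days=200), superseded=True),
    Memory("preference", "project:beta", "Beta team prefers weekly summaries.",
           0.7, 0.8, now),
]
print([m.text for m in recall(memories, "project:alpha", now)])
# ['Run migrations before deploying.']
```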

Data Dragon harness

  1. Profile schema and detect semantic ambiguity.
  2. Draft semantic contracts and route open questions to humans.
  3. Generate lineage contracts from source columns to approved concepts.
  4. Apply access policy: allow, mask, or deny based on role and requested concept.
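Step 4 can be sketched as a default-deny lookup on a semantic contract that carries its definition and lineage. The contract fields, roles, table and column names, and the `resolve` function are hypothetical; the sketch returns which columns to mask rather than rewriting SQL, and it denies any concept whose contract still has open questions from step 2.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SemanticContract:
    concept: str
    definition_sql: str                                   # approved business logic
    lineage: Tuple[str, ...]                              # source columns behind the concept
    access: Dict[str, str] = field(default_factory=dict)  # role -> "allow" | "mask" | "deny"
    masked_columns: Tuple[str, ...] = ()
    open_questions: Tuple[str, ...] = ()                  # ambiguity still routed to humans


@dataclass
class AccessDecision:
    action: str                       # "allow" | "mask" | "deny"
    sql: Optional[str] = None
    mask: Tuple[str, ...] = ()
    lineage: Tuple[str, ...] = ()
    reason: str = ""


def resolve(contract: SemanticContract, role: str) -> AccessDecision:
    """Default-deny access decision for one role and one business concept."""
    if contract.open_questions:
        # Unresolved ambiguity means the agent must not guess the definition yet.
        return AccessDecision("deny", reason="open questions: " + "; ".join(contract.open_questions))
    action = contract.access.get(role, "deny")            # unknown roles are denied
    if action == "allow":
        return AccessDecision("allow", contract.definition_sql, lineage=contract.lineage,
                              reason="approved definition")
    if action == "mask":
        return AccessDecision("mask", contract.definition_sql, contract.masked_columns,
                              contract.lineage, reason=f"masked for role {role!r}")
    return AccessDecision("deny", reason=f"role {role!r} may not query {contract.concept!r}")


REVENUE = SemanticContract(
    concept="revenue_by_customer",
    definition_sql=("SELECT c.email, SUM(o.amount) AS revenue FROM orders o "
                    "JOIN customers c ON c.id = o.customer_id "
                    "WHERE o.status <> 'canceled' GROUP BY c.email"),
    lineage=("orders.amount", "orders.status", "orders.customer_id", "customers.email"),
    access={"finance": "allow", "support": "mask"},
    masked_columns=("customers.email",),
)
print(resolve(REVENUE, "support").mask)   # ('customers.email',)
```

The approved definition excludes canceled orders, which is exactly the kind of rule an ungated query can silently skip.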

Why Blackbox is different from AI observability tools.

Braintrust, Arize, LangSmith, and Langfuse prove there is demand for tracing, evaluation, and monitoring. Blackbox is different because it treats every AI workflow like an operational incident surface: cost, failure mode, replay, and prevention.

Market tools

What they do well

Trace LLM calls, evaluate outputs, monitor quality, compare prompts/models, and help engineering teams improve AI apps.

What remains open

They rarely turn the trace into an operational diagnosis: duplicate tool loop, runaway retry, context overflow, model overkill, budget breach, or fix recommendation.

Sources: Braintrust, Arize Phoenix, LangSmith, Langfuse product positioning.

Blackbox

Harness process

Capture every step → price each call → classify prompt/model/tool behavior → detect loop/anomaly → alert → cap/throttle/downgrade/stop → replay the path.

Specific problems solved

“Which agent caused the bill?” “Which prompt used the wrong model?” “Which tool looped?” “Where should we warn, cap, downgrade, throttle, or kill?”

ClickUp angle: AI admins can see which workspace workflow, agent, tool, or team is burning spend and why.
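The "detect loop/anomaly" step in that process can be illustrated by turning a raw trace into named findings instead of only displaying it. This sketch assumes a simple `TraceEvent` shape and arbitrary thresholds (more than 3 identical tool calls, 3 consecutive failures, a per-minute token count above 10x baseline); `diagnose` and its thresholds are hypothetical, not taken from the Blackbox V0.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class TraceEvent:
    t_s: float        # seconds since the run started
    kind: str         # "model" or "tool"
    name: str         # model or tool name
    args: str         # serialized tool input (or a prompt hash for model calls)
    tokens: int
    error: bool = False


def diagnose(events: List[TraceEvent], baseline_tokens_per_min: float,
             spike_factor: float = 10.0, repeat_limit: int = 3,
             retry_limit: int = 3) -> List[str]:
    """Turn a raw trace into operational findings (thresholds are assumptions)."""
    findings: List[str] = []

    # Duplicate tool loop: the same tool called with the same input too often.
    repeats = Counter((e.name, e.args) for e in events if e.kind == "tool")
    for (name, args), n in repeats.items():
        if n > repeat_limit:
            findings.append(f"duplicate_tool_loop: {name}({args}) called {n}x")

    # Runaway retry: a streak of consecutive failed steps (reported once per streak).
    streak = 0
    for e in events:
        streak = streak + 1 if e.error else 0
        if streak == retry_limit:
            findings.append(f"runaway_retry: {retry_limit} consecutive failures by t={e.t_s:.0f}s")

    # Token-rate anomaly: tokens per one-minute bucket versus the normal rate.
    per_minute: Counter = Counter()
    for e in events:
        per_minute[int(e.t_s // 60)] += e.tokens
    for minute, tokens in sorted(per_minute.items()):
        if tokens > spike_factor * baseline_tokens_per_min:
            findings.append(f"token_spike: minute {minute} used {tokens} tokens "
                            f"(baseline {baseline_tokens_per_min:.0f}/min)")
    return findings


# Usage: five identical search calls inside one minute at 8,000 tokens each.
loop = [TraceEvent(t_s=10.0 * i, kind="tool", name="search", args="q=refund policy",
                   tokens=8_000) for i in range(5)]
for finding in diagnose(loop, baseline_tokens_per_min=200):
    print(finding)
```

The usage mirrors the 200/min versus 40,000/min loop story: both the duplicate search and the token spike are flagged, and these findings are what the alert and cap/throttle/stop steps would act on.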

Why Govo is different from “just RAG.”

Air Canada already had the policy. The chatbot likely had access to it. It still said the opposite. That is the difference between retrieval and enforcement.

Vanilla RAG / knowledge retrieval

What it does

Finds relevant documents and gives them to the model as context.

What it does not guarantee

RAG does not independently verify the final answer. The model can ignore the retrieved policy, blend it with prior knowledge, misquote it, or invent an authorized-sounding action.

Source: Air Canada case coverage; common RAG architecture pattern.

Govo

Harness process

Map policy sources → extract rules → classify action risk → intercept output/tool intent → compare against source → choose allow, correct, block, or escalate.

Specific problems solved

Unauthorized refunds, wrong policy promises, external emails with sensitive claims, workflow actions beyond authority, and “AI had the document but still got it wrong.”

ClickUp angle: AI actions inside workflows can be checked against approved business rules before they touch customers, records, or automations.

Why Memory OS is different from Mem0 and vanilla RAG.

Mem0 is a strong open-source memory layer and proves the category matters. Memory OS focuses on a different enterprise problem: agents need to evolve from experience, not just remember more context.

90% token-cost reduction is the kind of savings memory-layer vendors are already highlighting when agents avoid redundant context. Source: Valkey/Mem0 memory-layer cost framing.
50-80% cost reduction is claimed for semantic caching and context-overflow mitigation patterns in agent systems. Source: Redis context-window overflow guidance.

Mem0 / vanilla RAG

What they do well

Mem0 gives agents persistent memory, personalization, and lower redundant context. Vanilla RAG retrieves semantically similar chunks from a corpus.

What remains open

Remembering is not the same as learning. Agents still need memory lifecycle, utility reinforcement, mistake recall, temporal truth, and action gates.

Sources: Mem0 site/GitHub positioning; Valkey/Mem0 memory cost framing; AI Memory OS scoring architecture.

Memory OS

Harness process

Type the memory → assign scope → score salience/utility/confidence/recency → reinforce useful memories → decay stale ones → gate risky actions.

Specific problems solved

Agents repeating mistakes, forgetting lessons, overloading context windows, retrieving stale truth, violating procedures, and failing to improve over time.

ClickUp angle: agents can evolve with workspace history: decisions, mistakes, preferences, procedures, and customer/project lessons.

Why Data Dragon matters: data is not automatically AI-ready.

Companies are trying to own the semantic layer for BI and agents: dbt, Cube, Atlan, Databricks, Snowflake, and others. Data Dragon goes further by combining semantic contracts, lineage contracts, governance/access policy, and SQL validation.

What the market is circling

Semantic layers: dbt MetricFlow, Cube, AtScale, Atlan, and Strategy/Cube-style tools aim to centralize governed metrics.
Lakehouse platforms: Databricks and Snowflake are pushing governed data + AI access as enterprises scale copilots and agents.
The gap: definitions alone do not tell an agent which ambiguity exists, which source columns prove the metric, who can access it, or whether generated SQL matches approved business logic.
Sources: Atlan semantic-layer comparison, Cube semantic layer for AI/BI, Databricks AI readiness writing.

Data Dragon

Harness process: Profile schema → draft semantic contract → generate column lineage → run governance/access checks → validate SQL before answer.
Specific problems solved: active-customer ambiguity, revenue metric forks, fan-out joins, trial-user denominator errors, date-basis mismatch, masked/unmasked access, and denied concepts.
ClickUp angle: workspace data becomes agent-ready: definitions, lineage, read/masked/denied access, and SQL validation attach to AI work instead of living only in analytics tooling.

Examples that make the harnesses concrete.

Data Dragon: raw AI versus gated AI

Raw AI sees only schema and a question, then may choose the wrong business definition. Gated AI receives the approved semantic contract, runs against gold SQL, and validates the generated SQL before the answer reaches the user.
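The gated path's final check can be sketched with Python's built-in `sqlite3`: run the generated SQL and the gold SQL on the same fixture and compare the results as multisets of rows. The tables, rows, queries, and the `validate` helper are a made-up fixture for the active-customer example, not the Data Dragon V0.

```python
import sqlite3
from collections import Counter
from typing import Tuple

# Hypothetical fixture: login activity and subscription status disagree.
FIXTURE = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, last_login_at TEXT);
CREATE TABLE subscriptions (customer_id INTEGER, status TEXT);
INSERT INTO customers VALUES (1, '2025-01-10'), (2, '2025-01-11'), (3, NULL), (4, '2025-01-12');
INSERT INTO subscriptions VALUES (1, 'active'), (2, 'canceled'), (3, 'active'), (4, 'trial');
"""

# Approved definition from the semantic contract: active = subscription status.
GOLD_SQL = "SELECT COUNT(DISTINCT customer_id) FROM subscriptions WHERE status = 'active'"
# What an ungated model might write from the schema alone: active = has logged in.
RAW_SQL = "SELECT COUNT(*) FROM customers WHERE last_login_at IS NOT NULL"
# What a gated model might write after receiving the contract.
GATED_SQL = "SELECT COUNT(*) FROM subscriptions WHERE status = 'active'"


def validate(conn: sqlite3.Connection, candidate_sql: str, gold_sql: str) -> Tuple[bool, str]:
    """Compare candidate and gold results as multisets of rows on fixture data."""
    try:
        candidate = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error as exc:
        return False, f"candidate SQL failed: {exc}"
    gold = conn.execute(gold_sql).fetchall()
    if Counter(candidate) != Counter(gold):
        return False, f"result mismatch: got {candidate}, expected {gold}"
    return True, "matches gold SQL on fixture data"


conn = sqlite3.connect(":memory:")
conn.executescript(FIXTURE)
print(validate(conn, RAW_SQL, GOLD_SQL))    # (False, 'result mismatch: got [(3,)], expected [(2,)]')
print(validate(conn, GATED_SQL, GOLD_SQL))  # (True, 'matches gold SQL on fixture data')
```

Matching on a fixture is evidence, not proof of equivalence: `GATED_SQL` passes only because no customer here has two active subscription rows, and adding one would make it diverge from the `DISTINCT` gold query. Fixtures should therefore include such edge cases.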

Govo: allow, correct, block, escalate

The runtime checks the answer against policy docs and source snippets. If the answer contradicts policy, it can correct the answer. If the action is unsafe, it blocks or escalates with evidence in the ledger.

Memory OS: procedural action gate

An agent preparing an action calls memory_prepare_action. The response can surface prior lessons, standing procedures, or approval gates such as “do not deploy on Fridays” or “legal review required before changing refund wording.”
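A sketch of what such an action gate could return. The name `prepare_action`, the `GateMemory` shape, the severity order, and the example memories are all hypothetical; this is not the prototype's `memory_prepare_action` interface, only an illustration of surfacing relevant lessons and returning the strictest gate they trigger.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]
# Strictness order used to pick the final decision (an assumption).
SEVERITY = {"proceed": 0, "warn": 1, "require_approval": 2, "block": 3}


@dataclass
class GateMemory:
    kind: str                                   # "procedure" or "mistake"
    action: str                                 # action type the memory applies to
    text: str
    effect: str                                 # "warn" | "require_approval" | "block"
    applies: Callable[[Context], bool] = lambda ctx: True


def prepare_action(action: str, ctx: Context, memories: List[GateMemory]) -> dict:
    """Surface relevant lessons and return the strictest gate they trigger."""
    decision, reasons = "proceed", []
    for m in memories:
        if m.action == action and m.applies(ctx):
            reasons.append(f"[{m.kind}] {m.text}")
            if SEVERITY[m.effect] > SEVERITY[decision]:
                decision = m.effect
    return {"action": action, "decision": decision, "reasons": reasons}


MEMORIES = [
    GateMemory("procedure", "deploy", "Do not deploy on Fridays.", "block",
               lambda ctx: ctx.get("weekday") == "Friday"),
    GateMemory("mistake", "deploy",
               "A past deploy skipped migrations and broke staging; run migrations first.",
               "warn"),
    GateMemory("procedure", "edit_refund_wording",
               "Legal review required before changing refund wording.", "require_approval",
               lambda ctx: not ctx.get("legal_approved", False)),
]

print(prepare_action("deploy", {"weekday": "Friday"}, MEMORIES)["decision"])  # block
print(prepare_action("edit_refund_wording", {}, MEMORIES)["decision"])        # require_approval
```

Returning the reasons alongside the decision is what lets the agent (or a reviewer) see which prior lesson or standing procedure fired.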

Blackbox: workflow replay

Every AI run becomes replayable: model prompt, response preview, tool input, tool output, cost, latency, retry, error classification, and fix recommendation are captured in one timeline.

How the example flow looks

1. User asks AI to complete work

Request enters with project context, customer context, data question, or policy-sensitive answer.

2. Memory and Data Dragon supply trusted context

Reusable memory and approved semantic contracts reduce prompt bloat and wrong definitions.

3. Govo verifies the result

Risky answer or action is checked against policy before it reaches the customer or workflow.

4. Blackbox proves ROI

Cost, failure, retry, model choice, and guardrail outcome are recorded for AI admins.

Business value: lower internal waste, higher external revenue.

Internally

Reduce cost from failed workflows, manual review, rework, repeated context gathering, and messy data.

  • Catch runaway usage before it becomes a surprise bill.
  • Reduce retries and manual debugging with replayable traces.
  • Use memory and semantic contracts to shrink repeated context spend.
  • Prevent policy mistakes before support, legal, or brand teams pay for them.
Operational thesis: every avoided retry, manual review, bad answer, and repeated context load becomes measurable savings.

Externally

Create a premium enterprise AI layer customers pay for because it helps them prove ROI, reduce risk, and scale AI adoption.

  • AI admin/control center for enterprise accounts.
  • Governance and compliance package for regulated teams.
  • AI readiness/onboarding product for messy workspaces.
  • Long-term retention through memory and workflow intelligence.
Market signal: enterprises need a control layer before agentic AI can scale safely.

What the V0s demonstrate.

Each harness already has more than a storyboard. The prototypes demonstrate repeatable control procedures that can be translated into ClickUp-native modules.

Blackbox

Captures the full agent run, prices each step, classifies loops/retries/tool misuse, exposes budget anomalies, and makes the workflow replayable.

Govo

Reads source policy, extracts enforceable rules, checks generated answers/actions against those rules, and creates allow/correct/block/escalate decisions.

Memory OS

Turns memory into a governed selection process: type, scope, score, lifecycle, recall, and action gate instead of dumping more context into prompts.

Data Dragon

Finds semantic ambiguity, turns approved definitions into contracts, maps lineage, applies access policy, and prevents agents from guessing business meaning.

How ClickUp could package the four harnesses.

The commercial story is not four separate tools. It is one AI control center with four enterprise modules that attach directly to AI work.

AI Spend Control

Blackbox becomes the control surface for AI budgets, cost attribution, high-cost workflows, retries, model choice, and ROI evidence.

AI Risk Control

Govo becomes the enterprise policy layer for customer-facing answers, risky automations, regulated actions, and audit trails.

AI Context Control

Memory OS becomes the durable context layer for teams, customers, projects, procedures, and lessons learned.

AI Readiness Control

Data Dragon becomes the data-quality and semantic-contract layer that makes workspace data usable by agents.