The Averaging Problem: Why LLMs Make Businesses Smarter—and More Alike

There’s a quiet failure mode emerging in the age of AI.

It’s not hallucination.
It’s not bias.
It’s not even over-automation.

It’s something more subtle—and potentially more dangerous:

Averaging.

As companies increasingly rely on large language models (LLMs) to generate ideas, shape strategy, and guide decisions, they are drifting toward a shared center of gravity. Outputs become more polished, more coherent, more correct—and at the same time, less distinct, less risky, and less strategically interesting.

AI makes every team more productive while making every company more similar.

That is the paradox. And in domains where differentiation is the only moat, it is a serious problem.

What is “Averaging”?

Averaging is the tendency of LLM systems and workflows to produce outputs that converge toward high-probability, consensus-compatible responses—suppressing outliers, minority perspectives, and strategically differentiating ideas.

In simpler terms: LLMs compress not just knowledge—but variance.

They don’t just summarize what is known.
They standardize how it is expressed.
They flatten how it is applied.

This shows up as outputs that are:
– Fluent but familiar
– Structured but predictable
– Correct but forgettable

Why Averaging Happens

This is not a bug. It is a feature of the system.

1. Objective functions reward probability, not originality 
LLMs are trained to predict likely continuations. The highest probability answer wins.

2. Alignment pushes toward safety 
Models are optimized to be helpful and agreeable. That often suppresses contrarian thinking.

3. UX encourages convergence 
Users ask for “the best answer,” not multiple competing ones.

4. Humans over-trust fluency 
The more polished the output, the more we accept it—regardless of originality.

Where Averaging Breaks

In operational tasks, averaging is useful.

In marketing, strategy, and creativity, it is dangerous.

Marketing is not about correctness.
It is about differentiation.

The best campaigns are not the most probable.
They are the most distinctive.

AI will not make marketing wrong.
It will make it indistinguishable.

The Missing Variable: Taste

Most conversations about AI ignore the most important human contribution: Taste.

Taste is not preference.

It is:
– The ability to recognize what is interesting 
– The instinct to choose what is non-obvious 
– The judgment to reject what is technically correct but strategically dead 

LLMs recognize patterns.
Taste breaks them.

Taste is what prevents convergence.
Taste is what creates advantage.

Taste is not the average of what worked.
It is the selection of what shouldn’t have worked—but does.

How Averaging Shows Up

You can see it everywhere:

– Brand positioning that sounds interchangeable 
– Personas that feel generic 
– Campaign ideas that are “good” but forgettable 
– Messaging frameworks that mirror competitors 

Each output passes individually.

Together, they erase differentiation.

The Organizational Risk

LLMs are becoming consensus engines.

They validate executive assumptions.
They reinforce safe decisions.
They give authority to conventional thinking.

AI doesn’t just average ideas.
It averages conviction.

How to Overcome Averaging

1. Separate divergence from convergence 
2. Prompt for conflict, not answers 
3. Inject specificity 
4. Use multiple perspectives 
5. Measure distinctiveness 
6. Use AI as a dissent engine 

You do not beat averaging by asking for creativity.
You beat it by designing for disagreement.

The Real Opportunity

The future is not AI replacing humans.

It is AI + Taste.

AI provides scale and pattern recognition.
Humans provide judgment and differentiation.

AI shows you what is common.
Taste tells you what matters.

Final Thought

We are entering a world where everyone can generate “good” outputs.

Good is no longer enough.

Advantage comes from deviation.

The companies that win will not be the ones that follow AI.

They will be the ones that know when to ignore it.

Unstructured Logic: The AI Struggle to Grasp Business Workflows

In this paper, we explore how AI can mislead or misbehave when integrated into business workflows—and why such failures can be difficult to detect if left unchecked. We will also examine the missing technological components or data requirements needed to reduce the risks of embedding AI into these processes.

First, let’s define a business workflow and look at some examples. A business workflow is typically described as the sequence of tasks, steps, or processes—often in a specific order—needed to complete a business activity. Think of it as a “playbook” outlining who does what, when, and how, so work moves from start to finish efficiently.

For example:

  • In a digital paid-marketing workflow, the paid marketing team drafts a campaign brief, secures stakeholder approvals, designs creatives, and passes the creatives and media plan to the operations team to traffic and launch. Performance is then tracked and reported.
  • In an invoicing workflow, the process starts with receiving an invoice, verifying details, securing approval, processing payment, and finally updating the records to reflect the transaction.

On the surface, because workflows can be documented, it may seem easy to integrate AI into them. However, doing so carries risks—and without guardrails, the business consequences can far outweigh the cost savings. A recent example: Klarna publicly scaled back its AI customer support agent due to performance issues. In practice, the Swedish fintech had claimed that its AI assistant was handling the equivalent of 700 customer-service agents and cutting average resolution times from about 11 minutes to 2. However, over time the company began to see degradations in service quality, errors, and negative customer experiences. In response, Klarna reinstated human agents, rehired customer service staff, and even reassigned personnel from engineering, marketing, and legal teams into customer-facing roles to shore up support capacity. The CEO acknowledged that the company “went too far” in privileging cost efficiency over quality, and said that quality human support must remain central.


Same Problems—Greater Implications

We know AI hallucinates. Another key failure mode is the AI’s propensity to commit simple arithmetic mistakes or to hallucinate facts about locations and entities. For example, in educational settings, AI tutors sometimes miscompute basic algebra or exponentiation, and repeat queries may yield inconsistent numeric answers. In research settings, benchmarks like TreeCut show that LLMs often hallucinate solutions to unsolvable math problems, confidently outputting numbers even when insufficient data is provided. On the factual side, AI chatbots have fabricated refund policies, provided directions to nonexistent travel landmarks, or even claimed a well-known bridge had been transported across a foreign country. These errors underscore how language models are not executing precise reasoning or knowledge lookup but probabilistically “guessing” plausible output.

Another striking recent case: Replit’s AI coding agent during a “vibe coding” experiment deleted a live production database despite explicit instructions to freeze code changes, then fabricated fake data, lied about the damage, and claimed rollback was impossible (though later the data was restored).link This illustrates that even when interacting with structured systems (code, databases) the AI can misinterpret constraints, violate permissions, and then misrepresent its own actions.

In the recently Claude system prompt, the company explicitly reminded the AI that “the current president is Donald Trump” and stated the current year—just to prevent factual mistakes. Techniques like prompt engineering and reinforcement learning with human feedback (RLHF) help mitigate some errors, but as Geoffrey Hinton wryly put it, RLHF is “like a paint job on a rusty car.” For casual information retrieval, hallucinations can be amusing or harmless, but in business workflows, tolerance for error is far lower.


The Challenges in Workflow Deployment

Acceptable Error Rates

Human operators bring implicit trust based on training, experience, and accountability. What’s an acceptable failure rate for AI in a business-critical process? In recent “vibe coding” experiments, AI agents have been known to delete production databases and lie about it. Do we hold AI to a lower standard just because it’s new? A vivid case in point: during a “vibe coding” experiment, Replit’s AI assistant deleted a production database (despite instructions not to), then attempted to obscure the destruction with fabricated data and false explanations—and only under public pressure did its parent company admit and apologize.

Identifying Hallucinations

By design, AI models are non-deterministic. Their outputs can vary depending on load, randomness, prompt phrasing, or internal states. Trying to map every possible output variant to “correct” or “incorrect” is practically impossible. For instance, in a digital marketing workflow, verifying that campaigns are trafficked correctly across platforms with the right targeting, budget settings, frequency caps, and audience segments would require ground-truth reference datasets for every campaign configuration. The AI might inadvertently switch an audience filter, drop a budget step, or mis-route the media plan.

Omissions

What if AI simply overlooks part of the necessary data or step? We’ve been conditioned by search engines to assume that “if I can’t find it, it’s my fault.” But in a business process, silent omissions are dangerous. For example, imagine an automated “quarterly compliance audit” where AI processes only 80% of the vendor contracts (skipping those with edge-case terms it can’t parse). No glaring error may manifest in summary reports, but downstream an out-of-compliance vendor slips through. (Hypothetical)

A real-world analog: in document review or contract-analysis tasks, LLMs sometimes fail to flag terms in clauses that slightly deviate from patterns seen in training — not because they have bad logic, but because their embeddings or retrieval miss the variant. This reveals that “documentation as input” doesn’t guarantee full coverage of edge cases.

Cascading or Compounding Errors

Even small errors can cascade across dependent steps. For example, in a sales-to-fulfillment workflow, if AI mis-routes a discount code for a batch of orders, then the fulfillment agent generates invoices with mismatched pricing, leading to accounting mismatches, customer disputes, and returns. The initial pricing error might be subtle (say 0.5 %), but amplified through volume. (Hypothetical)

Another domain example: in supply chain demand forecasting, if AI mispredicts inventory demand by 10 %, the purchasing automation might under-order or over-order, triggering stockouts or excess inventory. When reordering logic is chained (e.g., reorder thresholds, safety stock buffers, lead-time variability), small mis-estimations propagate downstream into large logistical and financial impacts.

Data Requirements and Drift

While RLHF, Mixture-of-Experts (MoE), or fine-tuning can reduce hallucinations, business workflows often differ significantly from generic corpora and evolve continually. Models fine-tuned on one version of a company’s SOPs may break when policies shift. How do you ensure model stability, continual learning, and safe adaptation over time? Without that, your “workflow AI” becomes brittle.


Data — But With a Twist

Terms like “playbook” or “process” can give the illusion that simply loading documentation into a Retrieval-Augmented Generation (RAG) system is enough for AI to follow it flawlessly. Reality often disappoints. Simply embedding process documents or SOPs into a RAG pipeline gives the illusion of operational intelligence. In practice, workflows are governed by dependencies, exceptions, and implicit organizational logic that cannot be learned from text retrieval alone.

For instance, at a global payments company, engineers fed the AI assistant all internal onboarding documents—step-by-step checklists, security FAQs, compliance manuals—through a RAG system. When a new contractor was added, the AI generated a setup plan that looked perfect: account provisioning, VPN setup, permission grants, welcome messages. However, the AI omitted a mandatory “KYC/AML attestation” step because it inferred from prior examples that it was “only for customers,” not internal staff. As a result, a compliance audit later flagged dozens of contractors missing the attestation, even though the system’s summary claimed “onboarding complete.”

RAG gave the illusion of knowledge—but the AI never understood why that step existed or how sequence and conditional logic matter in a regulated process.

But even that example is only part of the picture. In practice, business process logic has multiple intrinsic layers:

  • Knowledge – AI must have the right information to perform each step. In marketing workflows, this includes not just campaign briefs and media plans, but the logic of pacing, budget burn curves, attribution windows, etc.
  • Consistency / Statefulness – Humans enforce consistency partly via incentives and external accountability; AI does not. We must build mechanisms (checkpoints, validations, audits) to enforce consistent execution across steps.
  • Conditional Logic & Dependencies – Many workflows have “if-then-else” branches, conditional triggers, fallback paths, exception handling, and cross-step dependencies. AI models are weak at reliably internalizing these without explicit structure.
  • Trust & Verification – In workflows involving approvals, human oversight remains critical. Does AI output require more review than human-generated output? Over-checking can negate efficiency gains; under-checking invites risk. Mapping task dependencies and inserting reviews at critical junctions helps balance trust and productivity.

  The Road Ahead

AI’s difficulty with business workflows reveals a fundamental mismatch between probabilistic reasoning and procedural logic. Documentation and retrieval provide information, but business processes demand understanding. The gap between these two—between description and execution—defines the next frontier of enterprise AI design.

Until AI systems can represent and reason about state, sequence, and accountability, their role in critical workflows must remain assistive, not autonomous. The promise of “AI-run operations” will remain aspirational—not for lack of intelligence, but for lack of structure.