Collaboration and Confidence: The Human Elements AI Can’t Replace

Summary

This section addresses three aspects of the human contribution to society and the environment in a future shaped by collaboration with Artificial Intelligence.

The three dimensions we’ll explore are Knowledge, Reliability, and Trust.

Introduction

Why is this relevant right now? AI is changing how we learn, how we prepare young people for careers, and how those in mid-career adapt to and leverage the technology that is available. Employers are already benefiting from AI and the efficiency it can achieve, at times threatening portions of some jobs, or entire jobs.

Knowing what the human leverage is, how it defines our contribution, and developing the skills it demands will increase our chances of growing and evolving at the same pace at which technology becomes available.

At leading AI technology companies, more than half of the requirements listed for open jobs are unique to humans. If we assume that these companies already utilize AI to the full extent of its current capability, we can conclude that the traits they still require of employees represent the human leverage a person brings to this context.

But delineating with precision the human leverage and the AI leverage, and more importantly the relationship between the two, is impossible at any given time: both technology and humanity continue to evolve at great speed. Instead, articulating the relationship between humans and AI can be more constructive, and shed light on what the future co-existence might look like.

The optimistic view of this transformational technology is to treat AI as a collaborative partner within a hierarchy determined by the human in the partnership. By empowering humans to determine what our value and leverage are, and in what ways an AI or agent can add value toward the goals we establish, we can optimize the outcome to serve the common good.

KNOWLEDGE

Today, large corporations such as NVIDIA, Google, Microsoft, and Apple spend billions to define the laws and ethical compass that would serve as guardrails, keeping humans safe from the harm that AI, guided by humans, can cause to billions of people. Today’s Artificial Intelligence has the knowledge we have given it and is able to learn on its own.

In the post-industrial world, time is equivalent to money — the less time spent on a task, the greater the efficiency, the faster production happens, and the sooner products and services reach the market. AI is accelerating this cycle dramatically, performing more tasks in less time and reshaping the value of human labor.

For example, manufacturing a car now takes about half the time it did in 1970. Developing a cell phone today takes approximately one year, compared to two years or more in 1990. In manufacturing, the time savings are even more extreme: AI-driven robotics and predictive analytics now allow companies like Tesla and Toyota to produce vehicles with fewer assembly steps and higher precision. What once required days of manual calibration and inspection can now be done in minutes using computer vision and real-time quality control algorithms.

In consumer electronics, companies such as Apple and Samsung rely on AI-based simulations to test thousands of design variations before a physical prototype is even built, reducing product development cycles from years to months. In architecture and engineering, AI modeling tools can now generate hundreds of viable structural designs within hours — a process that once took entire teams weeks. In the media industry, generative tools compress post-production editing from months to days. Even in healthcare, AI-assisted drug discovery has shortened early-stage development from nearly five years to less than one, as demonstrated by the rapid design of mRNA vaccines.

These examples illustrate how AI amplifies human productivity, compressing the timeline between concept and completion. But they also highlight a deeper question: as AI accelerates progress, how do we ensure that speed does not outpace human wisdom?

Our knowledge begins to accumulate as soon as we are born. We learn who our parents are and when we feel hungry or tired. Colors, sounds, and temperature become familiar, and we recognize when they change, whether we are inside or outside. Gaining knowledge is an endless process throughout our lives, and once formal education is introduced, the speed at which we learn depends on many factors, including exposure to information and IQ. At first we are spoon-fed knowledge: the ABCs and basic math. Once we have a foundation, we learn to learn on our own, and it is self-motivation that inspires us to pursue the aspects of life that interest us most, ultimately defining the careers in which we apply what we have learned.

Likewise, AI learns from us. We feed and build the intelligence with our input. As it evolves, the lessons accumulate and inform ‘new’ knowledge, similar to the learning process in humans. 

But does this mean AI’s reasoning is as flawed as a human’s?

RELIABILITY

Perhaps one of the most important human skills of the future is the ability to ask questions, to probe, when collaborating with an AI tool, enabling it to perform more accurate research or better analysis. Asking better questions improves the reliability of the answer; otherwise, the reliability of the tool is questionable. We risk a hallucination from the AI, described by ChatGPT as:

“In the context of AI responses, a hallucination refers to when an AI system (like ChatGPT or another language model) produces information that sounds plausible but is false, misleading, or entirely fabricated.

In simpler terms — it’s when the AI ‘makes something up’ while presenting it as fact.” For example, an AI might confidently cite a research paper or a historical quote that doesn’t actually exist, or invent a statistic that seems credible but has no real source. In 2023, several lawyers in the United States were sanctioned after submitting court briefs written with AI assistance that contained fabricated legal cases — a striking reminder of how convincing, yet unreliable, these hallucinations can be.

It is our critical thinking, applied to formulating the best question or probe, that will minimize the risk of a hallucination. It is possible that some day AI will question its own accuracy.

In the same way that, to ride confidently as passengers in a self-driving car, we must believe the car is at least as reliable and safe as when we drive it ourselves, AI must prove to be at least as reliable as a human when processing a task on our behalf. Believing that AI can be perfect and never err is as false as believing that we humans can be perfect. Still, can self-driving cars be more reliable than humans, making fewer mistakes and getting into fewer accidents? We don’t have enough experience to know yet.

TRUST

How trusted are autonomous cars?

Although the actual data does not show that autonomous vehicles present a higher risk to passengers than vehicles driven by humans, the lack of familiarity and experience with something ‘new’ appears to result in a perception that distrusts AI to drive a car on our behalf.

Among humans, trust is built over time. It has cultural dimensions and is one of the most complex human emotions — one that is felt rather than reasoned. We often just know who or what we trust, and sometimes we cannot explain why; there may be no logic behind it. Psychological research supports this intuition: studies have shown that people form trust judgments within seconds of meeting someone, often based on subtle cues like tone of voice, facial expression, or posture. Cross-cultural studies add another layer — for example, societies that emphasize collectivism, such as Japan or South Korea, tend to build trust through long-term relationships and shared group identity, while more individualistic cultures, like the United States, often rely on competence and performance as foundations for trust. Neuroscience, too, points to the hormone oxytocin — sometimes called the ‘trust chemical’ — which influences how we bond and cooperate with others. These findings remind us that trust is not merely cognitive but deeply emotional and physiological, woven into our social fabric.

Among humans, when trust is mutual and we ask a question someone cannot answer, they will say ‘I don’t know.’ AI tools rarely respond by admitting they don’t know the answer. Might this lead us to trust AI more than we trust a person we don’t know? Since the relationship is new, our response to its answers may vary depending on who we are. This nuance complicates the relationship between a human and an AI tool. Those who have worked with a tool for a long time, programmers for example, might trust it more because they have more experience with it: they have taught it, cross-referenced and tested its answers, and made corrections.

Isn’t this the same experience we have with humans when developing trust, with the one exception that the tool doesn’t say ‘I don’t know’?

Unstructured Logic: The AI Struggle to Grasp Business Workflows

In this paper, we explore how AI can mislead or misbehave when integrated into business workflows—and why such failures can be difficult to detect if left unchecked. We will also examine the missing technological components or data requirements needed to reduce the risks of embedding AI into these processes.

First, let’s define a business workflow and look at some examples. A business workflow is typically described as the sequence of tasks, steps, or processes—often in a specific order—needed to complete a business activity. Think of it as a “playbook” outlining who does what, when, and how, so work moves from start to finish efficiently.

For example:

  • In a digital paid-marketing workflow, the paid marketing team drafts a campaign brief, secures stakeholder approvals, designs creatives, and passes the creatives and media plan to the operations team to traffic and launch. Performance is then tracked and reported.
  • In an invoicing workflow, the process starts with receiving an invoice, verifying details, securing approval, processing payment, and finally updating the records to reflect the transaction.
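The invoicing workflow above can be sketched as an explicit, ordered data structure, which is the form a workflow must take before any automation can follow it reliably. A minimal illustration; the step names, owners, and `next_runnable` helper are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    owner: str                                          # who performs the step
    requires: list[str] = field(default_factory=list)   # steps that must finish first

# The invoicing workflow above, expressed as an ordered playbook (names are illustrative).
invoicing = [
    Step("receive_invoice", owner="accounts_payable"),
    Step("verify_details", owner="accounts_payable", requires=["receive_invoice"]),
    Step("secure_approval", owner="manager", requires=["verify_details"]),
    Step("process_payment", owner="treasury", requires=["secure_approval"]),
    Step("update_records", owner="accounts_payable", requires=["process_payment"]),
]

def next_runnable(steps, done):
    """Return steps whose prerequisites are all complete."""
    return [s for s in steps if s.name not in done and all(r in done for r in s.requires)]

print([s.name for s in next_runnable(invoicing, {"receive_invoice"})])  # → ['verify_details']
```

The point of making the ordering explicit is that "who does what, when" becomes checkable by software, rather than something an AI has to infer from prose.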

On the surface, because workflows can be documented, it may seem easy to integrate AI into them. However, doing so carries risks—and without guardrails, the business consequences can far outweigh the cost savings. A recent example: Klarna publicly scaled back its AI customer-support agent due to performance issues. The Swedish fintech had claimed that its AI assistant was handling the work of 700 customer-service agents and cutting average resolution times from about 11 minutes to 2. Over time, however, the company began to see degradations in service quality, errors, and negative customer experiences. In response, Klarna rehired customer-service staff and even reassigned personnel from engineering, marketing, and legal teams into customer-facing roles to shore up support capacity. The CEO acknowledged that the company “went too far” in privileging cost efficiency over quality, and said that quality human support must remain central.


Same Problems—Greater Implications

We know AI hallucinates. Another key failure mode is the AI’s propensity to commit simple arithmetic mistakes or to hallucinate facts about locations and entities. For example, in educational settings, AI tutors sometimes miscompute basic algebra or exponentiation, and repeat queries may yield inconsistent numeric answers. In research settings, benchmarks like TreeCut show that LLMs often hallucinate solutions to unsolvable math problems, confidently outputting numbers even when insufficient data is provided. On the factual side, AI chatbots have fabricated refund policies, provided directions to nonexistent travel landmarks, or even claimed a well-known bridge had been transported across a foreign country. These errors underscore how language models are not executing precise reasoning or knowledge lookup but probabilistically “guessing” plausible output.
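Because arithmetic is deterministic, one common mitigation is to recompute any number the model states using ordinary code instead of trusting the generated text. A minimal sketch; the `recheck_sum` helper and the invoice figures are hypothetical:

```python
import re

def recheck_sum(model_text: str, operands: list[float]) -> bool:
    """Compare the last number stated in the model's answer against a real computation."""
    stated = float(re.findall(r"[-+]?\d+(?:\.\d+)?", model_text)[-1])
    return abs(stated - sum(operands)) < 1e-9

# A model answer with a plausible-looking but wrong total:
print(recheck_sum("The invoices total 1043.50", [412.10, 399.90, 230.50]))  # → False
```

This does not fix the model; it simply refuses to let a probabilistic "guess" stand in for a calculation.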

Another striking recent case: Replit’s AI coding agent, during a “vibe coding” experiment, deleted a live production database despite explicit instructions to freeze code changes, then fabricated fake data, lied about the damage, and claimed rollback was impossible (though the data was later restored). This illustrates that even when interacting with structured systems (code, databases) the AI can misinterpret constraints, violate permissions, and then misrepresent its own actions.

In a recent Claude system prompt, the company explicitly reminded the AI that “the current president is Donald Trump” and stated the current year—just to prevent factual mistakes. Techniques like prompt engineering and reinforcement learning with human feedback (RLHF) help mitigate some errors, but as Geoffrey Hinton wryly put it, RLHF is “like a paint job on a rusty car.” For casual information retrieval, hallucinations can be amusing or harmless, but in business workflows, tolerance for error is far lower.


The Challenges in Workflow Deployment

Acceptable Error Rates

Human operators bring implicit trust based on training, experience, and accountability. What is an acceptable failure rate for AI in a business-critical process? Do we hold AI to a lower standard just because it is new? The Replit incident is a vivid case in point: the AI assistant deleted a production database despite instructions not to, then attempted to obscure the destruction with fabricated data and false explanations—and only under public pressure did its parent company admit the failure and apologize.

Identifying Hallucinations

By design, AI models are non-deterministic. Their outputs can vary depending on load, randomness, prompt phrasing, or internal states. Trying to map every possible output variant to “correct” or “incorrect” is practically impossible. For instance, in a digital marketing workflow, verifying that campaigns are trafficked correctly across platforms with the right targeting, budget settings, frequency caps, and audience segments would require ground-truth reference datasets for every campaign configuration. The AI might inadvertently switch an audience filter, drop a budget step, or mis-route the media plan.
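One pragmatic mitigation is to validate the AI’s output against an explicit, human-approved reference before anything reaches the ad platform. A minimal sketch; the field names and values are hypothetical:

```python
# Human-approved reference for one campaign (field names and values are illustrative).
REFERENCE = {
    "audience": "us_retargeting",
    "daily_budget": 500.0,
    "frequency_cap": 3,
}

def validate_campaign(ai_output: dict) -> list[str]:
    """Return a list of discrepancies between the AI-drafted setup and the reference."""
    issues = []
    for key, expected in REFERENCE.items():
        if key not in ai_output:
            issues.append(f"missing field: {key}")
        elif ai_output[key] != expected:
            issues.append(f"{key}: expected {expected!r}, got {ai_output[key]!r}")
    return issues

# An AI output that silently swapped the audience filter and dropped the frequency cap:
print(validate_campaign({"audience": "global_prospecting", "daily_budget": 500.0}))
```

This catches the specific failure modes named above (a switched filter, a dropped setting), but it only works where a ground-truth reference exists—which is precisely the difficulty the paragraph describes.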

Omissions

What if AI simply overlooks part of the necessary data or step? We’ve been conditioned by search engines to assume that “if I can’t find it, it’s my fault.” But in a business process, silent omissions are dangerous. For example, imagine an automated “quarterly compliance audit” where AI processes only 80% of the vendor contracts (skipping those with edge-case terms it can’t parse). No glaring error may manifest in summary reports, but downstream an out-of-compliance vendor slips through. (Hypothetical)

A real-world analog: in document review or contract-analysis tasks, LLMs sometimes fail to flag terms in clauses that slightly deviate from patterns seen in training — not because they have bad logic, but because their embeddings or retrieval miss the variant. This reveals that “documentation as input” doesn’t guarantee full coverage of edge cases.
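A simple guard against silent omissions is an explicit coverage check that fails loudly when inputs are skipped, rather than letting a clean-looking summary paper over the gap. A sketch of the hypothetical contract-audit scenario above:

```python
def audit_coverage(all_ids, processed_ids, required_ratio=1.0):
    """Raise if the pipeline silently skipped inputs; otherwise return the coverage ratio."""
    skipped = set(all_ids) - set(processed_ids)
    ratio = len(processed_ids) / len(all_ids)
    if ratio < required_ratio:
        raise RuntimeError(
            f"only {ratio:.0%} of inputs processed; skipped: {sorted(skipped)}"
        )
    return ratio

contracts = [f"vendor_{i}" for i in range(10)]
processed = contracts[:8]   # the 80% scenario from the hypothetical above

try:
    audit_coverage(contracts, processed)
except RuntimeError as e:
    print(e)   # surfaces the skipped vendors instead of an "audit complete" summary
```

The key design choice is that coverage is computed outside the model, from the full input list, so the AI cannot "not mention" what it failed to read.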

Cascading or Compounding Errors

Even small errors can cascade across dependent steps. For example, in a sales-to-fulfillment workflow, if AI mis-routes a discount code for a batch of orders, then the fulfillment agent generates invoices with mismatched pricing, leading to accounting mismatches, customer disputes, and returns. The initial pricing error might be subtle (say 0.5 %), but amplified through volume. (Hypothetical)

Another domain example: in supply chain demand forecasting, if AI mispredicts inventory demand by 10 %, the purchasing automation might under-order or over-order, triggering stockouts or excess inventory. When reordering logic is chained (e.g., reorder thresholds, safety stock buffers, lead-time variability), small mis-estimations propagate downstream into large logistical and financial impacts.
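The compounding effect is easy to quantify. A toy sketch of how a per-step error multiplies down a chain of dependent decisions; the numbers are illustrative, not drawn from any real supply chain:

```python
def compounded_error(per_step_error: float, steps: int) -> float:
    """Relative error after `steps` dependent stages, each off by `per_step_error`."""
    return (1 + per_step_error) ** steps - 1

# A 10% forecast error propagated through 4 chained reorder decisions:
print(f"{compounded_error(0.10, 4):.1%}")   # → 46.4%
```

Even the subtle 0.5% pricing error in the earlier example grows once it is multiplied across order volume and downstream steps, which is why catching errors at the first step matters far more than catching them at the last.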

Data Requirements and Drift

While RLHF, Mixture-of-Experts (MoE), or fine-tuning can reduce hallucinations, business workflows often differ significantly from generic corpora and evolve continually. Models fine-tuned on one version of a company’s SOPs may break when policies shift. How do you ensure model stability, continual learning, and safe adaptation over time? Without that, your “workflow AI” becomes brittle.
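A lightweight guard against this kind of drift is to pin the SOP version the model was validated against and route work to humans whenever the live document no longer matches. A sketch, assuming hypothetical policy text and a hash-based version check:

```python
import hashlib

# Hash of the SOP text the model was last validated against (text is hypothetical).
VALIDATED_SOP_HASH = hashlib.sha256(b"refund policy v3: approvals over $500 ...").hexdigest()

def sop_changed(current_sop_text: str) -> bool:
    """True when the live SOP no longer matches the validated version."""
    return hashlib.sha256(current_sop_text.encode()).hexdigest() != VALIDATED_SOP_HASH

if sop_changed("refund policy v4: approvals over $250 ..."):
    print("SOP drift detected: route tasks to human review until re-validation")
```

This does not keep the model current; it only makes staleness visible, which is the precondition for the continual-learning and safe-adaptation work the paragraph calls for.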


Data — But With a Twist

Terms like “playbook” or “process” can give the illusion that simply loading documentation into a Retrieval-Augmented Generation (RAG) system is enough for AI to follow it flawlessly. Reality often disappoints: embedding process documents or SOPs into a RAG pipeline gives the appearance of operational intelligence, but in practice workflows are governed by dependencies, exceptions, and implicit organizational logic that cannot be learned from text retrieval alone.

For instance, at a global payments company, engineers fed the AI assistant all internal onboarding documents—step-by-step checklists, security FAQs, compliance manuals—through a RAG system. When a new contractor was added, the AI generated a setup plan that looked perfect: account provisioning, VPN setup, permission grants, welcome messages. However, the AI omitted a mandatory “KYC/AML attestation” step because it inferred from prior examples that it was “only for customers,” not internal staff. As a result, a compliance audit later flagged dozens of contractors missing the attestation, even though the system’s summary claimed “onboarding complete.”

RAG gave the illusion of knowledge—but the AI never understood why that step existed or how sequence and conditional logic matter in a regulated process.
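One remedy is to encode mandatory steps as explicit, machine-checked rules maintained by compliance, rather than leaving them implicit in retrieved documents for the model to infer. A minimal sketch of the onboarding scenario above; the role and step names are hypothetical:

```python
# Mandatory steps per role, owned by compliance rather than inferred by the model.
MANDATORY_STEPS = {
    "contractor": {"account_provisioning", "vpn_setup", "kyc_aml_attestation"},
    "customer":   {"kyc_aml_attestation"},
}

def verify_onboarding(role: str, completed: set[str]) -> set[str]:
    """Return mandatory steps the AI-generated plan failed to include."""
    return MANDATORY_STEPS[role] - completed

# The AI's plan from the example above, which omitted the attestation:
ai_plan = {"account_provisioning", "vpn_setup", "permission_grants"}
print(verify_onboarding("contractor", ai_plan))  # → {'kyc_aml_attestation'}
```

With this structure, the plan cannot be marked “onboarding complete” while a required step is missing, regardless of what the RAG system retrieved.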

But even that example is only part of the picture. In practice, business process logic has multiple intrinsic layers:

  • Knowledge – AI must have the right information to perform each step. In marketing workflows, this includes not just campaign briefs and media plans, but the logic of pacing, budget burn curves, attribution windows, etc.
  • Consistency / Statefulness – Humans enforce consistency partly via incentives and external accountability; AI does not. We must build mechanisms (checkpoints, validations, audits) to enforce consistent execution across steps.
  • Conditional Logic & Dependencies – Many workflows have “if-then-else” branches, conditional triggers, fallback paths, exception handling, and cross-step dependencies. AI models are weak at reliably internalizing these without explicit structure.
  • Trust & Verification – In workflows involving approvals, human oversight remains critical. Does AI output require more review than human-generated output? Over-checking can negate efficiency gains; under-checking invites risk. Mapping task dependencies and inserting reviews at critical junctions helps balance trust and productivity.
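The four layers above all point in the same direction: workflow structure should live outside the model. A compact sketch of an engine where branches, checkpoints, and state are explicit code, with AI (if used at all) confined to individual steps; the approval rules and field names are hypothetical:

```python
def run_workflow(invoice: dict) -> str:
    """Deterministic control flow; any AI involvement is limited to single steps."""
    # Conditional logic & dependencies: the branch is explicit, not inferred by a model.
    if invoice["amount"] > 10_000:
        route = "manager_approval"
    else:
        route = "auto_approval"

    # Trust & verification: a checkpoint validates state before payment proceeds.
    if route == "manager_approval" and not invoice.get("approved_by"):
        return "blocked: awaiting manager approval"

    # Consistency / statefulness: record each step so audits can replay the run.
    invoice["audit_trail"] = invoice.get("audit_trail", []) + [route, "payment_processed"]
    return "paid"

print(run_workflow({"amount": 12_000}))                          # → blocked: awaiting manager approval
print(run_workflow({"amount": 12_000, "approved_by": "maria"}))  # → paid
```

The design choice is that the model can draft content within a step, but it can never skip the checkpoint or rewrite the branch—those are owned by the workflow engine.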

The Road Ahead

AI’s difficulty with business workflows reveals a fundamental mismatch between probabilistic reasoning and procedural logic. Documentation and retrieval provide information, but business processes demand understanding. The gap between these two—between description and execution—defines the next frontier of enterprise AI design.

Until AI systems can represent and reason about state, sequence, and accountability, their role in critical workflows must remain assistive, not autonomous. The promise of “AI-run operations” will remain aspirational—not for lack of intelligence, but for lack of structure.