In 1998, a year after his defeat by IBM's Deep Blue, Garry Kasparov ran an experiment called Advanced Chess. Human players could consult a computer at any point in the game.
The result overturned the obvious prediction. Neither the strongest grandmasters nor the most powerful computers won. The winners were human–computer teams that had learned how to work together: when to trust the machine's calculation and when to override it with human judgement.
More than twenty-five years later, that is the central challenge of enterprise AI adoption. Most organisations have now seen AI improve individual productivity; very few have redesigned how their businesses work to capture that improvement at scale. The gap between organisations deploying AI and those extracting sustained value from it is widening fast.
A 2024 field study by Boston Consulting Group and Harvard Business School, involving 758 consultants, found that those using AI produced work 40% higher in quality and completed tasks 25% faster. Yet a 2025 BCG study found that only 4% of companies have achieved significant, scaled value from AI.
The difference is not model access or budget size. It is organisational design. The companies pulling ahead have answered a practical question: how should humans and AI work together across the business? Who decides what, who checks what, and who is accountable when the AI is wrong?
From individual productivity to enterprise design
The evidence on AI and individual productivity is substantial. GitHub reports coding speed improvements of over 50% with Copilot. Morgan Stanley's AI Debrief tool achieved near-universal adoption among its financial advisors, saving time per client meeting and contributing to measurable asset growth. Across knowledge work, first-draft productivity gains of 30–50% are now common where AI is deployed well.
These numbers are real. But taken alone, they are misleading.
The same BCG/HBS study that found 40% higher quality work also revealed a risk: consultants who relied on AI without actively checking its outputs produced errors that were more confident, more polished and harder to detect than errors made without AI. A well-structured wrong answer is more dangerous than an obvious rough draft.
The researchers called this the "jagged frontier". AI performs exceptionally well on some tasks and fails unpredictably on others. The boundary is not visible in advance. Teams that actively questioned outputs avoided failure. Those that accepted them did not.
Morgan Stanley's results were not produced by the tool alone. Advisors were trained explicitly on when to rely on AI and when to override it. Client outcomes were measured, not just usage. Discipline and oversight, not technology alone, produced the value.
The lesson is this: getting access to AI improves productivity; designing how humans supervise and challenge it determines performance. Execution scales through machines. Advantage scales through judgement. Centaurs at scale are designed, not accidental.
How companies get more from AI
Two enterprises can use the same model and produce radically different results. The difference is rarely technology. It is operating discipline. Five conditions consistently separate high-performing human–AI teams from those that merely deploy tools:
Challenge discipline. High-performing teams treat AI outputs as a starting point, not a conclusion. Assumptions are tested. Recommendations are interrogated. Code is reviewed. Goldman Sachs, for example, embedded mandatory human review into its AI-assisted engineering workflow across tens of thousands of developers. AI may draft. Humans remain accountable.
Research on automation bias shows a predictable pattern: the more reliable AI appears, the less people challenge it. That is precisely when scrutiny matters most. Challenge discipline must be designed into workflows, not left to individual vigilance.
Clear judgement and override rights. Organisations that perform well are explicit about which decisions require human sign-off and which AI can execute autonomously. Without clarity, AI outputs drift into acceptance by default: no one is clearly empowered, or expected, to reject them. High-performing enterprises assign override authority deliberately.
Explicit escalation pathways. When an AI output appears wrong or uncertain, there must be a named escalation route and accountable owner. Not an IT helpdesk. A human responsible for that domain's AI behaviour. Under the EU AI Act, this is now mandatory for high-risk systems. Even where not regulated, the discipline is operationally essential.
Lifecycle management. AI systems do not remain stable. Performance drifts. Context changes. Leading organisations treat models as managed assets, not one-off deployments. Output quality is monitored. Evaluations are repeated. Updates are controlled. Replacement is planned. In regulated sectors, this is compliance. In all sectors, it is performance protection.
Embedded evaluation loops. The strongest organisations measure where humans correct AI most frequently and use that signal to improve both the model and the way people work with it. AI improves fastest where feedback is systematic. Deployment without feedback is automation. Deployment with feedback is learning.
None of these five conditions appears automatically when AI tools are deployed. All require deliberate choices about who is responsible for what. The organisations that have made those choices are, consistently, the ones generating real value from AI.
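The evaluation-loop condition lends itself to a concrete sketch. The snippet below is illustrative rather than any specific platform's API; the event schema (`task`, `corrected`) is an assumption. It tallies how often humans correct AI outputs by task category, which is the signal organisations can use to decide where to retrain a model or tighten oversight.

```python
from collections import Counter

def correction_rates(events):
    """Given human review events like {"task": "summarisation", "corrected": True},
    return the fraction of AI outputs that humans corrected, per task category.
    (Hypothetical schema for illustration.)"""
    totals, corrected = Counter(), Counter()
    for event in events:
        totals[event["task"]] += 1
        corrected[event["task"]] += event["corrected"]  # True counts as 1
    return {task: corrected[task] / totals[task] for task in totals}

events = [
    {"task": "summarisation", "corrected": False},
    {"task": "summarisation", "corrected": True},
    {"task": "data-prep", "corrected": False},
    {"task": "data-prep", "corrected": False},
]
rates = correction_rates(events)
# High-correction categories are candidates for retraining or tighter human
# oversight; persistently low-correction ones for wider autonomy.
```

Even a tally this simple turns scattered human overrides into a managed feedback signal, which is the difference the text draws between automation and learning.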
Role shifts in the Centaur Enterprise
The conditions above do not sustain themselves. They require people whose job it is to build and maintain them. AI changes the jobs that organisations need.
Some roles are genuinely new, not just new titles on old jobs, but different in what they require people to do. Some are recognisable but substantially changed in skill and scope. Some remain largely the same in purpose but change in how that purpose is carried out. And some roles that were primarily defined by processing volume will shrink as AI takes over that volume.
Getting this right means planning ahead rather than reacting. Organisations that wait for AI-driven headcount pressure to force changes are already behind.
Roles that are genuinely new
AI Product Manager. Traditional product management involves fixed requirements, deterministic software, and predictable release cycles. AI is different in every one of those dimensions. AI outputs vary. Performance degrades over time. Failure modes are often not visible until the model is in live use. The AI Product Manager owns the evaluation framework: how model quality is measured, how that measurement evolves as business needs change, and how updates are introduced without disrupting what already works.
By 2025, an estimated 33% of large organisations had created a Chief AI Officer or equivalent executive role, according to Gartner. The AI PM role sits below that: making daily decisions about which model is used, how it is evaluated, and when it needs to change.
AI Governance / Responsible AI Lead. Regulatory pressure on AI is now a first-order business concern. The EU AI Act classifies AI systems by risk level and requires documented human oversight, bias testing, and conformity assessments for high-risk applications in employment, credit, education, and public services. The Responsible AI Lead manages the organisation's compliance exposure on AI behaviour, not as a legal formality, but as a live management responsibility.
Trust & Safety (Enterprise AI). As AI agents move from generating documents to taking actions (sending communications, processing transactions, handling customer interactions in real time), a new oversight function becomes necessary. Klarna's experience is instructive: having replaced a substantial part of its customer service function with AI agents in 2024, the company subsequently began rehiring human agents, acknowledging that AI-only customer interactions had created service quality gaps that were hard to detect until customer trust had already been damaged.
Roles that are substantially changed
Head of GenAI / Transformation Authority. The highest-performing organisations have created a cross-enterprise role whose mandate is not to run AI projects but to set the rules under which AI is deployed across the business: which tools, which standards, which oversight requirements. Without it, the pattern is predictable: dozens of disconnected AI pilots, no shared infrastructure, no consistent evaluation, and productivity gains that stay isolated in individual teams rather than adding up to anything at enterprise scale.
CIO. As AI systems are embedded in finance, supply chain, customer service, and HR, the CIO becomes responsible for the standards and security controls that determine whether those systems can be trusted. This goes beyond managing infrastructure. It requires understanding how AI models perform, what data they need and how that data is governed, and what risks are created when AI agents have access to enterprise systems and customer information.
Developers. GitHub Copilot data shows a 55% increase in coding speed; Goldman Sachs has reported a 20% productivity gain across its 46,000-strong engineering workforce. But speed is not the most important change. AI-generated code is produced faster, at higher volume, and with more surface area for subtle errors than human-written code. Developers in organisations using AI well spend less time writing and more time reviewing, checking for correctness, security gaps, and long-term maintainability.
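One way to make "AI may draft, humans remain accountable" operational in an engineering workflow is a merge gate. The sketch below is a hedged illustration, not Goldman Sachs's or GitHub's actual tooling; the change schema (`ai_assisted`, `approvals`) is invented for the example.

```python
def merge_allowed(change):
    """Block AI-assisted changes that lack a named human approval.
    'change' is a dict such as {"ai_assisted": True, "approvals": ["r.patel"]}.
    (Hypothetical schema for illustration.)"""
    if change.get("ai_assisted") and not change.get("approvals"):
        return False
    return True
```

A gate like this encodes the accountability rule in the workflow itself: a change flagged as AI-assisted with an empty approvals list is rejected, rather than relying on individual vigilance to catch it.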
Marketing and Creative Leads. AI has removed volume as the constraint in marketing. A year's worth of content can now be drafted in days. The constraint is now judgement: what is actually on-brand, what serves the audience, what should not be published. Marketing and creative leads are shifting from managing production to curating output.
HR and Talent Leaders. AI-enabled workforce analytics give HR faster and more detailed insight into the workforce than was previously possible. At the same time, the workforce itself is changing in ways that HR needs to manage actively: some roles compressing, others requiring new skills, and new roles that did not previously exist. Workforce transition planning, once an occasional project during restructuring, is becoming a permanent feature of the HR agenda.
Roles that largely remain stable
Not all senior roles change materially. P&L owners retain full accountability for business performance. Strategy leads retain authority over direction. Risk and Legal functions retain their purpose of setting limits and managing compliance. These roles give the enterprise its direction, its risk limits, and its lines of accountability. AI can inform all three. It cannot replace them.
Roles likely to compress
Some role categories will reduce in scale: manual reporting, first-draft content, routine data preparation, and tier-1 query handling. The WEF's Future of Jobs Report 2025 projects that AI will create 170 million new roles while displacing 92 million. The net is positive in aggregate but uneven across functions, levels, and sectors.
Sequencing the transition
Building the Centaur Enterprise is not a single transformation programme. It is a phased build, and the sequencing matters. The two common failure modes are deploying AI without adequate oversight and designing oversight without deploying anything.
Phase 1: Establish design authority and guardrails. Before scaling AI deployment, the organisation needs someone with a clear mandate to set the standards: which models can be used, how they must be evaluated, and what human oversight is required for different types of decision. These rules are much harder to impose after widespread deployment than before it.
Phase 2: Deploy with discipline in priority areas. The first deployments should be run as learning exercises, not just productivity drives. That means measuring the quality of human–AI collaboration, not just output volume, and using what is learned to improve both the model configuration and the way people work with it.
Phase 3: Formalise governance and monitoring. As AI moves from pilots to standard practice, governance needs to keep pace: model lifecycles actively managed, audit trails maintained, escalation pathways tested, and regulatory exposure monitored.
Phase 4: Redesign roles and build supervision capability. The most commonly deferred step, and the one that determines whether the Centaur Enterprise works in practice or only looks good on paper, is the deliberate redesign of how roles are defined and how performance is measured.
The leadership question
Most leadership discussions about AI focus on productivity: how much faster, cheaper, or higher-volume can we operate? That is the right question for the business case, but it is not the only one.
When AI is involved in a decision and something goes wrong, who is responsible? AI does not eliminate accountability; it makes it harder to locate. The leaders who handle this well have answered that question before something goes wrong: which decisions require human sign-off, who can reject AI recommendations, and what happens when an AI-driven outcome causes harm. Those are not questions for the technology team. They belong in the boardroom.
Research on automation bias shows this consistently: as AI systems become more reliable, people challenge them less. The AI appears to be working; pushing back feels unnecessary. Over time, organisations that do not actively counter this tendency find that people gradually stop checking, and errors go undetected because no one is looking for them.
Preventing this requires more than encouraging people to push back. It needs to be built into how the organisation works: who is responsible for challenging AI outputs in each area, how concerns get raised and resolved, and how performance is measured. A one-off training course will not hold.
The organisations that will lead on AI over the next decade are not necessarily those that moved first. They are those whose leaders took the human side of the partnership as seriously as the technology.
Conclusion
Kasparov's insight from Advanced Chess was precise: how well the human and machine worked together determined the outcome, not the strength of either alone. The best teams were not defined by their hardware or their individual skill. They were defined by how well they had designed the partnership.
That is now the central challenge for every leadership team with an AI strategy. The tools are available, powerful and broadly similar. The differentiator is how the organisation is built to use them.