Image by DON JACKSON-WYATT

BATTLE REPORTS

The Self-Improving Harness: Why Tax AI Is the New Front Line in the War of the Ecosystems

TL;DR

The real breakthrough in OpenAI and Thrive Holdings' Tax AI is not that the base model improved itself.

The self-improving system is the harness around the model: practitioner corrections become production traces, traces become evaluations, and evaluations drive Codex-assisted workflow improvements.

For executives, this marks a shift from buying AI tools to building AI-enabled operating loops.

The firms that win will instrument high-value workflows, capture expert feedback, govern trust, and turn every correction into compounding ecosystem advantage.




Executive Summary

OpenAI, Thrive Holdings, and Crete's network of more than 30 accounting firms have demonstrated something more strategically important than another artificial intelligence productivity tool. They have shown a repeatable pattern for professional-services transformation: practitioner corrections become structured evidence, evidence becomes evaluations, and evaluations become Codex-driven product improvements.


Tax AI processed 7,000 returns across participating Crete firms, focused on complex 1040 and 1041 preparation. OpenAI reported that the system saved practitioners about one-third of tax-preparation time, drafted returns with up to 97% accuracy, and increased throughput by about 50%. At launch, only 25% of returns reached 75% correct field completion; within six weeks, 86% reached that mark. OpenAI's public post also says the 90% and 100% completion thresholds improved even faster, while a secondary report stated that 90% of returns eventually reached 100% correct field completion. [1][2] (OpenAI)
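As a quick consistency check, the reported time saving and throughput gain line up under one simplifying assumption, which is ours and not OpenAI's: practitioner hours stay fixed and throughput scales inversely with the time spent per return. The function name below is illustrative.

```python
from fractions import Fraction


def throughput_gain(time_saved: Fraction) -> Fraction:
    """Relative throughput gain when each return takes (1 - time_saved) of the old time.

    Assumes total practitioner hours are unchanged (an assumption for this sketch).
    """
    if not 0 <= time_saved < 1:
        raise ValueError("time_saved must be in [0, 1)")
    return 1 / (1 - time_saved) - 1


# Saving one-third of preparation time per return
gain = throughput_gain(Fraction(1, 3))
print(f"{float(gain):.0%}")  # prints 50%
```

Under that assumption, a one-third time saving corresponds to the roughly 50% throughput increase quoted above.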


The headline is not that the base model became smarter. The headline is that the operating harness improved. As the briefing insight frames it: "the harness self-improves, not the base model." That distinction matters for every advisory, audit, finance, legal, insurance, and operations leader trying to move artificial intelligence from experiments to measurable operating advantage.

In the language of War of the Ecosystems, this is not merely a product win. It is a new ecosystem maneuver. The winning force is not one model, one software vendor, or one accounting firm. It is the coordinated battlefield system: platform capability, practitioner expertise, production traces, evaluation infrastructure, workflow governance, and a services network capable of turning human corrections into compounding advantage.



1. Call to Arms: The AI Battlefield Has Moved From Models to Operating Loops

For the last two years, many executives have fought the wrong battle. They have compared models, negotiated licenses, launched copilots, and measured adoption. Those activities matter, but they are not sufficient. The decisive battlefield is now the operating loop around the model.


McKinsey's 2025 State of AI survey captures the gap. Nearly nine in ten respondents said their organizations regularly use AI in at least one business function, but most organizations still had not embedded AI deeply enough into workflows to generate material enterprise-level benefits. McKinsey also found that high performers are more likely to redesign workflows, define human-validation processes, and scale AI with senior-leader ownership. [3] (McKinsey & Company)


Tax AI is important because it shows what scaled workflow redesign looks like in a high-friction professional domain. Tax preparation is not a toy workflow. It has messy documents, client-specific notes, prior-year information, regulatory consequences, practitioner judgment, and final human accountability. OpenAI describes the system as preserving production traces from source files through extracted fields, tax-engine mapping, practitioner correction, and final submission. Those traces become the battlefield intelligence. [1] (OpenAI)
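To make the idea of a production trace concrete, here is a minimal sketch of a per-field trace record that follows the stages named above. The class, attribute names, and sample values are hypothetical; OpenAI has not published the actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldTrace:
    """One tax field followed from source file to final submission (hypothetical schema)."""
    field_name: str
    source_document: str       # which file supported the value
    extracted: str             # value pulled from the document
    mapped: str                # value after tax-engine mapping
    corrected: Optional[str]   # practitioner's correction, or None if the draft was accepted
    submitted: str             # value in the final return

    def was_corrected(self) -> bool:
        return self.corrected is not None

    def failing_stage(self) -> Optional[str]:
        """Return the earliest automated stage that disagrees with the submitted value."""
        if self.extracted != self.submitted:
            return "extraction"
        if self.mapped != self.submitted:
            return "tax_engine_mapping"
        return None


# Illustrative data only: extraction was right, mapping dropped the value.
trace = FieldTrace(
    field_name="rental_income",
    source_document="lease_statement.pdf",
    extracted="12000",
    mapped="0",
    corrected="12000",
    submitted="12000",
)
print(trace.failing_stage())  # prints tax_engine_mapping
```

Keeping every stage in one record is what lets a later reviewer, human or agent, ask where a failure entered the workflow rather than only that it occurred.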


This is where Ecosystem Commanders must change doctrine.

The old question was: "Which model should we buy?"

The new question is: "Which workflow can we instrument so that every expert correction improves the system, reduces future friction, and strengthens the ecosystem?"

That is the shift from using AI to building an AI-enabled ecosystem.



2. Vertical Conquest: Tax Is the Beachhead, Not the Whole War

Tax preparation is the first visible beachhead. The larger conquest is professional work that is document-heavy, rules-based, judgment-sensitive, and constrained by trust.


In Alejandro Canonero's ecosystem framing, five battlegrounds matter: customers, developers, independent software vendors, service partners, and hardware or infrastructure allies.

Tax AI activates all five.


First, the customer battlefield is the accounting firm and its end client. The system does not simply promise lower cost. It frees practitioners to spend more time on client-facing advisory. OpenAI reported that one senior accountant who spent 180 hours on tax preparation last year spent only 15 hours this year, using the released time to call clients, explain returns, and take on new work. [1] (OpenAI)


Second, the developer battlefield is the Codex-driven improvement environment. OpenAI's harness-engineering work describes a shift in software development where "humans steer" and agents execute within structured repositories, tests, documentation, and review loops. That principle becomes more powerful when connected to real production failures from expert workflows. [4] (OpenAI)


Third, the independent software vendor battlefield includes tax engines, document-management systems, workflow platforms, audit tools, and analytics providers. The winners will be the vendors whose systems can expose traces, preserve provenance, support evaluations, and integrate into agentic improvement loops.


Fourth, the service-partner battlefield is the most underestimated. Accounting firms, audit practices, advisory firms, managed-service providers, and operations consultancies are no longer merely implementers. They are the source of expert corrections that train the harness. Their daily work becomes the feedback supply line.


Fifth, the infrastructure battlefield includes secure cloud, compute, storage, observability, and data-governance layers. Without reliable infrastructure and traceability, the loop collapses.


The vertical lesson is clear: the first-mover advantage does not come from generic AI access. It comes from owning the production context where expert corrections occur.



3. Strategic Weapon: The Harness Is the Force Multiplier

The strategic weapon is not "AI" in the abstract. It is a self-improving harness: a bounded system that captures expert feedback, turns it into structured evidence, converts recurring failures into evaluations, and uses Codex to investigate and propose fixes.


OpenAI describes three pillars behind Tax AI: practitioner feedback, production traces, and a Codex-driven loop based on tailored evaluations. In a rental-property example, practitioner corrections reveal failures, traces show where the failure occurred, and Codex inspects the trace, repository, evaluations, and product scaffold to propose changes. Ambiguous cases route back to humans rather than being forced through automation. [1] (OpenAI)
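The routing logic in that loop can be sketched as a simple triage step. This is an illustration under stated assumptions, not OpenAI's implementation: the `Correction` and `EvalCase` types, the `ambiguous` flag, and the recurrence threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    field_name: str
    proposed: str            # what the system drafted
    corrected: str           # what the practitioner entered
    ambiguous: bool = False  # practitioner flagged the case as a judgment call


@dataclass(frozen=True)
class EvalCase:
    field_name: str
    expected: str


def triage(
    corrections: list[Correction], min_recurrence: int = 2
) -> tuple[list[EvalCase], list[Correction]]:
    """Turn recurring, unambiguous corrections into eval cases; queue the rest for humans."""
    human_queue = [c for c in corrections if c.ambiguous]
    clear = [c for c in corrections if not c.ambiguous]
    counts = Counter(c.field_name for c in clear)
    evals = [
        EvalCase(c.field_name, c.corrected)
        for c in clear
        if counts[c.field_name] >= min_recurrence
    ]
    return evals, human_queue


# Illustrative corrections only
corrections = [
    Correction("rental_income", "1200", "12000"),
    Correction("rental_income", "0", "8000"),
    Correction("depreciation", "5", "7"),
    Correction("passive_loss", "1", "2", ambiguous=True),
]
evals, human_queue = triage(corrections)
print(len(evals), len(human_queue))  # prints 2 1
```

The single depreciation correction is neither promoted nor escalated in this sketch; a real system would need a policy for one-off corrections, and the key design choice shown here is only that ambiguous cases never become automated evaluations.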


That last point is essential: ambiguous cases go back to people instead of being pushed through automation. This is not autonomous tax judgment. It is bounded workflow improvement. Engineers remain responsible for architecture, product decisions, and shipping; practitioners steer the loop through review and approval. [1] (OpenAI)


This is the difference between a reckless charge and disciplined combined-arms warfare. In the Battle of the Atlantic, the Allies did not win by building one magic ship. They won by integrating radar, codebreaking, convoy doctrine, aircraft coverage, escort tactics, and constant feedback from the front. Tax AI shows the same logic in enterprise AI: the model is only one asset. The decisive advantage comes from coordination across signals, workflows, tools, experts, and governance.


OpenAI's Symphony work reinforces the replicability of this pattern. Symphony is an open-source Codex orchestration spec and reference implementation designed to turn project-management workflows into a control plane for coding agents. OpenAI says it open sourced Symphony to demonstrate Codex App Server paired with workflow tools, and to let others build versions tailored to their environments. [5] (OpenAI)


For advisory firms, this means the next product frontier is not a chatbot that answers tax questions. It is a workflow harness that learns from every review, exception, correction, and client-specific decision.



4. Decisive Battlefield: Trust, Governance, and Metrics

The Tax AI case also exposes the constraints that will decide winners and losers.

The first constraint is trust. In finance and accounting, accuracy is not a feature; it is permission to operate. AICPA & CIMA report that finance leaders see AI as a defining shift, yet only 8% feel very well prepared to adopt it, and 46% identify generative AI as the most significant skills gap for 2025. [6] (AICPA & CIMA)


The second constraint is accountability. The system must preserve provenance: what document supported the extracted field, what the agent proposed, what the practitioner changed, and what ultimately went into the filed return. This is why production traces are strategically superior to loose prompt logs.


The third constraint is return on investment. Gartner has warned that more than 40% of agentic AI projects may be canceled by the end of 2027 because of escalating costs, unclear business value, or inadequate risk controls. Gartner's practical recommendation is to pursue agentic AI where it delivers clear value through cost, quality, speed, and scale. [7] (Gartner)


Tax AI meets that standard because the value is measurable: preparation time, throughput, field completion, accuracy, review burden, client capacity, and expansion into adjacent services.


In Alejandro Canonero's Strategic Metrics language, Ecosystem Commanders should measure this kind of initiative through seven lenses:


  1. Productivity ROI: reduction in preparation hours per return.

  2. Throughput: number of returns processed per practitioner or team.

  3. Quality: share of returns at 75%, 90%, and 100% correct field completion before correction.

  4. Partner-driven revenue: incremental revenue from firms adopting the harness.

  5. Upsell and cross-sell: advisory, audit, bookkeeping, and operational services enabled by freed capacity.

  6. Consumption growth: increased usage of the AI platform, tax engine, storage, and workflow systems.

  7. Trust velocity: reduction in review cycles required before practitioner approval.


The decisive point is that the harness converts trust from a slogan into an operating metric.
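Two of the lenses above, quality and trust velocity, reduce to simple calculations. The sketch below shows one possible way to compute them; the function names, the definition of trust velocity as a relative reduction, and the sample data are assumptions for illustration.

```python
def share_at_or_above(completion_rates: list[float], threshold: float) -> float:
    """Share of returns whose correct-field completion rate meets the threshold."""
    if not completion_rates:
        raise ValueError("no returns to measure")
    return sum(rate >= threshold for rate in completion_rates) / len(completion_rates)


def quality_report(completion_rates: list[float]) -> dict[float, float]:
    """Quality lens: share of returns at the 75%, 90%, and 100% thresholds."""
    return {t: share_at_or_above(completion_rates, t) for t in (0.75, 0.90, 1.00)}


def trust_velocity(baseline_cycles: float, current_cycles: float) -> float:
    """Relative reduction in review cycles needed before practitioner approval."""
    if baseline_cycles <= 0:
        raise ValueError("baseline_cycles must be positive")
    return (baseline_cycles - current_cycles) / baseline_cycles


# Illustrative per-return completion rates before practitioner correction
rates = [1.0, 0.95, 0.8, 0.6]
print(quality_report(rates))  # prints {0.75: 0.75, 0.9: 0.5, 1.0: 0.25}
print(trust_velocity(4, 3))   # prints 0.25
```

Reporting all three thresholds together matters because a system can improve at the 75% mark while stalling at 100%, which is where review effort actually disappears.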



5. Execution Playbook: How Ecosystem Commanders Win the Next 12 to 24 Months

The General's Playbook applies directly.


1. Build a shared vision around workflow conquest

Do not launch AI as a generic productivity program. Select one high-friction workflow where expert corrections are frequent, expensive, and measurable. Tax preparation was the beachhead for OpenAI, Thrive, and Crete. Other candidates include audit evidence review, lease abstraction, insurance claims, procurement exceptions, help-desk resolution, financial close, and compliance monitoring.


2. Create trust-based relationships with practitioners

The practitioner is not an obstacle to automation. The practitioner is the intelligence source. Tax AI improved because accountants corrected real work, clarified ambiguity, and revealed which failures mattered. The ecosystem leader must make experts feel that AI is amplifying their judgment, not extracting it without credit.


3. Make the system easy to use and easy to correct

A self-improving harness depends on correction capture. If review is painful, the supply line breaks. Every practitioner correction should be structured, attributable, and useful for future evaluations.


4. Align incentives

Accounting firms should not be rewarded only for hours billed. They should be rewarded for cycle-time reduction, client expansion, quality, and advisory conversion. Platforms should reward partners who contribute reusable evaluations, validated workflows, and industry-specific improvement patterns.


5. Scale through repeatable patterns

OpenAI says the same three-part design from Tax AI is being used as a blueprint for other accounting workflows such as bookkeeping and audit, and for operational workflows such as information-technology help-desk automation. [1] (OpenAI)

The Commander's move is to package the pattern: trace schema, review workflow, evaluation library, governance model, partner enablement, and marketplace-ready solution.


6. Innovate continuously

The advantage compounds only if the loop keeps running. Weekly correction reviews, recurring evaluation generation, regression testing, and controlled deployment should become the new operating rhythm.
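The regression-testing step in that rhythm can be expressed as a simple gate: a proposed fix ships only if no evaluation case that passed before now fails. This is a minimal sketch; the function name and the representation of results as a mapping from case ID to pass/fail are assumptions.

```python
def regression_gate(
    baseline: dict[str, bool], candidate: dict[str, bool]
) -> tuple[bool, list[str]]:
    """Approve a candidate only if every previously passing eval case still passes.

    A case missing from the candidate results is treated as a failure.
    """
    regressions = sorted(
        case
        for case, passed in baseline.items()
        if passed and not candidate.get(case, False)
    )
    return (not regressions, regressions)


# Illustrative eval results before and after a proposed fix
baseline = {"rental_income_basic": True, "k1_passthrough": False}
good_fix = {"rental_income_basic": True, "k1_passthrough": True}
bad_fix = {"rental_income_basic": False, "k1_passthrough": True}

print(regression_gate(baseline, good_fix))  # prints (True, [])
print(regression_gate(baseline, bad_fix))   # prints (False, ['rental_income_basic'])
```

A gate like this keeps the loop honest: a fix for one recurring correction cannot quietly reintroduce an older failure.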


7. Govern with actionable insights

Dashboards must move beyond adoption. Track where corrections occur, which fields fail repeatedly, which document types create ambiguity, which partners generate high-quality feedback, and which fixes reduce review load.


Key Takeaways for Ecosystem Commanders


  1. The strategic breakthrough is not that the base model self-improves. The harness improves through expert corrections, production traces, evaluations, and Codex-driven iteration.

  2. The next ecosystem war will be fought inside workflows, not across model leaderboards.

  3. Professional-services firms should treat practitioner corrections as strategic data assets.

  4. Platforms that expose traces, support evaluations, and enable governed agentic loops will attract the strongest service partners.

  5. Advisory leaders should start with one measurable workflow, instrument it deeply, and build a repeatable harness before expanding.

  6. The winning metrics are not demos or adoption rates. They are throughput, accuracy, review reduction, partner revenue, client expansion, and trust velocity.

  7. The Commander's mandate is clear: build the battlefield system, not just the AI tool.



References

[1] OpenAI, 2026, "Building self-improving tax agents with Codex." (OpenAI)

[2] The Deep View, 2026, "Self-improving agents are AI's next act." (The Deep View)

[3] McKinsey & Company, 2025, "The State of AI: Global Survey 2025." (McKinsey & Company)

[4] OpenAI, 2026, "Harness engineering: leveraging Codex in an agent-first world." (OpenAI)

[5] OpenAI, 2026, "An open-source spec for Codex orchestration: Symphony." (OpenAI)

[6] AICPA & CIMA, 2026, "Future-ready finance: Technology, Productivity, and Skills Survey Report." (AICPA & CIMA)

[7] Gartner, 2025, "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027." (Gartner)

[8] James F. Moore, 1993, "Predators and Prey: A New Ecology of Competition," Harvard Business Review. (Harvard Business Review)

[9] Marco Iansiti and Roy Levien, 2004, "Strategy as Ecology," Harvard Business Review. (Harvard Business Review)

[10] Ron Adner, 2017, "Ecosystem as Structure: An Actionable Construct for Strategy," Journal of Management. (SAGE Journals)

[11] Michael G. Jacobides, Carmelo Cennamo, and Annabelle Gawer, 2018, "Towards a Theory of Ecosystems," Strategic Management Journal. (Wiley Online Library)

 
 
 
