The Keynesian Folly: Why AI Will Never Fully Automate Finance

Every profitable pattern an AI finds in financial markets contains the seeds of its own destruction.
That sentence sounds dramatic. It's also mathematically provable. Financial markets are reflexive systems. When an algorithm identifies a profitable signal, capital flows toward it. Other algorithms detect the same signal. The pattern compresses, then disappears. Ola Mahmoud at the CFA Institute puts it precisely: success alters the system being measured. Goldman Sachs projects $1 trillion in AI capital expenditure. 80% of large financial institutions now use AI in core decision-making. And the CFA Institute's position is unambiguous: the future is AI plus human intelligence, not AI replacing it.
I spent 13 years watching this dynamic play out from inside financial institutions. Credit underwriting, fintech lending, data science, fraud detection. The tools changed every few years. The fundamental problem never did: the moment you systematize an edge, the edge starts dying.
The Reflexivity Problem
Keynes predicted in 1930 that technology would reduce the workweek to 15 hours. Robert Solow observed in 1987 that computers appeared everywhere except in the productivity statistics. AI is triggering the same paradox in finance. The technology is everywhere. The promised returns keep receding.
The reason is structural. Chris Hennessy and Charles Goodhart formalized this in a 2023 paper in the International Economic Review. They proved that penalized regressions (Ridge and Lasso, the workhorses of quantitative finance) generate measurable Goodhart bias when future market participants can manipulate the covariates. The bias scales proportionally with penalization strength. This is Goodhart's Law applied to machine learning, and the math doesn't care how sophisticated your model is.
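A toy sketch makes the penalization mechanics concrete. This is not the Hennessy-Goodhart equilibrium model, just an illustration of the attenuation side: as the Ridge penalty grows, the fitted coefficient shrinks away from the causal one, so the deployed rule drifts from the relationship it is supposed to track. Their contribution is showing what happens once agents can manipulate the covariate that rule scores on. All numbers below are made up.

```python
# Toy illustration (not the Hennessy-Goodhart model): ridge shrinks the
# fitted coefficient toward zero as the penalty grows, opening a wedge
# between the deployed rule and the causal relationship it tracks.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, true_beta = 500, 2.0
x = rng.normal(size=(n, 1))                   # the covariate agents could later game
y = true_beta * x[:, 0] + rng.normal(size=n)  # outcome driven by the true relationship

for alpha in [0.1, 10.0, 100.0, 1000.0]:
    beta_hat = Ridge(alpha=alpha).fit(x, y).coef_[0]
    print(f"alpha={alpha:7.1f}  beta_hat={beta_hat:.3f}  wedge={true_beta - beta_hat:+.3f}")
```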
The evidence for pattern decay shows up in real markets. In Indian options markets, the 920 Straddle strategy demonstrated textbook crowding effects after widespread adoption: total profit collapsed to 673 points with 0.9 points per trade, accuracy fell to 36%, and maximum drawdown ballooned to 520 points. First movers captured most of the profit. Everyone else fed the decay.
A Polymarket trader captured the core failure perfectly: "AI tested 10,000 combinations. Found five with 300%+ returns. Then I asked: who's losing money on the other side of this trade and why will they keep doing it?" The AI couldn't answer. That question requires understanding the counterparty's constraints, incentives, and institutional mandate. No training dataset contains that.
The reflexivity extends to the data itself. An analyst who physically traveled to the Strait of Hormuz discovered that AIS ship tracking data has roughly a 50% blind spot. Ships routinely go dark or spoof destinations. Every oil model, every supply forecast, every macro call built on AIS throughput numbers works from a dataset that systematically overstates disruption. When the data itself is gamed, AI trained on it amplifies the error.
I dealt with a version of this in fraud detection at Patelco. We deployed a behavioral biometrics rule through BioCatch that flagged account takeovers based on typing cadence and mouse movement anomalies. It worked. Online account takeovers dropped 20%. Then the fraudsters adapted. They started using remote access tools that mimicked normal user behavior. Within months, the very signals we'd trained on were being actively spoofed. We had to rebuild the detection logic from scratch, targeting the new behavior patterns. That cycle never ended. The ongoing work was recognizing when our model's assumptions had become part of the system it was measuring.
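BioCatch's actual scoring is proprietary, so the following is only a minimal sketch of the general idea: compare a session's typing cadence against the account's own historical baseline and flag large deviations. The function name, threshold, and numbers are illustrative, not anything we ran in production.

```python
# Illustrative only -- BioCatch's scoring is proprietary. The idea: flag a
# session whose typing cadence sits far outside the account's own baseline.
import numpy as np

def cadence_anomaly_score(baseline_ms, session_ms):
    """Z-score of the session's mean inter-keystroke gap vs. the account baseline."""
    mu, sigma = np.mean(baseline_ms), np.std(baseline_ms)
    return abs(np.mean(session_ms) - mu) / max(sigma, 1e-6)

rng = np.random.default_rng(7)
baseline = rng.normal(120, 15, size=2000)  # hypothetical: user typically types with ~120 ms gaps
session  = rng.normal(60, 5, size=150)     # hypothetical: machine-steady ~60 ms gaps this session

if cadence_anomaly_score(baseline, session) > 3.0:  # threshold is a placeholder, not a tuned value
    print("flag session for step-up authentication")
```

The catch, as the anecdote above shows, is that once attackers learn what "normal" looks like, they reproduce it, and the baseline itself has to be rethought.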
Peak Confidence Before Catastrophic Failure
Financial AI models are most confident right before they fail spectacularly. They excel at identifying patterns but cannot distinguish correlation from causation. In reflexive systems where misleading patterns are common, that gap becomes a structural vulnerability.
Academic research confirms this at scale: AI systems tend to make decisions associated with higher expected returns but also with higher risks than human decision-makers, and those riskier calls occasionally produce catastrophic errors. The confidence score climbs just as the model approaches exactly the kind of regime shift that will destroy it.
Stanford GSB research on AI crisis prediction identifies the core limitation: AI models might pinpoint where financial trouble is brewing, but they can't explain why it's happening or whether a specific intervention would fix it. Worse, the models create moral hazard by encouraging institutions to take on new risks under the assumption that regulators will intervene. The 2023 regional banking crisis illustrated this perfectly. Models trained on pre-SVB data associated certain bank balance sheet characteristics with stability. Those correlations proved worthless when depositors at Silicon Valley Bank moved faster than any model predicted. Interest rate relationships from one regime rarely survive into the next.
The deepest failure mode is correlation collapse. Models assume diversification limits risk because different positions are only weakly correlated historically. In extreme stress, correlations converge toward one. LTCM demonstrated this with $4.6 billion in losses from positions that were, on paper, diversified across currencies, bonds, and equity derivatives. When Russia defaulted in August 1998, every position moved against them simultaneously.
The lesson is specific: leverage combined with illiquidity kills precisely because capital is only as patient as its least patient provider. Lenders lose patience at the exact moment funds need them to maintain it. Modern AI systems are making the same structural bet LTCM made: they treat historical correlations as stable parameters rather than regime-dependent variables.
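A worked example with made-up numbers shows how large the effect is. Same equal-weight book, same stand-alone volatilities; the only change is the pairwise correlation moving from a calm 0.2 to a stressed 0.95.

```python
# Hypothetical numbers: the same equal-weight book under "calm" vs. "stressed"
# correlations. The diversification benefit largely disappears under stress.
import numpy as np

weights = np.array([0.25, 0.25, 0.25, 0.25])
vols    = np.array([0.10, 0.12, 0.15, 0.20])   # stand-alone annualized volatilities

def portfolio_vol(pairwise_corr):
    corr = np.full((4, 4), pairwise_corr)
    np.fill_diagonal(corr, 1.0)
    cov = corr * np.outer(vols, vols)
    return float(np.sqrt(weights @ cov @ weights))

print(f"calm (corr=0.20):     {portfolio_vol(0.20):.1%}")   # ~9% portfolio vol
print(f"stressed (corr=0.95): {portfolio_vol(0.95):.1%}")   # ~14%, close to the 14.25% weighted-average vol
```

The stressed number sits almost on top of the simple weighted average of the individual volatilities: at correlation near one, diversification buys you almost nothing.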
Large language models compound the problem. As Brian Pisaneschi of the CFA Institute writes, LLMs select words based on probability and continue incorrect reasoning confidently. They operate like Kahneman's System 1 thinking: automatic, pattern-matching, fast. Financial markets require System 2: deliberate, effortful reasoning that questions its own assumptions. In a domain where being wrong at the worst moment can destroy a firm, System 1 thinking is the last thing you want running the show.
What 90% Excel Usage Actually Tells You
The CFA Institute's position, articulated across multiple publications in 2025 and 2026, is clear. Rhodri Preece, CFA, frames the future as AI + HI: the complementary cognitive capabilities of humans and machines. Professional judgment remains essential to test assumptions, validate data sources, and maintain accountability.
The adoption numbers tell the real story. From a 2024 CFA Institute survey of investment professionals:
- 90% still use Excel for valuation work
- 20% use Python alongside Excel
- 16% use GenAI for industry or company analysis
- 27% use GenAI to assist in preparing research reports (the highest GenAI adoption rate in the survey)
Meanwhile, 55% of financial services companies are actively exploring GenAI workflows, according to an NVIDIA survey. The gap between exploration and production is wide.
I hear the counterargument: adoption is slow because finance is conservative. That's partly true. But it misses something. The 90% Excel number persists because spreadsheets let analysts see every assumption, trace every calculation, and explain every output to a regulator or a client. That transparency is a feature of the workflow, not a bug. When you swap it for a model that can't explain its reasoning, you gain speed and lose the one thing your stakeholders actually pay for: the ability to say why.
A CQF Institute survey found that 76% of finance professionals believe their academic training didn't adequately prepare them for AI skills, and 88% believe a skills gap exists. The gap is real. But it's a gap in technical AI competence, not in judgment. Those are different capabilities, and conflating them is where the automation narrative breaks down.
The Governance Gap Banks Can't Close Fast Enough
The numbers on governance tell a clear story of capability outrunning control. 80% of large financial institutions now use AI in core decision-making, according to the Bank for International Settlements. Firms with structured human-in-the-loop processes reduced model-related incidents by nearly 40% compared to fully automated systems.
The CFA Institute identifies where legacy bank frameworks fall short. Start with explainability: why did the model make this decision? Then accountability: who owns the output when it's wrong? And the one banks keep deferring: who monitors the model after deployment, month over month, as the world it was trained on drifts away from the world it's operating in?
IBM's 2025 Cost of a Data Breach Report found that 97% of breached organizations with AI-related incidents lacked controls governing internal AI use. 63% reported not having an AI governance policy at all.
The foundational U.S. model risk management guidance, SR 11-7, was built for a different world. It assumed models are simplified, relatively static representations with bounded scope and stable parameters. Krishan Sharma at GARP argues these assumptions fail under agentic AI. Modern AI systems are dynamic, probabilistic, and autonomous, violating every premise SR 11-7 was built on.
Sharma identifies where the framework needs to evolve. Traditional validation testing loses effectiveness when models recalibrate autonomously. The reliance on a small number of external AI vendors creates correlated risks across institutions. And SR 11-7 provides limited guidance on what constitutes sufficient explainability for a system that cannot fully explain its own reasoning. The policy recommendation is evolution, not abandonment. But the evolution hasn't happened yet.
I watched this governance gap firsthand at Patelco during incident recovery. Building the enterprise Fraud Data Mart meant reconciling what the models predicted with what the regulators needed documented. The models were fast. The documentation requirements were not optional. Every automated decision required an audit trail that a non-technical examiner could follow. That constraint didn't slow us down because we were conservative. It slowed us down because the regulators aren't wrong to demand it.
In April 2026, Treasury Secretary Bessent and Fed Chair Powell summoned bank CEOs to an urgent meeting over AI-driven cybersecurity risks following Anthropic's Mythos model release. Germany's BaFin now classifies AI as part of ICT risk management under DORA. The U.S. Treasury released its Financial Services AI Risk Management Framework with a shared AI lexicon because inconsistent terminology was creating real governance problems.
The signal is clear. More human oversight, not less, is what reduces AI risk in financial institutions.
The Work That Stays Human
Sid Ratna, Head of Digital and Analytics at Vanguard, identifies what remains beyond AI's reach: behavioral coaching during market volatility, goal discovery, life planning, and emotional reassurance. AI does not feel anything and cannot provide genuine empathy. The heart of financial advice has never been about optimizing a portfolio.
A private credit practitioner put the deal-level version bluntly: AI can read a deal in minutes. That doesn't make someone a deal professional. It doesn't automatically know who to call, how to position the file, what a lender actually cares about, which red flag kills the deal, which one can be worked around, or when a sponsor is real. That's judgment, pattern recognition, and execution built over years.
This lands for me. When I was underwriting commercial loans at a community bank, the analysis was maybe 30% of the work. The rest was relationship context. Knowing which borrowers had weathered downturns before. Knowing when the business told a different story than its financial statements showed. Knowing when a guarantor's verbal commitment was worth something and when it wasn't. No dataset captures that. It accumulates through reps.
The Keynesian beauty contest is the right frame for understanding where the edge moves. Keynes argued that successful investing requires predicting what others will predict, not analyzing fundamentals in isolation. When everyone uses the same AI to analyze the same fundamentals, the AI-generated consensus becomes the new noise. The edge shifts to whatever the AI cannot do: relationship context, regime change detection, and judgment under genuine ambiguity.
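The mechanics are easiest to see in the textbook version of the game, the "guess 2/3 of the average" contest. A few lines of simulation make the point: when every player applies one more round of the same reasoning, the consensus collapses, and the payoff goes to whoever sits exactly one level ahead of the crowd, not to whoever analyzes the stated range in isolation. The level-k framing here is standard game theory, not anything specific to the sources above.

```python
# Classic p-beauty-contest toy (p = 2/3). Each level of reasoning best-responds
# to the level below; applied uniformly, the consensus ratchets toward zero.
p, guess = 2 / 3, 50.0   # level-0 players anchor on the midpoint of [0, 100]
for level in range(1, 7):
    guess *= p           # level-k guesses p times the level-(k-1) consensus
    print(f"level-{level} guess: {guess:5.1f}")
```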
As @BoringBiz_ argued (175K followers): there's incredible alpha today in just reading a piece of text from end to end. Too many analysts at large funds use AI to summarize 10-Ks. The answers AI provides are the same answers it gives to everyone else. The only alpha left is in reading by yourself.
That's the Keynesian folly in one sentence. The tool that was supposed to create advantage becomes the thing that destroys it, precisely because everyone adopts it.
Skills That Appreciate (and Depreciate) in AI-Driven Finance
Three capabilities appreciate as AI handles the execution layer.
Judgment under uncertainty. The ability to make decisions when data is ambiguous, incomplete, or contradictory. This is the skill machines provably lack in reflexive systems. If you're early in your career, seek out roles where you have to make calls with imperfect information. That muscle is harder to build later.
Institutional and relationship knowledge. Who to call, which red flags matter, how to position a deal, what a regulator actually cares about versus what they say they care about. This accumulated context can't be serialized into training data. It comes from years of being in the room.
AI governance competence. Model drift monitoring, output validation, knowing when to override. Knowing that an AI model produced a forecast is not enough. Finance teams need to understand model assumptions, training data vintage, accuracy benchmarks, and when the model's world no longer matches the real one. JPMorgan has mandated GenAI training for every new employee since 2024. Citigroup began upskilling most of its workforce in prompt engineering in September 2025. The institutions are treating AI literacy as table stakes.
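What drift monitoring looks like in practice varies by shop. One common, simple check is the Population Stability Index: compare the model's score distribution today against the distribution it produced at deployment. A minimal sketch with hypothetical score distributions; the thresholds are rules of thumb, not regulatory standards.

```python
# Minimal drift check: Population Stability Index (PSI) between the score
# distribution at deployment and the distribution today. Rule-of-thumb
# thresholds (not universal): <0.10 stable, 0.10-0.25 watch, >0.25 investigate.
import numpy as np

def psi(expected, actual, bins=10):
    """PSI over equal-width bins on [0, 1]; decile bins of the expected scores are also common."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)   # avoid log(0) on empty bins
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
at_deployment = rng.beta(2, 5, size=10_000)   # hypothetical score distribution at launch
today         = rng.beta(3, 4, size=10_000)   # hypothetical drifted distribution
print(f"PSI = {psi(at_deployment, today):.3f}")
```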
What depreciates: routine data gathering, report formatting, standard compliance checks, anything where the value comes from processing speed rather than judgment. If your job is running a model and reporting what it says, that job is compressing fast.
The CFA designation retains relevance precisely because it tests judgment, ethics, and analytical reasoning under uncertainty. Those are the capabilities AI can't replicate in reflexive systems. The credential's value shifts from signaling analytical competence (which AI can now approximate) to signaling the judgment layer that sits on top of analysis. The most powerful career move is combining that domain expertise with technical AI skills. Python and SQL are transitioning from nice-to-have to necessary.
I hold the CFA charter. I also write Python daily and fine-tune my own models. That combination used to be unusual. In five years, it'll be the baseline expectation for anyone who wants to do serious analytical work in finance. The question is whether you're building both muscles now or waiting until the job description forces your hand.
The Bottom Line
Financial markets are reflexive. AI models degrade the patterns they exploit. This is a mathematical property of competitive systems, and better models don't fix it. They accelerate the decay.
The work doesn't disappear. It migrates. Analysts stop gathering data and start questioning whether the data still means what the model thinks it means. Portfolio managers stop running scenarios and start interrogating whether the model's regime assumptions survived last quarter. The execution layer compresses. The interpretation layer expands.
The Keynesian folly is believing automation solves a problem that is, by its nature, adversarial. Every edge gets competed away. Every signal gets crowded. Every model gets arbitraged. The human layer persists because someone has to do the work that starts after the model finishes. That's where the value concentrates.