How Actionability Can Build Trust in Generative AI
Written by Michael Hsu

Michael Hsu served as Acting Comptroller of the Currency from May 2021 to February 2025. Currently, he is a fellow at the Aspen Institute, a member of the Bretton Woods Committee, and an advisor to central banks, companies, and non-profits.
Generative AI holds great promise to make banking and finance better, fairer, more inclusive, and more efficient. Regulatory uncertainty is a barrier to adoption, however, especially with regard to model risk management (MRM), e.g., SR 11-7. Regulators should take an outcomes-based approach and focus on actionability over interpretability for the MRM of generative AI.[1][2]
Square Peg, Round Hole
With conventional models, interpretability informs what needs to be fixed when something goes wrong. This, in turn, enables trust.
For instance, when a bank's valuation or risk model fails, risk managers can trace the problem to a specific component, such as an incorrect trading position or a mis-calibrated parameter. Such models can be decomposed like mechanical watches, with broken pieces exposed, identified, and then fixed.
Generative AI models are different. They are black boxes that cannot be interpreted in ways that inform how to fix them, at least not yet. While AI interpretability has made great strides, the science is still new and the methods aren’t scalable.
Regulators should keep their eyes on the prize – enabling trust – instead of reflexively applying familiar, but ill-fitting approaches that will needlessly frustrate businesses and undermine regulator credibility. An outcomes-based approach — one that focuses on actionability when things go wrong — is more practical and more likely to build trust in gen AI deployments.
The Black Box Problem
Large language models (LLMs) don't work like traditional statistical models. They are powered by neural networks with billions of parameters that interact in ways that are extremely difficult to trace. Dario Amodei, the CEO of Anthropic, recently estimated that it will be five years before we have a true “MRI for AI.”[3]
In addition, banks, like most enterprises, aren’t deploying LLMs by themselves. They are building AI systems that integrate LLMs with retrieval-augmented generation, guardrails, and external tools, enabling them to be more responsive to user queries and even to take actions on their own. In these systems, the LLM and its interpretability is just one of many components impacting outcomes.
Outcomes (Actionability) Over Process (Interpretability)
To build trust in AI systems, we don't necessarily need to understand how the underlying LLMs work. We just need to ensure that when they fail, we can diagnose why and fix it. In other words, we need to be able to see enough (observability) to take the right action when things go wrong.
Three techniques show particular promise:
Chain-of-Thought (CoT) reasoning makes an AI model’s intermediate reasoning steps visible. Instead of just receiving an answer, users can see the set of logical steps associated with the model’s response. If a response is wrong, a reviewer can in principle identify the faulty reasoning step and devise a fix so the error does not recur.
Top AI researchers have cited CoT monitoring as a key risk management tool. But they also caution that a model’s reasoning traces may not reflect its actual computations. Some research has found models fabricating plausible-sounding reasoning to fit predetermined answers.[4]
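To make the idea concrete, here is a minimal sketch (not from the article) of how a CoT-style review loop might be wired up so that intermediate steps are captured for audit. The `call_llm` function and the JSON response format are assumptions standing in for whatever model API a bank actually uses.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for the bank's model API (an assumption, not a real library call)."""
    raise NotImplementedError

COT_PROMPT = (
    "Answer the question below. Return JSON with two keys: "
    "'steps' (a numbered list of the reasoning steps you used) and "
    "'answer' (the final answer).\n\nQuestion: {question}"
)

def answer_with_visible_reasoning(question: str) -> dict:
    raw = call_llm(COT_PROMPT.format(question=question))
    result = json.loads(raw)
    # Persist the intermediate steps alongside the answer so a reviewer can
    # later locate the specific step that went wrong, not just the final output.
    return {"question": question,
            "steps": result["steps"],
            "answer": result["answer"]}
```

The point is not the particular format but that the reasoning is logged and reviewable, which is what makes a bad answer actionable.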
Chain-of-Prompts (CoP) scaffolding offers users more direct control by structuring AI tasks as sequences of predefined prompts. Instead of asking an AI model to provide a response or carry out a task with a single query (“one shot”), users break it down into a set of discrete micro-prompts — each designed to produce specific, verifiable outputs.
Consider, for instance, anti-money laundering (AML) investigations, which are highly resource intensive for banks and often result in false positives. Rather than asking an AI, “Is this transaction suspicious?” a CoP workflow might include a chain of micro-prompts:
Prompt 1: “You are a Bank Secrecy Act (BSA)/AML investigator. You must follow 31 CFR 1020.320, FinCEN SAR narrative guidance, [bank policies …], and … Cite rule numbers and policies when making evaluations and recommendations…”
Prompt 2: [Upload transaction information] “Summarize the transaction and identify all elements using the proper format consistent with bank policy […]. Double check to make sure the formatting is correct.”
Prompt 3: “Use this red flag matrix [link] and label each flag TRUE/FALSE. Double check the sanctions list [link] and adverse media.”
Prompt 4: “Generate a risk probability score (0-100). List top 5 drivers with weight percentages. Explainable format only.”
Prompt 5: “Given [transaction_id] fetch linked entities [customer_id, device_id…] and summarize unusual patterns.”
Prompt 6: “Propose up to [5] possible explanations (legit or illicit), ranked by likelihood; cite which flags support each.”
Prompt 7: “List any missing Know Your Customer (KYC)/Enhanced Due Diligence (EDD) elements that materially affect the decision.”
Prompt 8: “Write a 1-2 page narrative in third person covering who, what, where, when, why, how, in chronological order in under [5,000] words. For cases with [risk score of …], provide counterfactual analysis in under [2,000] words.”
Prompt 9: “Validate that the narrative includes all essential SAR elements, references UTC dates, and omits personally identifiable information (PII). Return PASS or list fixes.”
Prompt 10: “Based on the risk score, flags, narrative, and bank policy thresholds, provide a recommendation of File SAR, No SAR, or Escalate, plus rationale with clear mapping to supporting data, analysis, and CFR citations…”
This approach offers several advantages. The prompts are designed and controlled by compliance teams, not generated unpredictably by the AI during inference. Each step’s output can be validated. When errors occur, reviewers can pinpoint which prompt produced the problem and revise it accordingly. The system becomes auditable, improvable, and trustworthy.
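As a rough illustration of how such a workflow could be operationalized, the sketch below chains compliance-authored prompts and validates each step’s output before the chain continues. The prompt texts are abbreviated, the validators are simplistic, and `call_llm` is a stand-in for the bank’s model API; all of these are assumptions, not part of the article.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for the bank's model API (an assumption)."""
    raise NotImplementedError

# Each step pairs a compliance-authored prompt with a validator that
# checks the output before the chain is allowed to continue.
STEPS = [
    ("role",      "You are a BSA/AML investigator. Follow 31 CFR 1020.320 ...",
                  lambda out: len(out) > 0),
    ("summary",   "Summarize the transaction in the bank's required format ...",
                  lambda out: "Transaction" in out),
    ("red_flags", "Label each red flag TRUE/FALSE using the matrix ...",
                  lambda out: "TRUE" in out or "FALSE" in out),
    ("risk",      "Generate a risk probability score (0-100) with top drivers ...",
                  lambda out: any(ch.isdigit() for ch in out)),
    ("narrative", "Write the SAR narrative covering who, what, where, when ...",
                  lambda out: len(out.split()) < 5000),
]

def run_chain(case_context: str) -> dict:
    outputs, context = {}, case_context
    for name, prompt, validate in STEPS:
        out = call_llm(f"{prompt}\n\nContext:\n{context}")
        if not validate(out):
            # The failure is attributable to a specific, named prompt,
            # which the compliance team can revise and re-run.
            raise ValueError(f"Step '{name}' failed validation")
        outputs[name] = out
        context += f"\n\n[{name}]\n{out}"
    return outputs
```

Because the prompts and validators live in version-controlled configuration owned by compliance, a reviewer who finds an error in the final SAR recommendation can trace it to a single step rather than to an opaque end-to-end query.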
Combining CoP scaffolding with CoT reasoning provides even greater diagnostic power — similar to “layered chain-of-thought” approaches used in medical contexts, where transparency and intermediate verifiability are critical.
Agentic tracing extends these principles to multi-agent systems. As banks experiment with systems of AI models that can autonomously execute tasks, coordinate across multiple agents, and interact with external tools, the complexity multiplies. Agentic tracing systematically captures decision pathways, intermediate states, and interactions across agents and tools involved in completing a task.
The goal remains the same: see enough to act when things go wrong. Link errors to specific components, decisions, or inputs for targeted remediation.
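One way to picture agentic tracing: every agent action, decision, and tool call is appended to a structured trace so that a failure can be tied to a specific agent, tool, or input. The record layout below is purely illustrative and does not refer to any particular tracing framework.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class TraceEvent:
    agent: str        # which agent acted
    action: str       # e.g., "tool_call", "decision", "handoff"
    inputs: dict      # what the agent saw
    output: str       # what the agent produced
    timestamp: float = field(default_factory=time.time)

@dataclass
class Trace:
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    events: list = field(default_factory=list)

    def record(self, agent: str, action: str, inputs: dict, output: str) -> None:
        self.events.append(TraceEvent(agent, action, inputs, output))

    def events_for(self, agent: str) -> list:
        """Filter the trace when remediation needs to target one component."""
        return [e for e in self.events if e.agent == agent]

# Illustrative usage: each agent in the system records what it did and saw.
trace = Trace()
trace.record("kyc_agent", "tool_call",
             {"tool": "sanctions_list", "customer_id": "C123"}, "no match")
trace.record("narrative_agent", "decision",
             {"risk_score": 72}, "recommend escalate")
```

With a trace like this, a bad outcome can be walked back through the recorded events to the agent, tool call, or input that caused it, which is exactly the kind of targeted remediation the article argues for.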
These techniques have limits. CoT reasoning can be unfaithful. CoP workflows are vulnerable to prompt injection attacks and can become brittle at scale. Agentic tracing faces challenges with trace incompleteness, scaling complexity, and cross-agent contamination.
But they represent a more practical foundation for risk managing AI than waiting for interpretability breakthroughs.
The Path Forward
Regulators can promote responsible gen AI adoption by banks by clarifying that they don’t need to unpack the black box. Instead, MRM supervisory expectations for generative AI should be updated to encourage sufficient observability of AI decision-making processes to enable effective oversight and remediation when failures occur.
It will be some time before we can fully understand how gen AI models work at a neural level (interpretability). In the meantime, trust in AI systems can be built by focusing on outcomes (actionability). As AI capabilities continue to rapidly expand, actionability through observability offers the most viable path to responsible deployment at scale.

“AI Actionability Over Interpretability” by Michael Hsu is one of three winning papers selected for DC Fintech Week on 14-17 October (see https://dcfintechweek.org/). DC Fintech Week, now in its 9th year, is a premier convening of leaders in industry, policymaking, academia, and public commentary to discuss critical topics in financial technology, digital assets, artificial intelligence, and standard setting.
Another winning paper is “Digital Payment Innovations in Sub-Saharan Africa” by Luca Ricci and co-authors. They take stock of developments and policy issues related to digital payment innovations across sub-Saharan Africa, including central bank digital currency, fast payment systems, mobile money, and crypto-assets. The final winning paper is “Making Stablecoins Stable(r): Can Regulation Help?” by Ulf Lewrick, Tirupam Goel, and Isha Agarwal. This article discusses regulatory responses to risks from stablecoins, showing that liquidity requirements are more effective in curbing market spillovers, while capital requirements are better suited to lowering default probabilities.
The winning papers and 11 additional featured research papers are available online at https://dcfintechweek.org/featured-research-2025/. The Selection Committee consists of Chris Brummer (Georgetown University), Yesha Yadav (Vanderbilt University), Ilhyock Shim (Bank for International Settlements (BIS)), Jon Frost (BIS), Nydia Remolina (Singapore Management University), Zainab Ahmed (The Fintech Foundation), and Elise Soucie Watts (Global Digital Finance).
Special thanks to Jon Frost, Yesha Yadav, and Ilhyock Shim for their work towards this event and this write-up.
The opinions shared in this article are the author’s own and do not reflect the views of any organization they are affiliated with.
[1] Mike Hsu, “AI Actionability Over Interpretability” (September 2025).
[2] Mike Hsu, “AI and Category Errors.”
[3] Dario Amodei, “The Urgency of Interpretability” (April 2025).
[4] Korbak et al., “Chain of Thought Monitorability” (July 2025).
Open Banker curates and shares policy perspectives in the evolving landscape of financial services for free.
If an idea matters, you’ll find it here. If you find an idea here, it matters.
Interested in contributing to Open Banker? Send us an email at [email protected].