AI Governance in Insurance: What the NAIC AI Bulletin Means for Claims Models

Dr Faiz Rasool
May 18, 2026

Forty-four states have now adopted or are actively considering the NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers. For the US insurance industry, that is not a gradual trend; it is a regulatory wave. Claims operations sit squarely in the center of it.

Insurance AI is not new. Carriers have used predictive scoring, fraud detection algorithms, and automated triage tools in claims for years. What is new is the accountability structure regulators expect to see around those systems. The NAIC AI Bulletin, adopted by the full NAIC membership in 2023, changed the question from ‘Are you using AI?’ to ‘Can you prove your AI is fair, accurate, and governed?’ Those are very different obligations.

This article examines what the Bulletin actually requires, why claims models bear the heaviest scrutiny, which states have moved from voluntary guidance to enforceable rules, and what a defensible AI governance program looks like in practice for US insurers.

What the NAIC 2023 Model Bulletin Actually Says

The NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers was formally adopted in December 2023. It is not a model law or model regulation, a distinction that matters enormously. As a bulletin, it represents a statement of regulatory expectations that states may issue to their domestic carriers without legislative action. That means adoption is faster than rulemaking, and enforcement follows existing market conduct authority.

The Bulletin centers on three core obligations for insurers that use third-party AI systems or develop proprietary ones.

1. Governance and Accountability

Insurers must maintain a written AI governance program proportionate to the complexity and risk of the AI systems they use. This program must document who owns AI oversight at the executive level, how AI vendors are selected and monitored, and how AI-driven decisions are reviewed when consumers raise concerns. The Bulletin explicitly states that using a vendor’s AI system does not transfer regulatory accountability to that vendor. The insurer remains responsible.

2. Risk Management

Carriers must assess AI systems for potential consumer harms, including disparate impact on protected classes, before deployment and on an ongoing basis. The Bulletin specifically names risks related to unfair trade practices and unfair discrimination as the primary compliance lens. This is deliberate: by framing AI risk within existing unfair trade practices statutes, regulators can enforce the Bulletin’s expectations using current authority, without waiting for new AI-specific legislation.

3. Third-Party AI Oversight

Because most insurers do not build AI from scratch, the Bulletin places significant emphasis on vendor due diligence. Insurers must understand how AI systems they license were built, what data was used to train them, how bias testing was conducted, and what documentation exists on model performance and limitations. ‘We bought it from a vendor’ is not a compliant answer to a regulatory exam question.

Key distinction: The NAIC Bulletin does not set technical standards for what a ‘fair’ AI system looks like. It sets process standards (governance, documentation, oversight, and testing protocols) that regulators can examine.

Insurance carriers can learn from banking’s longer history of model governance. Banks have already built mature practices around model inventory, validation, monitoring, documentation, and examiner-ready controls under SR 11-7 and OCC expectations. Those same governance concepts are increasingly relevant to insurers building claims, pricing, fraud, and underwriting models, making AI governance in banking and financial services a useful reference point.

Why Claims Models Face the Highest Scrutiny

Not all AI use cases in insurance carry equal regulatory risk. A chatbot that answers billing questions is categorically different from an algorithm that recommends denial of a medical claim or calculates a property damage settlement. Regulators know this, and their examination priorities reflect it.

Claims AI operates in three high-risk zones.

Coverage and Benefit Determination

AI systems that inform or automate decisions about whether a claim is covered and to what extent directly affect consumer financial outcomes. A model that systematically undervalues claims in certain zip codes, assigns lower injury severity scores to older claimants, or flags claims for fraud review at higher rates in minority-majority census tracts creates exactly the kind of discriminatory harm unfair trade practices laws were designed to prevent. The harm is not theoretical: multiple state attorneys general investigations in the early 2020s surfaced AI-assisted claims handling practices that produced racially disparate outcomes.

Subrogation and Recovery Prioritization

AI tools that rank claims for subrogation pursuit, deciding which recoveries to chase aggressively and which to abandon, may embed socioeconomic proxies that correlate with protected characteristics. When a model learns that certain claimant profiles are less likely to contest a recovery action, it may concentrate recovery efforts on those groups while deprioritizing others, creating compliance exposure under both unfair trade practices and consumer protection frameworks.

Fraud Detection and SIU Referral

Automated fraud scoring models that refer claims to Special Investigations Units represent perhaps the most scrutinized AI application in insurance. A false positive (referring a legitimate claim for fraud investigation) imposes significant harm on policyholders: delayed payment, invasive inquiry, and reputational damage. When fraud models generate higher false-positive rates for certain demographic groups, those groups bear a disproportionate share of the investigative burden and a worse claims experience.

The NAIC Bulletin’s risk framework maps directly onto these categories. Documentation, testing, and human override protocols for claims AI are not compliance overhead; they are the difference between a defensible governance program and a market conduct finding.

Claims AI Use Case	Primary Risk Type	Regulatory Exposure	NAIC Bulletin Priority
Coverage/Benefit Determination	Disparate Impact	Unfair Trade Practices	High
Fraud Scoring / SIU Referral	False Positives by Demographic	Consumer Protection	High
Subrogation Prioritization	Socioeconomic Proxy Bias	Unfair Discrimination	Medium-High
Reserve Setting	Systematic Underpayment	Market Conduct	Medium
Claims Triage / Routing	Differential Service Quality	Unfair Trade Practices	Medium

How the Bulletin Defines ‘AI System’ and What That Means for Governance

One of the most practically important questions for compliance officers is definitional: what exactly must be governed? The NAIC Bulletin adopts a broad definition aligned with the OECD’s framework, capturing any machine-based system that can make predictions, recommendations, or decisions influencing real or virtual environments, where such outputs are used in insurance transactions.

Under this definition, the following all qualify as AI systems requiring governance documentation: gradient boosting models used in claims scoring, rule-based expert systems augmented with machine learning outputs, third-party vendor platforms with embedded AI features, and robotic process automation tools that incorporate predictive elements. Simple rule-based systems without adaptive or predictive components may fall outside the definition, but carriers should document that assessment explicitly.

The Vendor Accountability Gap

The Bulletin’s treatment of third-party AI vendors creates the single biggest operational challenge for most carriers. Insurers that license AI platforms from insurtech vendors, analytics firms, or legacy software providers often lack visibility into model architecture, training data, or bias testing methodology. The Bulletin’s expectation is that carriers obtain, review, and retain this information.

In practice, this means contract language with AI vendors must now require documentation packages that include: training data provenance and demographic representativeness, fairness metric results across protected class proxies, model performance degradation policies and retraining schedules, and incident notification obligations when model drift is detected. Carriers that cannot obtain this information from vendors face a binary choice: negotiate for it or find a different vendor. Regulators have made clear that ‘the vendor wouldn’t share it’ is not a compliance defense.

State Adoption Map: Where the Bulletin Has Teeth

The NAIC Bulletin is a model document; its force of law depends entirely on whether state insurance commissioners issue it to their domestic carriers, and whether those states have enforcement mechanisms to back it up. The landscape as of 2025 is uneven but rapidly consolidating.

States with Active AI Bulletins or Regulatory Guidance

Colorado was the first state to move from guidance to regulation with Senate Bill 21-169, which established bias testing requirements for insurance AI systems with a specific focus on life insurance. California, New York, and Connecticut have issued formal regulatory guidance with market conduct implications. Illinois has included AI governance in its examination protocols. Washington State’s Office of the Insurance Commissioner issued a bulletin closely mirroring the NAIC model.

States with Pending Legislation

As of early 2025, over 30 states have AI-related insurance bills in various stages of legislative consideration. The pattern is convergent: most pending state legislation follows the NAIC Bulletin’s accountability-and-documentation framework rather than prescribing technical standards. The majority also explicitly address the vendor accountability gap.

The Federal Dimension

The CFPB’s adverse action notice requirements under ECOA and Regulation B create a federal overlay for any AI-assisted credit-related insurance decisions. The FTC’s Section 5 authority over unfair or deceptive acts applies to consumer-facing AI. HHS AI guidance affects health insurance claims AI. Carriers operating nationally face a patchwork of overlapping authority, which is precisely why a governance program designed to satisfy the most demanding state standard also tends to satisfy federal requirements.

Compliance reality: A carrier whose AI governance program meets Colorado’s technical standards and the NAIC Bulletin’s documentation expectations is well-positioned in any US state, even those that have not yet issued their own bulletin.

State	Status	Key Requirement	Enforcement Mechanism
Colorado	Enacted (SB 21-169)	Algorithmic impact assessments, bias testing	DOI market conduct authority
California	Regulatory guidance issued	Vendor oversight, documentation	CDI market conduct exams
New York	Guidance + exam protocols	Risk management program	DFS examination authority
Connecticut	Bulletin issued	AI governance program, testing	CID market conduct
Washington State	Bulletin issued (NAIC model)	Documentation, vendor diligence	OIC exam authority
Illinois	Exam protocol update	AI use disclosure, oversight	IDOI exam procedures
30+ other states	Legislation pending	Varies by bill	Varies by state

Governance Framework Requirements: What Examiners Will Look For

When a market conduct examiner reviews an insurer’s AI governance program, they are not running a technical audit of model architecture. They are looking for a documented process that demonstrates the carrier understands its AI systems, manages their risks, and maintains human accountability for AI-influenced decisions. The following components are consistent across the NAIC Bulletin’s expectations and state-level exam protocols.

Written AI Governance Policy

The foundation is a board-approved or senior executive-approved written policy that defines what constitutes an AI system at the carrier, establishes ownership and accountability structures, sets risk tolerance thresholds for AI deployment, and specifies review and approval protocols before new AI systems go into production. Without a written policy, every subsequent governance activity lacks a documented mandate.

AI System Inventory

Carriers must maintain a comprehensive inventory of all AI systems in use, including vendor platforms with embedded AI features. The inventory should capture: system name and vendor, intended use case and decision domain, data inputs and outputs, last bias/fairness assessment date, and oversight controls in place. An inventory that cannot be produced within 72 hours of a regulatory request is functionally non-compliant.
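The inventory fields listed above can be captured in a simple structured record. The following is a minimal sketch, not a prescribed schema: the class name, field names, sample systems, and the 365-day reassessment window are all illustrative assumptions rather than anything the Bulletin specifies.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AISystemRecord:
    """One entry in a hypothetical AI system inventory."""
    system_name: str
    vendor: str                      # "internal" for proprietary models
    use_case: str
    decision_domain: str
    data_inputs: List[str]
    data_outputs: List[str]
    last_fairness_assessment: Optional[date]   # None = never assessed
    oversight_controls: List[str] = field(default_factory=list)


def overdue_for_assessment(records, today, max_age_days=365):
    """Names of systems never assessed, or assessed more than
    max_age_days ago. The 365-day default is an assumed policy choice."""
    overdue = []
    for r in records:
        if r.last_fairness_assessment is None:
            overdue.append(r.system_name)
        elif (today - r.last_fairness_assessment).days > max_age_days:
            overdue.append(r.system_name)
    return overdue


# Illustrative entries only; names and vendors are made up.
inventory = [
    AISystemRecord("fraud-score-v3", "ExampleVendor", "SIU referral scoring",
                   "claims/fraud", ["claim history"], ["fraud score"],
                   date(2024, 1, 15), ["human review before referral"]),
    AISystemRecord("triage-router", "internal", "claims routing",
                   "claims/triage", ["first notice of loss"], ["queue"],
                   None),
    AISystemRecord("settlement-estimator", "ExampleVendor",
                   "settlement valuation", "claims/valuation",
                   ["damage report"], ["estimate"],
                   date(2025, 3, 1), ["adjuster sign-off"]),
]

overdue = overdue_for_assessment(inventory, today=date(2025, 6, 1))
print(overdue)
```

Keeping the inventory in a structured form like this makes it straightforward to export on request and to query for systems whose fairness assessment has lapsed.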

Model Risk Management Integration

For carriers with established model risk management frameworks (common among carriers with actuarial modeling sophistication), integrating AI governance into existing MRM structures is both efficient and defensible. Validation standards, challenger model requirements, and performance monitoring protocols developed for predictive models translate directly to AI governance requirements. The key addition is the fairness and disparate impact testing layer that traditional MRM may not have addressed explicitly.

Consumer Complaint and Override Procedures

The Bulletin requires that carriers maintain procedures for consumers to inquire about, contest, and obtain human review of AI-influenced decisions. For claims operations specifically, this means AI-assisted coverage denials, settlement valuations, or fraud referrals must be accompanied by intelligible explanations and a documented path to human review. This is where explainability requirements become operational, not just technical.

Bias Testing and Fairness in Claims AI: What ‘Testing’ Actually Means

The word ‘testing’ appears throughout the NAIC Bulletin and state AI guidance, but rarely with technical specificity. That ambiguity is both a challenge and an opportunity: carriers that define their own testing methodology rigorously, document it transparently, and apply it consistently are in a stronger compliance position than those waiting for regulators to prescribe exact methods.

Protected Class Proxy Analysis

Direct use of protected characteristics in insurance AI is prohibited. But AI systems trained on large datasets frequently develop proxy variables that correlate with race, national origin, gender, or disability status, with zip code being the most cited example. Rigorous bias testing must examine whether model inputs that are facially neutral produce outcomes that are statistically different across demographic groups. This is disparate impact analysis applied to AI, and it requires carriers to have access to demographic data they may not routinely collect or retain.
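One common way to screen for the kind of outcome disparity described here is to compare favorable-outcome rates across groups. The sketch below is illustrative only: the group labels, the sample counts, and the 0.8 screening threshold are assumptions. The 0.8 figure is the "four-fifths" heuristic borrowed from employment-discrimination practice; the Bulletin does not prescribe it, and a carrier would need to choose and justify its own threshold and add statistical significance testing.

```python
from collections import defaultdict


def favorable_rates(records):
    """records: iterable of (group_label, favorable_outcome: bool).
    Returns the favorable-outcome rate per group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        if ok:
            favorable[group] += 1
    return {g: favorable[g] / totals[g] for g in totals}


def adverse_impact_ratios(rates, reference_group):
    """Each group's favorable rate divided by the reference group's rate."""
    ref = rates[reference_group]
    if ref == 0:
        raise ValueError("reference group has no favorable outcomes")
    return {g: r / ref for g, r in rates.items()}


# Made-up data: 100 claims per group, approved (True) or not (False).
# In practice group membership is often inferred from geographic or
# external demographic data, which adds its own uncertainty.
records = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 60 + [("B", False)] * 40
)

rates = favorable_rates(records)
ratios = adverse_impact_ratios(rates, reference_group="A")

SCREENING_THRESHOLD = 0.8  # assumed heuristic, not a regulatory standard
flagged = [g for g, r in ratios.items() if r < SCREENING_THRESHOLD]
print(rates, ratios, flagged)
```

A ratio below the chosen threshold is a prompt for deeper investigation, not a finding of unfair discrimination on its own.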

Fairness Metrics: Choosing the Right Measure

There is no single ‘correct’ fairness metric, and the choice of metric shapes which disparities a test can detect. Demographic parity asks whether outcomes occur at equal rates across groups. Equalized odds asks whether true positive and false positive rates are equal across groups. Calibration asks whether predicted probabilities align with actual outcomes equally across groups. Each metric captures a different dimension of fairness, and mathematically, optimizing for one often trades off against another. Carriers must document which metrics they use, why those metrics are appropriate for the specific use case, and what thresholds constitute acceptable performance.
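To make the difference between these metrics concrete, the sketch below computes per-group flag rates (the quantity compared under demographic parity) and true and false positive rates (the quantities compared under equalized odds) for a hypothetical fraud model. The data, group labels, and function names are invented for illustration; calibration is omitted because it requires predicted probabilities rather than binary flags.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def group_metrics(y_true, y_pred, groups):
    """Per-group flag rate, true positive rate, and false positive rate."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        tp, fp, fn, tn = confusion_counts(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        out[g] = {
            "flag_rate": (tp + fp) / len(idx),
            "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
            "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
        }
    return out


# Made-up example: 1 = fraud (y_true) or flagged for SIU (y_pred).
groups = ["A"] * 10 + ["B"] * 10
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]

metrics = group_metrics(y_true, y_pred, groups)
fpr_gap = metrics["B"]["fpr"] - metrics["A"]["fpr"]
flag_gap = metrics["B"]["flag_rate"] - metrics["A"]["flag_rate"]
print(metrics, fpr_gap, flag_gap)
```

In this invented data the flag rates differ only modestly between groups, while the false positive rate for group B is three times that of group A. A test that looked only at demographic parity would understate the burden on legitimate claimants in group B, which is why the choice of metric needs to be documented and justified per use case.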

Ongoing Monitoring vs. Point-in-Time Testing

A bias test conducted at model deployment is necessary but insufficient. Model drift, where model performance degrades over time as the world changes and the training data becomes stale, is a known phenomenon in machine learning. For claims AI specifically, shifts in claim type, claim value distributions, fraud patterns, and claimant demographics can all cause a model to behave differently than it did when validated. Carriers need ongoing monitoring protocols that trigger re-validation when drift thresholds are exceeded, not just annual reviews.
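One widely used drift statistic is the population stability index (PSI), which compares the distribution of a model input or score at validation time with its current distribution. The sketch below is one possible implementation under stated assumptions: the bin edges, sample scores, and the 0.2 re-validation trigger are illustrative choices, not values taken from the Bulletin or any state guidance.

```python
import math


def psi(expected, actual, bin_edges, eps=1e-6):
    """Population stability index between two samples.

    Values are bucketed into [edge_i, edge_{i+1}); the last bucket also
    includes its upper edge. Values outside the edges are ignored.
    Assumes each sample has at least one value inside the edges.
    eps avoids log(0) for empty buckets.
    """
    def proportions(values):
        counts = [0] * (len(bin_edges) - 1)
        last = len(counts) - 1
        for v in values:
            for i in range(len(counts)):
                in_bin = bin_edges[i] <= v < bin_edges[i + 1]
                if in_bin or (i == last and v == bin_edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        return [max(c / total, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up fraud scores at validation time versus today.
edges = [0.0, 0.33, 0.66, 1.0]
baseline = [0.1] * 50 + [0.5] * 30 + [0.9] * 20
current = [0.1] * 30 + [0.5] * 30 + [0.9] * 40

drift = psi(baseline, current, edges)

REVALIDATION_TRIGGER = 0.2  # assumed threshold; set and document your own
needs_revalidation = drift > REVALIDATION_TRIGGER
print(round(drift, 4), needs_revalidation)
```

Running a check like this on a schedule, for both model inputs and output scores, gives the monitoring protocol a documented, repeatable trigger for re-validation.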

Human Oversight and Explainability: The Accountability Layer

The NAIC Bulletin’s human oversight requirement is grounded in a straightforward regulatory philosophy: AI can assist human decisions, but humans must remain accountable for them. For claims operations, this means AI systems cannot be the sole or final decision-maker on material coverage or benefit determinations, at least not without a documented human review point in the workflow.

Explainability in Practice

Explainability is not primarily a technical challenge; it is a communication and documentation challenge. Regulators and consumers do not need to understand gradient boosting or neural network architecture. They need to understand, in plain language, what factors drove a particular AI-assisted decision and how those factors can be addressed. SHAP values and LIME explanations are internal validation tools; what matters for compliance is whether claims professionals can articulate the basis for an AI-assisted decision in a consumer complaint response or a deposition.

Human Review Thresholds

Leading carriers are implementing tiered review architectures: AI handles routine, low-risk claims decisions autonomously, with human escalation triggered by confidence score thresholds, anomaly flags, or claim characteristics that correlate with elevated bias risk. This approach preserves the efficiency benefits of AI automation while maintaining the human accountability the Bulletin requires. The escalation criteria and thresholds must be documented and defensible.
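A tiered review rule of this kind can be expressed as a small, auditable routing function. The following is a minimal sketch under assumed criteria: the action names, the 0.90 confidence floor, and the 5,000 auto-approval limit are hypothetical placeholders that a carrier would replace with its own documented thresholds.

```python
from dataclasses import dataclass


@dataclass
class ClaimDecision:
    """Hypothetical output of a claims AI system for one claim."""
    claim_id: str
    recommended_action: str   # e.g. "pay", "deny", "reduce", "refer_siu"
    confidence: float         # model confidence in [0, 1]
    anomaly_flag: bool
    amount: float


# Assumed set of actions that always require a human review point.
ADVERSE_ACTIONS = {"deny", "reduce", "refer_siu"}


def route(decision, min_confidence=0.90, max_auto_amount=5000.0):
    """Return ("auto" | "human_review", reasons).

    The reasons list doubles as the audit record explaining why a
    claim was escalated.
    """
    reasons = []
    if decision.recommended_action in ADVERSE_ACTIONS:
        reasons.append("adverse action")
    if decision.confidence < min_confidence:
        reasons.append("low confidence")
    if decision.anomaly_flag:
        reasons.append("anomaly flag")
    if decision.amount > max_auto_amount:
        reasons.append("amount above auto limit")
    return ("human_review" if reasons else "auto", reasons)


routine = ClaimDecision("C-001", "pay", 0.97, False, 1200.0)
denial = ClaimDecision("C-002", "deny", 0.99, False, 800.0)
uncertain = ClaimDecision("C-003", "pay", 0.70, True, 9000.0)

print(route(routine))
print(route(denial))
print(route(uncertain))
```

Returning the reasons alongside the routing outcome means every escalation carries its own explanation, which supports the requirement that the criteria be documented and defensible.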

Appeals and Contestation Rights

Any AI-assisted denial, reduction, or adverse determination must come with a clear consumer pathway to request human review. For health insurance claims, this overlaps with existing state and federal appeals requirements. For property and casualty claims, it is increasingly a standalone AI governance obligation. The documentation trail from initial AI decision to consumer notice to human review to final determination must be complete and retrievable.

Building an Internal AI Governance Program: A Practical Framework

The gap between understanding what regulators expect and operationalizing those expectations is where most carriers currently sit. The following framework reflects the components consistently present in AI governance programs that have survived market conduct scrutiny.

Governance Structure

Effective AI governance requires cross-functional ownership. Technology teams understand model architecture; compliance teams understand regulatory requirements; actuarial teams understand model validation; claims leadership understands operational impact. A governance committee that includes all four functions, with clear executive sponsorship and board-level reporting for material AI risks, creates accountability that functional silos cannot.

Pre-Deployment Review

Before any new AI system or significant model update goes into production in claims operations, a documented pre-deployment review should assess: the legal basis for data use, proxy variable analysis, fairness metric results, human oversight design, consumer disclosure obligations, and incident response procedures. This review should be conducted by parties independent of the team that built or procured the model.

Vendor Contract Requirements

Every AI vendor contract should include: documentation delivery obligations (model cards, data provenance records, fairness testing results), notification requirements for material model changes or detected drift, audit rights allowing the carrier to conduct or commission independent fairness assessments, data retention and deletion requirements, and representations regarding compliance with applicable insurance AI regulations. Carriers that have not yet updated vendor contracts to include these provisions are carrying unquantified regulatory risk.

Incident Response

When an AI system produces outcomes that trigger a consumer complaint, a regulatory inquiry, or an internal audit finding, the incident response protocol determines whether the event becomes a market conduct finding or a demonstration of governance maturity. Carriers should have documented procedures for quarantining affected AI-assisted decisions, conducting root cause analysis, notifying regulators where required, and remediating affected consumers.

Regulatory Examination Readiness and Enforcement

Market conduct examinations are the primary enforcement mechanism for AI governance in insurance. Unlike financial condition exams, which focus on solvency, market conduct exams evaluate how carriers treat consumers in their day-to-day operations, including how AI systems affect claims handling, underwriting, and service delivery.

What Examiners Are Actually Requesting

Based on examination protocols published by Colorado, New York, and the NAIC Market Regulation Handbook, examiners reviewing AI governance are requesting: written AI governance policies, AI system inventories with supporting documentation, pre-deployment review records for claims AI systems, bias and fairness testing documentation, consumer complaint data segmented by AI-influenced decisions, vendor contracts and due diligence records, and training records for claims personnel on AI oversight obligations.

The Documentation Standard

In regulatory examinations, if it is not documented, it did not happen. A carrier may have the most sophisticated internal AI review process in the industry, but if that process is not documented in a way that survives examiner review, it provides no regulatory protection. The documentation standard for AI governance is the same as for any other compliance-sensitive process: contemporaneous, complete, and retrievable.

Enforcement Outcomes

Current enforcement actions related to AI governance in insurance have focused on two scenarios: carriers that used AI systems producing disparate impact outcomes without documented bias testing, and carriers that failed to maintain human oversight procedures for AI-assisted adverse determinations. Remediation has included consent orders requiring governance program development, consumer restitution for affected claimants, and in some cases civil monetary penalties. The financial exposure is not hypothetical; there is precedent for it.

From Bulletin to Binding Regulation: The Regulatory Trajectory

The NAIC Bulletin represents a transitional moment, not an endpoint. The regulatory trajectory is toward binding rules, and carriers that treat the Bulletin’s expectations as aspirational rather than operative are misreading the direction of travel.

Colorado’s SB 21-169 demonstrated that state legislatures will move AI accountability requirements into statute when industry self-governance is perceived as insufficient. The CFPB’s algorithmic fairness agenda, the FTC’s enforcement posture on AI-related consumer harms, and the proliferation of state AI governance bills all point toward a more structured regulatory environment by 2026-2027.

The strategic response is not to wait for specific rules. It is to build a governance program that meets the standard the most rigorous current regulation requires, document it thoroughly, and operate it consistently. Carriers that do this now will not need to scramble when binding rules arrive; they will simply need to demonstrate what they already do.

The insurers best positioned for the next phase of AI regulation are not those who know the most about machine learning. They are the ones who have built the most defensible governance structures around the AI systems they use.

Frequently Asked Questions

What is the NAIC AI Bulletin and is it legally binding?

The NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers is a model document adopted by the National Association of Insurance Commissioners in December 2023. It is not itself binding law. Its legal force depends on whether individual state insurance commissioners issue it to carriers domiciled in their states. States that have issued the bulletin or adopted equivalent guidance can enforce its expectations through existing market conduct authority.

Does the NAIC Bulletin apply to all AI use in insurance, or only specific functions?

The Bulletin covers any AI system used in insurance transactions that can make predictions, recommendations, or decisions affecting consumers. This includes claims scoring models, fraud detection algorithms, underwriting tools, and customer service automation with decision-influencing outputs. Simple rule-based systems without adaptive or predictive elements may fall outside the definition, but carriers should document that determination explicitly.

How do I assess whether our claims AI creates disparate impact?

Disparate impact assessment requires analyzing whether model outputs (claim approvals, denial rates, settlement values, fraud referrals) differ statistically across demographic groups defined by protected characteristics. Because direct use of protected characteristics is prohibited, the analysis focuses on proxy variables. Carriers typically need to use geographic and external demographic data to approximate group-level outcomes. The methodology should be documented and reviewed by independent parties.

What does ‘human oversight’ actually require in practice?

Human oversight does not require human review of every AI-assisted decision. It requires documented escalation pathways that ensure humans can review and override AI-influenced decisions when triggered by risk indicators, consumer requests, or confidence thresholds. For material adverse decisions (claim denials, fraud referrals, significant underpayments), human review points should be built into the workflow, not available only on request.

Are insurers liable for AI systems built by third-party vendors?

Under the NAIC Bulletin framework, yes. The Bulletin explicitly states that using a vendor’s AI system does not transfer regulatory accountability to the vendor. The insurer retains responsibility for ensuring the system complies with applicable insurance laws and for conducting appropriate due diligence on vendor AI. Contract language should require vendors to provide documentation necessary to satisfy this obligation.

Which states are most active in enforcing AI governance in insurance?

Colorado has enacted binding AI governance legislation (SB 21-169) with active enforcement authority. New York, California, and Connecticut have issued formal guidance with market conduct examination implications. Illinois has updated examination protocols to include AI governance reviews. Washington State has issued a bulletin mirroring the NAIC model. Carriers with nationwide operations should calibrate governance programs to the most demanding state standard.

What is the difference between the NAIC Bulletin and Colorado’s SB 21-169?

The NAIC Bulletin is a model document establishing expectations for governance process and documentation. Colorado SB 21-169 is enacted state law requiring insurers that use external consumer data and AI in life insurance to conduct algorithmic impact assessments, test for unfair discrimination, and file compliance reports with the Colorado Division of Insurance. SB 21-169 is more prescriptive in its technical requirements than the Bulletin.

How should insurers handle AI governance for legacy systems already in production?

Existing AI systems should be added to the carrier’s AI inventory and undergo retrospective governance review, prioritized by risk level. Claims models, especially those used for coverage determination, fraud scoring, or settlement valuation, should be prioritized for immediate bias and fairness assessment. Carriers should document these reviews even when they reveal gaps, as the documentation demonstrates good-faith governance effort rather than regulatory avoidance.

What to Do Next

The NAIC AI Bulletin represents the floor of what US insurance regulators now expect from carriers using AI in claims and other consumer-facing operations. The ceiling is rising as states move from guidance to binding rules. The carriers that will navigate this environment without market conduct findings or enforcement exposure are those that treat AI governance as a core compliance discipline today, not as a future-state project.

Start with an honest inventory. Know which AI systems your organization uses, who owns them, and what documentation exists. Then assess your highest-risk applications (claims scoring, fraud detection, coverage determination) against the Bulletin’s governance, oversight, and bias testing expectations. The gaps you find are not problems; they are a prioritized workplan.

GAICC’s AI governance certification programs equip insurance compliance professionals with the frameworks, tools, and credentials to build defensible AI governance programs aligned with NAIC requirements and state regulatory expectations. Explore the ISO/IEC 42001 curriculum to see how international AI management system standards complement the NAIC regulatory framework.

About the Author

A globally certified instructor in ISO/IEC, PMI®, TOGAF®, SAFe®, and Scrum.org disciplines. With over three years’ hands-on experience in ISO/IEC 42001 AI governance, he delivers training and consulting across New Zealand, Australia, Malaysia, the Philippines, and the UAE, combining high-end credentials with practical, real-world expertise and global reach.