
The Customer Risk Model Nobody Can Explain

Why most institutions cannot answer the question 'why is this customer rated Medium?', and what that costs as AMLR Article 20 takes effect.

Mohan Paranthaman

Co-Founder · April 9, 2026 · 20 min read

Executive Summary

Every BaFin-supervised financial institution maintains a customer risk scoring model. The model assigns each customer a rating, generally Low, Medium or High, or some numerical equivalent, that determines the intensity of due diligence applied to the relationship, the cadence at which the relationship is reviewed, and the scope of transaction monitoring rules to which the customer is subject. It is the most consequential single data point in the compliance programme: it determines who is subject to Enhanced Due Diligence, whose unusual activity reaches the threshold for a Suspicious Activity Report, and which relationships are exited.

The central claim of this paper, drawn from two decades of building and reviewing customer risk models for institutions ranging from global systemically important banks to newly licensed FinTechs, is that most institutions cannot explain how their model produces its scores. When an examiner asks why a particular customer is rated Medium rather than High, the expected answer is a traceable chain: which factors were considered, what weight each was given, what data populated each factor, where the threshold between Medium and High lies, and what governance process validated the model. What examiners typically receive instead is a spreadsheet with no documented rationale, equal weights across all factors, no validation history, and no evidence that the model has ever been recalibrated against the institution’s actual experience of the customers it has rated.

The remainder of this paper sets out the regulatory expectation, examines the five structural failures we observe most consistently in practice, sets out the recent enforcement record across BaFin, the FCA and DNB on customer risk model deficiencies, describes the characteristics of a defensible model in concrete terms, and addresses the AMLR Article 20 deadline of July 2027, which converts the explainability expectation from a supervisory finding into a directly applicable regulatory breach.

What's inside

Section	Chapter	Contents
Section 1	What Regulators Expect From a Customer Risk Model	GwG §10–15, the EBA Guidelines on ML/TF Risk Factors, and FATF Recommendation 10. The five attributes that every framework expects.
Section 2	The Five Common Failures	Equal-weight spreadsheets, undocumented rationale, no validation, no override audit trail, no recalibration cycle.
Section 3	The Enforcement Record	Five recent BaFin, FCA and DNB cases, the model failure cited in each, and what an explainable model would have produced.
Section 4	What a Defensible Risk Model Looks Like	Six characteristics, in concrete terms: factor selection, weighting, thresholds, distribution, override governance, examiner-ready output.
Section 5	AMLR Article 20 and the July 2027 Deadline	Why the explainability expectation becomes a directly applicable regulatory breach in fifteen months.
Section 6	WBP Justifier	How we built a customer risk scoring engine for the explainability requirement.

1. What Regulators Expect From a Customer Risk Model

GwG

The German framework

Sections 10–15 GwG. Due-diligence measures must be commensurate with the risk; EDD is mandatory for PEPs, correspondent banking and high-risk third countries. BaFin AuA: classification must be documented, factors institution-specific, methodology available on demand.

EBA

EBA/GL/2021/02

Customer, geographic, product, delivery-channel and transaction risk factors must be identified, weighted, and the weighting rationale itself documented. The weighting-of-risk-factors guidance is where the majority of institutions fall short.

FATF

Recommendation 10

Risk-based approach to CDD. Where risks are higher, enhanced measures are required. Institutions must demonstrate to supervisors that the extent of CDD measures is appropriate to the ML/TF risks they have identified.

1.1 The German framework: GwG §10–15

The German Geldwäschegesetz establishes the legal obligation for risk-based Customer Due Diligence. Section 10 GwG requires obliged entities to apply due-diligence measures commensurate with the risk posed by the business relationship. Section 14 GwG governs Simplified Due Diligence for lower-risk situations, and section 15 GwG mandates Enhanced Due Diligence for higher-risk relationships and specifies the categories in which enhanced measures are mandatory, namely politically exposed persons, correspondent banking relationships, and relationships involving high-risk third countries.

The phrase “commensurate with the risk” is the operative one. It presupposes a model capable of differentiating between risk levels and, by extension, an institution capable of explaining how that differentiation is performed. BaFin’s Auslegungs- und Anwendungshinweise make this expectation explicit: the risk classification must be documented, the factors must be institution-specific, and the methodology must be available for supervisory review on demand.

1.2 The EBA Guidelines on ML/TF Risk Factors

The European Banking Authority’s Guidelines on ML/TF Risk Factors, EBA/GL/2021/02, set out the most detailed supervisory expectation for customer risk models in the European Union. They require institutions to identify and assess the ML/TF risk associated with each individual business relationship, to consider customer, geographic, product and service, delivery-channel and transaction risk factors, to document the rationale for the weight assigned to each factor, to ensure the model produces results consistent with the institution’s business-wide risk assessment, and to validate and test the model on a regular basis.

The Guidelines’ provisions on the weighting of risk factors are where the majority of institutions fall short. The Guidelines do not merely require that factors be weighted; they require the weighting rationale itself to be documented and defensible. An institution that assigns equal weight to country of nationality and product type without explaining why those risks are equivalent in its specific business context has not, in any meaningful sense, complied with the Guidelines.

1.3 FATF Recommendation 10 and the risk-based approach

At the global level, FATF Recommendation 10 establishes the risk-based approach to Customer Due Diligence. The Interpretive Note to Recommendation 10 requires that, where the risks are higher, institutions take enhanced measures to manage and mitigate those risks. The FATF Guidance on the Risk-Based Approach for the Banking Sector, published in 2014, requires institutions to be able to demonstrate to supervisors that the extent of their CDD measures is appropriate to the ML/TF risks they have identified.

2. The Five Common Failures

Across institutions of widely different size, business model and supervisory history, we observe the same five structural failures with striking consistency.

The equal-weight spreadsheet

Country, product, customer type, delivery channel and transaction volume are each scored on a one-to-three scale with identical weight. Equal weighting isn't inherently wrong. It's wrong when it's unexplained, which it usually is, because the spreadsheet was inherited from a consulting template and never revisited.

No documented factor rationale

Why does source of wealth carry a weight of three when industry sector carries a weight of two? These are not arbitrary design choices. When the choices are undocumented, the model cannot survive supervisory scrutiny.

No model validation

What proportion of SARs filed in the most recent year originated from customers rated Low at the time of filing? Most institutions have never asked. If the proportion is high, the model is systematically under-rating. If it is zero, either the model is capturing risk appropriately or Low-rated customers are simply not being monitored, in which case the validation is circular and meaningless.

No override audit trail

Overrides are legitimate. The failure is the absence of a record: who overrode the model, when, with what justification, and on whose authority. In most institutions the override exists only as an edited cell in a spreadsheet.

No recalibration cycle

A model built in 2019 was built for the institution's 2019 customer base, products, geography and risk environment. If it has not been recalibrated, by 2026 it is assessing 2026 risk through a 2019 lens.

2.1 The equal-weight spreadsheet

The most pervasive failure is the spreadsheet model that assigns equal weight to every risk factor. Country risk, product risk, customer type, delivery channel and transaction volume are each scored on a one-to-three scale, and the scores are summed without any weighting. There is no documented reasoning for treating a customer conducting cross-border wire transfers to a FATF grey-list jurisdiction as carrying the same factor weight as a customer using a standard domestic savings account.

Equal weighting is not inherently wrong. It is wrong when it is unexplained. If an institution has determined, through analysis of its specific risk exposure, that all factors contribute equally to ML/TF risk, and has documented that analysis, the model is defensible. In practice, equal weighting is the default because no analysis has been performed: the spreadsheet was built once, often by copying a template inherited from an earlier consulting engagement, and has not been revisited since.
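To make the problem concrete, the following is a minimal Python sketch of the summed, equal-weight model described above. The factor names follow the text; the two customer profiles and their scores are hypothetical and chosen only for illustration.

```python
# Factors scored on the one-to-three scale described above.
FACTORS = ("country", "product", "customer_type",
           "delivery_channel", "transaction_volume")


def equal_weight_score(scores):
    """Sum the factor scores; every factor implicitly carries weight 1."""
    return sum(scores[f] for f in FACTORS)


# Hypothetical profile: cross-border wires to a grey-list jurisdiction.
wire_customer = {"country": 3, "product": 3, "customer_type": 1,
                 "delivery_channel": 1, "transaction_volume": 1}

# Hypothetical profile: domestic savings account, opened remotely,
# with high turnover.
savings_customer = {"country": 1, "product": 1, "customer_type": 1,
                    "delivery_channel": 3, "transaction_volume": 3}

wire_total = equal_weight_score(wire_customer)
savings_total = equal_weight_score(savings_customer)
print(wire_total, savings_total)
```

Both profiles arrive at the same total, because a score of three on country risk counts for exactly as much as a score of three on delivery channel. That outcome may be defensible, but only if the institution has analysed and documented why the two factors are equivalent in its own business.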

2.2 No documented factor rationale

Even where models do use differentiated weights, the rationale for factor selection and weighting is rarely documented. Why does source of wealth carry a weight of three when industry sector carries a weight of two? Why is the PEP factor weighted at five and the high-risk country factor weighted at four? These are not arbitrary design choices; they reflect the institution’s assessment of which risk dimensions most materially affect its specific exposure. When the choices are undocumented, the model cannot survive supervisory scrutiny. An examiner will ask what analysis supports the weighting, and an answer that points to implementation defaults, or to a consultant’s historical recommendation, is not sufficient.

2.3 No model validation

A customer risk model is, in mathematical terms, a quantitative decision system, and like any quantitative system it requires validation against outcomes. Does the model produce ratings that align with the institution’s actual risk experience? Are High-rated customers generating disproportionately more SARs than Low-rated customers? Is the distribution of ratings consistent with the institution’s risk appetite and customer base? Most institutions have never performed this analysis, and cannot answer the most basic of validation questions, namely what proportion of SARs filed in the most recent year originated from customers rated Low at the time of filing. If the proportion is high, the model is systematically under-rating. If it is zero, the model may be capturing risk appropriately, or it may be that Low-rated customers are simply not being monitored, in which case the validation is circular and meaningless.
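The basic validation question can usually be answered from two inputs an institution already holds: the SAR register and the rated customer population. The sketch below assumes a hypothetical record layout (a `rating_at_filing` field on each SAR) and uses invented counts purely for illustration.

```python
from collections import Counter

RATINGS = ("Low", "Medium", "High")


def sar_share_by_rating(sars):
    """Proportion of SARs by the customer's rating at the time of filing."""
    counts = Counter(s["rating_at_filing"] for s in sars)
    total = sum(counts[r] for r in RATINGS)
    return {r: counts[r] / total for r in RATINGS} if total else {}


def sar_rate_per_1000(sars, customers_by_rating):
    """SARs filed per 1,000 customers in each rating band."""
    counts = Counter(s["rating_at_filing"] for s in sars)
    return {r: 1000 * counts[r] / customers_by_rating[r]
            for r in RATINGS if customers_by_rating.get(r)}


# Illustrative figures only, not taken from any institution.
sars = ([{"rating_at_filing": "Low"}] * 4
        + [{"rating_at_filing": "Medium"}] * 3
        + [{"rating_at_filing": "High"}] * 3)
customers = {"Low": 8000, "Medium": 1500, "High": 500}

shares = sar_share_by_rating(sars)
rates = sar_rate_per_1000(sars, customers)
print(shares)
print(rates)
```

The two measures answer different questions. The share shows how much suspicious activity surfaced among customers the model rated Low; the rate per 1,000 customers shows whether higher ratings are in fact associated with more SARs. Neither measure corrects for the circularity noted above: if Low-rated customers are barely monitored, a low share says little.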

2.4 No override audit trail

Compliance officers routinely override model-generated ratings. A customer scored as Medium may be elevated to High on the basis of adverse media, or downgraded to Low on the basis of long relationship tenure. Overrides are a legitimate and necessary feature of risk-based CDD; they are not the failure. The failure is the absence of an audit trail. When an examiner reviews a customer file and finds that the model rating was High but the applied rating is Medium, the expectation is to find a record of who overrode the rating, when, with what justification, and on whose authority. In most institutions, the override exists only as an edited cell in a spreadsheet, with no record of the original score, no documented reason for the change, and no approval chain.

2.5 No recalibration cycle

A model built in 2019 was built for the institution’s 2019 customer base, product mix, geographic footprint and risk environment. By 2026, the institution may have launched new products, entered new markets, onboarded new customer segments and faced new typologies. If the model has not been recalibrated, it is assessing 2026 risk through a 2019 lens. BaFin expects a documented review cycle, the EBA Guidelines require regular testing and review, and FATF Recommendation 1 requires the business-wide risk assessment to be kept up to date. In practice, most institutions have no scheduled recalibration, no trigger-based review process, and no documented history of model changes.
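A recalibration cycle needs both a scheduled component and a trigger-based component. The following sketch shows one way the check could be expressed; the annual interval and the list of trigger events are assumptions for illustration, not figures prescribed by BaFin, the EBA or FATF.

```python
from datetime import date, timedelta

# Assumption: an annual scheduled review. The institution sets its own cycle.
REVIEW_INTERVAL = timedelta(days=365)

# Assumption: business changes that should prompt an out-of-cycle review.
TRIGGER_EVENTS = {"new_product", "new_market",
                  "new_customer_segment", "new_typology"}


def recalibration_reasons(last_review, today, events):
    """Return the documented reasons why a model review is due, if any."""
    reasons = []
    if today - last_review > REVIEW_INTERVAL:
        reasons.append("scheduled review overdue")
    for event in sorted(set(events) & TRIGGER_EVENTS):
        reasons.append(f"trigger event: {event}")
    return reasons


stale = recalibration_reasons(date(2019, 6, 1), date(2026, 4, 9),
                              {"new_market"})
current = recalibration_reasons(date(2026, 1, 15), date(2026, 4, 9), set())
print(stale)
print(current)
```

Returning the reasons, rather than a bare yes or no, means each review can be logged with the event that prompted it, which is the start of a documented history of model changes.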

3. The Enforcement Record

The pattern across recent enforcement actions by BaFin, the FCA and DNB is consistent enough to make the regulatory direction of travel unambiguous. The table below sets out five representative cases, the model failure that was cited in each, and the characteristic of an explainable model that would have prevented the finding.

Year	Regulator	Institution	Penalty	Model failure cited	What an explainable model would have produced
2024	BaFin	Commerzbank AG	€1.45m fine for AML and supervisory failings involving comdirect Bank AG	Risk classification did not drive proportionate CDD; data from 2015 refresh not on a risk-based cycle	Each rating tied to a defined CDD intensity; review cadence linked to rating; data-staleness alerts at threshold
2021	FCA (UK)	NatWest	£264.8m criminal fine	Risk rating lowered repeatedly without documented rationale; no automated review trigger on cash-deposit pattern	Override audit trail capturing original score, justification and approver; rule-driven escalation on cash thresholds
2022	FCA (UK)	Santander UK	£107.7m fine	Money Service Businesses classified as standard customers; customer-type factor under-weighted or absent	Customer-type factor calibrated to MSB risk; weighting traceable to business-wide risk assessment
2024	BaFin	N26	€9.2m fine (after €4.25m in 2021 and onboarding cap)	Late SAR submissions; risk model not calibrated to neobank growth and customer-mix change	Recalibration cycle triggered by material business-mix changes; rating-to-monitoring linkage tested annually
2025	DNB (NL)	bunq	€2.6m fine	Severe and culpable deficiencies in unusual transaction investigation and escalation	Documented model logic for AI-assisted scoring; defensible rationale on every escalation decision

A few observations are worth drawing out from the table. The NatWest case is, in regulatory terms, the most studied example because the model failure was specific and the consequence was a criminal fine. The customer’s risk rating was lowered repeatedly as deposits escalated to a total of around £365 million, of which roughly £264 million was cash. The FCA finding was not principally about the volume of suspicious activity but about the integrity of the rating system itself, which permitted downgrades without documented rationale, without independent review and without triggering automated alerts.

The Santander UK case identifies a different failure: not the integrity of the model but the calibration of its factors. Money Service Businesses, a category recognised as higher risk under FATF guidance and across every national AML framework, were classified as standard commercial customers, which in turn meant standard CDD, standard monitoring and standard review cycles. The model produced ratings that did not reflect the institution’s actual risk exposure, and the institution could not explain why.

The Commerzbank case illustrates a third structural failure: the model classified customers, but the classification did not drive operational behaviour. High-risk customers were not subjected to more intensive review, customer data decayed without triggering reassessment, and the rating became a label rather than a control.

The Dutch DNB action against bunq adds a further dimension that is becoming increasingly relevant: the use of AI-assisted models. The Dutch courts confirmed that DNB had been wrong to prohibit bunq from using machine learning for its risk monitoring, but the subsequent enforcement made clear that the use of AI does not relieve the institution of the obligation to document the model logic and to defend each escalation decision on its merits.

4. What a Defensible Risk Model Looks Like

A customer risk model that can withstand a BaFin §44 examination, an FCA SYSC review or, from July 2027, a direct AMLR enforcement action has six characteristics.

Factor selection with documented rationale

Every risk factor is selected on the basis of the business-wide risk assessment. The rationale for each factor's inclusion is documented, and so is the rationale for the factors that were considered and excluded.

Differentiated weighting with justification

Factor weights reflect the institution's assessment of relative risk contribution. A remittance provider weights geographic risk more heavily than a domestic retail bank, and the weighting is justified by reference to the BWRA, regulatory guidance, observed typology experience and SAR data.

Documented thresholds

The boundaries between Low, Medium and High are explicitly defined. The institution can explain why a composite score of 45 is Medium and 46 is High, and what analysis supports the boundary at that point.

Distribution analysis

A healthy model produces a distribution that reflects the institution's actual risk exposure. Anomalies, like a sudden clustering after a model change, or a corridor with 100% Low ratings despite known risk indicators, trigger investigation and recalibration.

Override governance

Every override is recorded with the original model score, the overridden score, the reason, the person who performed it, the person who approved it, and the date. Override patterns are analysed periodically to identify systematic model weaknesses.

Examiner-ready output

The model can produce, on demand, a complete explanation of any individual customer's risk rating. Not a developer artefact for internal debugging. A compliance document, formatted for review, producible without engineering involvement.

4.1 Factor selection with documented rationale

Every risk factor included in the model, whether customer type, geographic exposure, product or service risk, delivery channel, transaction behaviour, or source of wealth and funds, is selected on the basis of the institution’s business-wide risk assessment. The rationale for each factor’s inclusion is documented, and so is the rationale for the factors that were considered and excluded. The methodology document is the artefact that ties the business-wide risk assessment to the individual-customer risk model; without it, the link between the two is implicit at best and absent at worst.

4.2 Differentiated weighting with justification

Factor weights reflect the institution’s assessment of relative risk contribution. A remittance provider with concentrated exposure to high-risk corridors will weight geographic risk more heavily than a domestic retail bank, and the weighting will be justified by reference to the business-wide risk assessment, applicable regulatory guidance, observed typology experience and SAR data. The justification is recorded against each weight, and changes to the weighting are governed by the same process that produced the original weights.
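One way to keep the justification attached to each weight is to store the two together, so that a weight without a rationale cannot be configured at all. The sketch below is illustrative only: the weights, the justification texts and the factor scores (on an assumed 0 to 100 scale) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactorWeight:
    factor: str
    weight: int
    justification: str  # reference to BWRA, guidance, typologies or SAR data

    def __post_init__(self):
        if self.weight <= 0:
            raise ValueError("weight must be positive")
        if not self.justification.strip():
            raise ValueError(f"no justification recorded for {self.factor}")


def composite_score(weights, factor_scores):
    """Weighted average of factor scores, each on a 0 to 100 scale."""
    total = sum(w.weight for w in weights)
    return sum(w.weight * factor_scores[w.factor] for w in weights) / total


# Hypothetical weight sets for two business models.
remittance = [
    FactorWeight("geographic", 4, "BWRA: concentrated high-risk corridors"),
    FactorWeight("product", 2, "BWRA: single product line"),
    FactorWeight("customer_type", 2, "SAR data: retail senders"),
    FactorWeight("delivery_channel", 1, "agent network covered by controls"),
    FactorWeight("transaction", 1, "low average ticket size"),
]
retail_bank = [
    FactorWeight("geographic", 1, "BWRA: predominantly domestic"),
    FactorWeight("product", 3, "BWRA: broad product range"),
    FactorWeight("customer_type", 3, "SAR data: business accounts"),
    FactorWeight("delivery_channel", 1, "branch-led onboarding"),
    FactorWeight("transaction", 2, "typology experience: cash deposits"),
]

customer = {"geographic": 90, "product": 30, "customer_type": 40,
            "delivery_channel": 20, "transaction": 50}

remittance_score = composite_score(remittance, customer)
retail_score = composite_score(retail_bank, customer)
print(remittance_score, retail_score)
```

The same customer data produces a materially higher composite under the remittance weighting than under the retail weighting. That divergence is the intended behaviour; what matters is that each weight can be traced to a recorded justification.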

4.3 Documented thresholds

The boundaries between Low, Medium and High, or any other classification tiers the institution operates, are explicitly defined. The institution can explain why a composite score of 45 is Medium and 46 is High, and what analysis supports the boundary at that point. Distribution analysis confirms that the thresholds produce a rating distribution consistent with the institution’s actual risk profile rather than a degenerate distribution such as 95 per cent Low, 4 per cent Medium and 1 per cent High, which is a strong indicator that the model is failing to discriminate.
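Explicit thresholds can be held as configuration alongside a reference to the analysis that supports each boundary. In the sketch below the Medium/High boundary of 45 follows the example in the text; the Low/Medium boundary of 20, the 0 to 100 scale and the document references are assumptions added for illustration.

```python
# (upper bound inclusive, rating, supporting analysis). The references are
# hypothetical placeholders for the institution's own documentation.
THRESHOLDS = [
    (20, "Low", "threshold analysis, section A (hypothetical reference)"),
    (45, "Medium", "threshold analysis, section B (hypothetical reference)"),
    (100, "High", "threshold analysis, section C (hypothetical reference)"),
]


def classify(composite_score):
    """Map a composite score on a 0 to 100 scale to a rating and its basis."""
    if not 0 <= composite_score <= 100:
        raise ValueError("composite score outside the 0 to 100 scale")
    for upper, rating, basis in THRESHOLDS:
        if composite_score <= upper:
            return rating, basis
    raise AssertionError("unreachable: thresholds cover the full scale")


for score in (20, 45, 46):
    print(score, classify(score)[0])
```

Returning the supporting reference together with the rating means that the answer to "why is 45 Medium and 46 High?" is available wherever the rating is displayed.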

4.4 Distribution analysis

The institution analyses the distribution of risk ratings across its customer base periodically. A healthy model produces a distribution that reflects the institution’s actual risk exposure. Meaningful anomalies, such as a sudden clustering of customers at one rating level after a model change, or a geographic corridor with 100 per cent Low ratings despite the presence of known risk indicators, trigger investigation and, where necessary, recalibration.
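Both checks described here reduce to simple aggregations over the rated portfolio. The following sketch assumes a hypothetical customer record with `corridor`, `rating` and `has_risk_indicator` fields; the sample data is invented.

```python
from collections import Counter, defaultdict

RATINGS = ("Low", "Medium", "High")


def rating_shares(customers):
    """Share of the portfolio in each rating band."""
    counts = Counter(c["rating"] for c in customers)
    total = len(customers)
    return {r: counts[r] / total for r in RATINGS} if total else {}


def all_low_corridors_with_indicators(customers):
    """Corridors rated 100% Low although a risk indicator is present."""
    by_corridor = defaultdict(list)
    for c in customers:
        by_corridor[c["corridor"]].append(c)
    return sorted(
        corridor for corridor, group in by_corridor.items()
        if all(c["rating"] == "Low" for c in group)
        and any(c["has_risk_indicator"] for c in group)
    )


# Invented sample portfolio; corridor codes are placeholders.
portfolio = [
    {"corridor": "DE-DE", "rating": "Low", "has_risk_indicator": False},
    {"corridor": "DE-DE", "rating": "Medium", "has_risk_indicator": True},
    {"corridor": "DE-XX", "rating": "Low", "has_risk_indicator": True},
    {"corridor": "DE-XX", "rating": "Low", "has_risk_indicator": False},
]

shares = rating_shares(portfolio)
flagged = all_low_corridors_with_indicators(portfolio)
print(shares)
print(flagged)
```

A flagged corridor is a prompt for investigation, not a conclusion: the ratings may be correct, but the institution should be able to show that someone looked.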

4.5 Override governance

Every override is recorded with the original model score, the overridden score, the reason for the override, the identity of the person who performed it, the identity of the person who approved it, and the date. Override patterns are analysed periodically to identify systematic model weaknesses. If a meaningful share of Medium-rated customers in a particular segment is consistently overridden to High, the model’s factor weights for that segment may need adjustment; the override data is the evidence base for that adjustment.
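The override record can be made hard to get wrong by encoding its required fields in a type that refuses incomplete entries. The sketch below is a minimal illustration: the field names are hypothetical, and the rule that performer and approver must be different people is an assumption, not something stated in the regulatory texts cited here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OverrideRecord:
    customer_id: str
    segment: str
    model_rating: str     # original model output, never overwritten
    applied_rating: str   # rating after the override
    reason: str
    performed_by: str
    approved_by: str
    override_date: date

    def __post_init__(self):
        if not self.reason.strip():
            raise ValueError("an override needs a documented reason")
        # Assumption: approval must come from a second person.
        if self.performed_by == self.approved_by:
            raise ValueError("performer and approver must differ")


def medium_to_high_share(overrides, medium_rated_by_segment):
    """Share of model-rated Medium customers per segment overridden to High."""
    counts = Counter(
        o.segment for o in overrides
        if o.model_rating == "Medium" and o.applied_rating == "High"
    )
    return {seg: counts[seg] / n
            for seg, n in medium_rated_by_segment.items() if n}


# Invented data: three upward overrides in one segment.
overrides = [
    OverrideRecord(f"C-{i}", "MSB", "Medium", "High", "adverse media",
                   "analyst_a", "mlro_b", date(2026, 3, i + 1))
    for i in range(3)
]
override_shares = medium_to_high_share(overrides, {"MSB": 10, "Retail": 200})
print(override_shares)
```

A segment in which a large share of Medium ratings is routinely raised to High is evidence that the model's weights for that segment are too low; the periodic analysis turns individual judgements into input for recalibration.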

4.6 Examiner-ready output

The model can produce, on demand, a complete explanation of any individual customer’s risk rating: which factors contributed, what data populated each factor, what weight was applied, how the composite score was calculated, and where the score falls relative to the classification thresholds. This is not a developer artefact for internal debugging; it is a compliance document, formatted to be reviewed by an examiner, and the institution should be able to produce it for any customer in the portfolio without engineering involvement.
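As an illustration of what such an explanation contains, the sketch below assembles a plain-text report from the inputs to a single rating. It is a simplified example under stated assumptions (a weighted average on a 0 to 100 scale, with hypothetical thresholds, weights and data values), not a description of any particular product's output.

```python
# Assumed upper bounds (inclusive) for each rating on a 0 to 100 scale.
THRESHOLDS = [(20, "Low"), (45, "Medium"), (100, "High")]


def explain_rating(customer_id, factors):
    """Build a plain-text explanation of one customer's rating.

    factors: list of dicts with name, data, score (0 to 100) and weight.
    """
    total_weight = sum(f["weight"] for f in factors)
    composite = sum(f["weight"] * f["score"] for f in factors) / total_weight
    rating = next(label for upper, label in THRESHOLDS if composite <= upper)

    lines = [f"Customer {customer_id}: composite score {composite:.1f}, "
             f"rating {rating}"]
    for f in factors:
        contribution = f["weight"] * f["score"] / total_weight
        lines.append(
            f"- {f['name']}: data={f['data']}, score={f['score']}, "
            f"weight={f['weight']}/{total_weight}, "
            f"contribution={contribution:.1f}"
        )
    bounds = ", ".join(f"{label}<={upper}" for upper, label in THRESHOLDS)
    lines.append(f"Thresholds (upper bounds): {bounds}")
    return "\n".join(lines)


# Hypothetical customer and factor data.
report = explain_rating("C-001", [
    {"name": "geographic", "data": "grey-list corridor", "score": 90, "weight": 4},
    {"name": "product", "data": "current account", "score": 30, "weight": 2},
    {"name": "customer_type", "data": "sole trader", "score": 40, "weight": 2},
    {"name": "delivery_channel", "data": "branch", "score": 20, "weight": 1},
    {"name": "transaction", "data": "monthly wires", "score": 50, "weight": 1},
])
print(report)
```

Every figure in the report is derived from the same inputs the scoring step used, so the explanation cannot drift from the rating it explains. A production version would also carry the governance metadata: model version, data timestamps and any override applied.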

5. AMLR Article 20 and the July 2027 Deadline

The European Anti-Money Laundering Regulation, which applies from 10 July 2027, raises the bar in ways that materially change the consequences of failing to meet the expectations described above. Article 20 specifies the CDD measures that obliged entities must apply, with explicit requirements for documentation and risk-based calibration. Under the AMLR, the customer risk assessment is no longer merely a best-practice expectation derived from guidelines; it is a directly applicable regulatory requirement across all EU member states.

Article 20 requires that the extent of CDD measures be commensurate with the risks identified, and that institutions document the basis for their risk assessment. Read in combination with Article 10, which requires a comprehensive, documented risk-assessment methodology, and Article 26, which addresses the calibration of ongoing monitoring, the AMLR creates a framework in which an unexplainable risk model is not merely a finding to be remediated but a regulatory breach in its own right. Institutions that cannot demonstrate model explainability by July 2027 face direct enforcement under the AMLR in addition to national supervisory measures, and the EU Anti-Money Laundering Authority (AMLA) will, by design, prioritise consistent enforcement of these provisions across member states.

6. WBP Justifier

WBP Justifier is a customer risk scoring engine designed for the explainability requirement that this paper describes. Every risk score it produces carries a complete audit trail: the factors that were considered, the weights that were applied, the data that populated each factor, the threshold that determined the classification, and the governance metadata recording who configured the model and on what authority. The output is structured for the examiner rather than the developer; for any customer in the portfolio, the platform produces a single document that answers each question a BaFin §44 auditor or an FCA SYSC reviewer would put.

The model is configurable by the compliance function without engineering involvement. Factor weights, scoring rules, classification thresholds and review triggers are administered through a compliance interface, and every configuration change is recorded in an immutable audit log. Distribution analysis, override tracking and recalibration alerts are part of the core scoring architecture rather than add-on reporting; they are present because the regulatory expectation requires them, not because they are useful to have. For institutions facing the AMLR Article 20 deadline, Justifier provides a path from an unexplainable spreadsheet model to an examiner-ready risk-scoring platform within weeks rather than quarters.
