A curious trend that has become more pronounced since the pandemic is that many lenders are wondering whether their credit models (i.e., scoring and other origination tools) are at fault for increased delinquencies and charge-offs. The purpose of this series of articles is to address the most common concerns related to credit models and to separate sense from nonsense.
In the first article, we addressed whether custom models always outperform generic ones, what to make of shifting performance throughout the credit cycle, and the efficacy of credit scoring in the subprime sector. In this article, we conclude by addressing the issue of regional scorecards and the rising trend of using Artificial Intelligence (AI) in model development.
Nonsense: Our portfolio shows large variations in credit performance by geography; therefore, we must have models customized by region.
Sense: Lenders in states like California, Texas and Florida typically see significant differences in comparable score performance between the northern and southern parts of their states, but this doesn’t mean their models are insufficient. While the idea of building models by region has been around for decades, managers often insist on this for the wrong reasons, which can lead to more volatility in results. It is important to note that for this discussion, we are assuming a regional model is aggregated at a high enough level to be compliant from a regulatory perspective.
The credit score is merely an index that sorts applicants from most to least risky. The actual default rate is imposed on the index after the fact, based on careful analysis of historical performance over time. As we all know, performance fluctuates throughout the credit cycle, but it may also fluctuate regionally. Some of this may be due to varying economic conditions, but more likely it is related to the fact that a higher concentration of lower-quality applicants is present in one area relative to another.
Most lenders pull some version of either the FICO or VantageScore models. While lenders may see worse performance on applicants in similar score ranges between one region and another, this does not indicate that the model is broken. Rather, it suggests that the baseline level of defaults is higher in one of those regions. The score still puts applicants in order of worst to best, but a different performance assumption must be evaluated (assuming the difference is sizable and consistent over a long period of time). What you are not going to see is groups of 500s performing better than groups of 550s or 600s within a single region. Risk managers have historically handled this either by modifying their default rate assumption by region or by setting higher minimum score cutoffs for riskier geographies.
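The distinction above can be made concrete with a small sketch. All figures below are invented for illustration only: the same score bands rank-order risk in both regions, but the default rate imposed on each band differs, which calls for a region-specific performance assumption or cutoff rather than a new model.

```python
# Hypothetical illustration: identical score bands, different baseline
# default rates by region. All numbers are invented for illustration.
score_bands = [(500, 549), (550, 599), (600, 649)]

# Invented historical default rates for each band, by region.
default_rates = {
    "north": [0.28, 0.19, 0.11],  # higher baseline risk
    "south": [0.20, 0.13, 0.07],  # lower baseline risk
}

for region, rates in default_rates.items():
    # Within each region the score still rank-orders: higher bands
    # always perform better (default rates strictly decrease).
    assert all(better < worse for worse, better in zip(rates, rates[1:]))
    print(region, dict(zip(score_bands, rates)))
```

The model is not "broken" in the north; every band is simply riskier there, so the lender adjusts the default assumption (or raises the cutoff) for that region.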
A consumer’s likelihood of default doesn’t change because they move across a county or state boundary. It changes based on things that are intrinsic to their own behavior. As such, it is best to limit model predictors to those things that are robust in spite of exogenous factors. The argument in favor of regional scorecards must come from statistically valid research identifying predictors, or weightings on those predictors, that differ substantially from what one would see in another location.
I am not philosophically opposed to regional models, but the threshold must be set high enough to avoid common pitfalls such as:
• Sample Size and Representativeness: Some geographic regions may have smaller populations or fewer credit-active individuals compared to others. This can result in smaller sample sizes for certain segments, leading to less reliable estimates and potentially biased model outcomes. Additionally, if the sample is not representative of the population within each region, the model’s predictive accuracy may be compromised.
• Spatial Correlation: Individuals within the same geographic region may exhibit correlation in their credit behavior due to factors such as local economic conditions or regional policies. Ignoring this can lead to model misspecification and inaccurate predictions. Many of those factors may be temporary and will dissipate going forward, making the model less robust.
• Data Availability and Consistency: Data can vary across geographic regions, especially in regions with less developed financial infrastructure or regulatory reporting requirements. Inconsistencies in data quality or coverage can affect the reliability and robustness of the model estimates.
• Model Complexity: Segmenting by geographic region adds complexity to the modeling process, as it requires creating and maintaining separate models for each region. This increases the computational burden and operational costs associated with model development, validation, and monitoring. Furthermore, as models become more complex, the variation in performance increases.
While segmenting by geographic region can provide insights into regional credit dynamics, it is essential to carefully consider and address the drawbacks and challenges associated with this approach to develop accurate and robust credit scoring models. It is not enough simply to observe that default rates vary. Before considering regional cards, the modeler should test the effect of dummy variables that identify the varied geographies, which provides the benefit of adjusting scores by region without adding unnecessary complexity and volatility.
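The dummy-variable test can be sketched as follows. This is a minimal illustration on simulated data (the feature name, region labels, and coefficients are all invented; scikit-learn is assumed available): a single model with a region indicator lets the intercept shift by region while one set of behavioral weights is shared, avoiding separate scorecards.

```python
# A minimal sketch of testing a regional dummy variable instead of
# building separate regional scorecards. Data and names are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
utilization = rng.uniform(0, 1, n)   # hypothetical behavioral predictor
is_north = rng.integers(0, 2, n)     # 1 = the assumed riskier region

# Simulate defaults: same behavioral slope everywhere, but a higher
# baseline (intercept) in the north.
log_odds = -2.0 + 2.5 * utilization + 0.8 * is_north
p_default = 1 / (1 + np.exp(-log_odds))
default = rng.binomial(1, p_default)

X = np.column_stack([utilization, is_north])
model = LogisticRegression().fit(X, default)
slope, region_shift = model.coef_[0]
print(f"behavioral weight ~ {slope:.2f}, regional shift ~ {region_shift:.2f}")
```

If the fitted coefficient on the dummy is material and stable over time, the score can be adjusted by region within one model; only if the behavioral weights themselves diverge by region does the case for separate cards begin.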
Nonsense: AI models are far superior to conventional credit scoring techniques and routinely outperform them in retroactive tests.
Sense: There are generally two camps lenders fall into when you bring up the topic of AI. The first group believes it is a magic elixir that will outdo anything created with human supervision, or at least they hope investors and stakeholders believe this. The second group writes it off as smoke and mirrors. The truth of the matter is that AI encompasses a collection of tools that have fantastic applications to many complex problems, but also possess limitations that make them a poor fit for other ones.
In the July/August 2018 edition of the Non-Prime Times I published an article titled “Will the Real AI Please Stand Up?” In that article, I go into great detail regarding proper and improper applications of artificial intelligence tools. If you would like to receive this article, please reach out to me through the e-mail address at the end of the article. For the purpose of this discussion, I will use machine learning (ML) as an illustration as it is the method most commonly used among AI tools in this industry.
Tools such as ML frequently rely on what is called discrete mathematics. Discrete mathematics, while powerful in many areas of computer science and mathematics, has some drawbacks when applied to building predictive models. The more serious of these include:
• Sensitivity to Input Data: Discrete models can be sensitive to changes in input data, especially if the data is noisy or contains outliers. Small changes in input values can lead to significant changes in the model’s predictions, which can be undesirable in many real-world applications. Credit factors and subsequent performance are profoundly noisy, which does not bode well for overly complex modeling forms.
• Difficulty in Handling Uncertainty: Predictive modeling often involves dealing with uncertainty in the data and the underlying processes being modeled. Discrete mathematics may not provide adequate tools for quantifying and managing uncertainty, especially compared to probabilistic or statistical approaches.
• Generalization to New Data: Discrete models may struggle with generalizing patterns learned from training data to unseen data, especially if the underlying relationships are complex or not well-represented in the training data. This can result in overfitting or underfitting of the model. This is often the result of what we refer to as inductive bias, which in this case means the data you have to build the model on is not representative of the future period you are applying the model to. This is certainly a relevant concern today in light of the volatility observed from the pandemic through to a high inflationary period.
It is important to think of analytics as a toolbox full of instruments that have good uses for some problems, but not for others. All modeling techniques have strengths and limitations that must be evaluated based on the problem the modeler is trying to solve. The choice of method should not be based on the sophistication or novelty of a particular approach, but on which one validates most successfully while minimizing its limitations.
Disciplined lenders tend to read the signs and tighten up credit when the market overheats. Others keep expanding through highly competitive periods, often by loosening up on price and credit standards. The latter group gets hit hardest during economic downturns as consumers with the weakest credit and highest debt loads default. This supports my long-held belief that the majority of the credit equation is based on what the lender does at the point of origination.
Having said that, what we saw during and after the pandemic was unprecedented. Performance improved in 2020 through early 2021 due to a massive amount of forbearance across the entire consumer lending market, reinforced by stimulus. Conversely, record inflation and vehicle prices later pushed consumers to default who would not have done so in other periods. None of those outcomes were caused by a breakdown in credit policy or models.
These performance shocks have caused many lenders to re-evaluate their credit and portfolio management strategies, particularly as they relate to scoring models. As managers seek answers to what might have been missed over the past two years, it is important not to throw out the baby with the bathwater. The key to long-term growth, recovery and stability for lenders is to avoid overreacting to outside forces and remain focused on best practices that have been reliable for many years.