CHAPTER 7
Extending the Governance Framework for Machine Learning Validation and Ongoing Monitoring

The benefits of AI and machine learning do not come without risks. Internal and external stakeholders are rightly concerned about the typical challenges, ranging from the ethical use of AI and machine learning, to the ability to explain the workings of the algorithms, to the amplified risk of propagating bias in decision-making. However, one of the toughest challenges is creating a suitable and robust continuous performance monitoring framework that can adapt and respond to the increased model risk of AI and machine learning. Doing so with the same number of resources while keeping costs low is a key concern for many risk managers. A monitoring framework spans beyond the regulatory and pre-production aspects of model risk management. In general, an AI and machine learning monitoring framework should be equipped to handle:

  • Model degradation. AI and machine learning tend to degrade faster than traditional and historically tuned statistical models.
  • Biased predictions. AI and machine learning tend to identify complex and nonlinear patterns in data that are hidden from traditional models. These patterns may propagate biased decision-making or hide corrupted data. In a way, machine learning models need to be protected from bias in the data that can contribute to the unfair decisions they are a part of. Other components of decision-making are policy rules, business rules, and human-based manual overrides. All aspects that contribute to decisions, not just models, can result in biased predictions.
  • Balance. High materiality models have always required a high level of human oversight. The same applies to the use of AI and machine learning: closed, fully automated, black-box systems that lack human oversight and ability to override are not recommended. There are also other key considerations to account for, such as the context in which the model will be used, and privacy implications of the data.

Typically, risk models, including those used for intra-day or near-real-time calculations, are used in settings with existing and established model governance practices in place. With the use of more advanced modeling technologies and approaches, like AI and machine learning, model risk tends to be higher (as explained in the next section), therefore near-real-time performance tracking of these models becomes critical.

A robust AI and machine learning governance framework aligns the oversight of AI systems and machine learning to the corporate strategy and values of an organization. AI governance is the strategy that then defines the policies and procedures establishing accountability for the safe development, deployment, and use of AI systems and machine learning. These aspects echo Conway's law, which holds that organizations design systems that mirror their own structures and communication patterns. Thus, AI and machine learning governance reflects the organization's mindset.

In 2019, the OECD defined a set of AI principles to promote the use of AI for innovation while protecting human rights.1 Since then, regulators around the world have followed suit. In Asia, for example, the Monetary Authority of Singapore defined a model governance framework for AI systems,2 and in Europe, the European Union is clearly differentiating between high- and low-risk AI applications with prescribed requirements for each.3

In addition to regulations on AI governance, in 2021, the Federal Trade Commission in the United States defined its expectations for the responsible use of AI, especially in the context of fair outcomes for consumers.4

In 2021, the Office of the Comptroller of the Currency updated its model risk management standards to take into account the use of AI and machine learning.5 Recently, the UK government has gone a step further by publishing a set of national algorithmic transparency standards for the use of AI in the public sector.6

In a sense, all models are representations of reality and, by their nature, will have some degree of uncertainty and inaccuracy. The better the characteristics and limitations of the model are understood—up front—the better informed the model validation function will be in managing and monitoring the model in production.

Specifically, in financial services, a principle-based governance framework will promote the proper use of models, the right layers of accountability and degrees of transparency, together with consistency in the development, deployment, and use of risk models.

ESTABLISHING THE RIGHT INTERNAL GOVERNANCE FRAMEWORK

Where models are used for decision-making, whether traditional or AI and machine learning, organizations should be attentive to the possible adverse consequences of decisions based on models that are incorrect or misused and should address those consequences through active model risk management. These adverse consequences may include, but are not limited to, potential monetary loss and reputational risk.

The concept of AI and machine learning governance is much debated by industry bodies. Based in part on highly publicized AI model failings (take, for example, the chatbot Tay) and misconceptions of how automated systems operate, there is a general sense of distrust in the responsible use of AI and machine learning that further highlights the need for better model governance.

Most organizations have set up strong governance frameworks for their traditional risk models. According to regulatory guidance, with the right controls in place, existing frameworks can be extended to AI and machine learning.7

According to the Office of the Comptroller of the Currency (OCC), in SR 11-7, an effective model governance framework includes robust model development, implementation, and use, as well as effective validation, sound governance, policies, and controls.

The OCC highlights that model risk increases with greater model complexity, more inputs, and more uncertainty about model assumptions. The broader the extent of use, the larger the potential impact if the model goes wrong.

When it comes to effective model risk management and expectations from supervisors, organizations must prove their understanding of model inaccuracies and unintended consequences, their implications, and mitigating actions.

In general, it is helpful to align the level of governance to the context and materiality of the risk model. Keep in mind that the model governance approach should align with the size of the organization and the materiality of the model, as well as the impacts if unintended consequences are realized. If additional users begin using a model, it is recommended that the model's materiality be reassessed. In practice, we have found that if an organization focuses on the performance of models—be they traditional models or machine learning—at the group level only, then subsidiaries, where the materiality of the models could be vastly different, can be overlooked. This might not create a significant exposure at the group level but could pose significant model and reputational risk at the subsidiary level.

To provide the necessary governance and oversight, for risk models that impact high-stakes risk decisions, a range of stakeholders, from external regulators to the board, to senior management, are involved in various stages of the risk model lifecycle. In addition to stakeholders directly involved in the risk model lifecycle, the right governance framework will further foster trust and transparency in the models for other users and consumers.

DEVELOPING MACHINE LEARNING MODELS WITH GOVERNANCE IN MIND

An important question to ask yourself before launching an AI or machine learning initiative is: “How do we put the right governance guardrails in place to ensure that models moved from the lab into production continue to remain fit for purpose and deliver the expected value?” One approach, as mentioned, is to build stronger trust in the data and the models, and to align expected outcomes with the business objectives.

Having the right people, processes, and policies in place will help standardize the treatment of AI and machine learning at the enterprise level. Because many of the model development steps can be automated, AI and machine learning models are often easier to develop than, say, traditional statistical models, but they require higher levels of monitoring (e.g., testing model performance once a year is not enough; AI and machine learning need to be monitored much more frequently and, depending on the use case, even in near-real-time).

With AI and machine learning governance, questions about model use can be raised throughout the model lifecycle:

  • How was the conceptual soundness of the AI and machine learning assessed?
  • How were the features generated and selected?
  • Can the model and its key drivers be explained in technical and nontechnical terms?
  • Do the benefits of a more sophisticated model outweigh its added complexity and maintenance compared to a more interpretable benchmark model?
  • How do I plan to ensure that the machine learning continues to be fit for purpose?
  • How fast can the model be replaced if it shows a significant performance deterioration?

These questions and others are captured in Figure 7.1.

For these reasons, existing governance frameworks may not be sufficient to handle the increased model risk associated with AI and machine learning.

As mentioned, in general, with the use of AI and machine learning, model risk tends to be higher, as the algorithms are more sophisticated in approach and use. In addition, post-hoc explainability and interpretability add an additional layer of complexity. Keep in mind that another layer of complexity, in some AI and machine learning, stems from the use of optimized hyperparameters (explained fully in Chapter 8). There is also an increased risk of model error due to poor data quality and the risk of perpetuating societal or individual biases using biased data or biased algorithms.

Figure 7.2 captures the mentioned challenges, in addition to others, that tend to increase model risk with AI and machine learning models.

Standardized industry practices to better manage the increasing demands pertaining to model governance of AI and machine learning are still evolving with influential forces across internal and external stakeholders:

  • Public pressure and advocacy to promote the ethical use of AI and machine learning.
  • Compliance with laws and regulations that compel organizations to respect the rights and interests of people with respect to how they use AI and machine learning, together with personal data.

    Figure 7.1 Model governance questions. There are questions that the business can raise throughout the lifecycle of an AI/ML model, including during the design, development, predeployment, and monitoring phases.


    Figure 7.2 Model risk increases with AI/ML models. It is important to understand that the risks associated with a model tend to increase with AI/ML models, namely across the model logic, explainability, model misuse, and bias and data quality.

  • Internal promotion of best practices in the development, deployment, and use of AI systems and machine learning.
  • Promotion of broader explainability and transparency of AI and machine learning: purpose, design, use of the model, model architecture, key drivers, data inputs, model outputs, model limitations, and assumptions and reproducible results.
  • Internal governance structures to enforce these best practices.
  • Use of interpretable benchmarks.
  • Independent audit of AI, machine learning, and autonomous systems.

MONITORING AI AND MACHINE LEARNING

Machine learning models need governance just like other models, only more so. In part, this is because these models are designed to improve automatically through experience. Their ability to “learn” enables greater accuracy and predictability, but it can also greatly increase their model risk and result in biases based on the patterns in data that they are able to identify. It is critical that organizations establish rigorous governance processes that can quickly identify when a model begins to fail, complete with defined operating controls on inputs (data) and outputs (model results). The dynamic nature of AI and machine learning means that they require more frequent performance monitoring, constant data review, and benchmarking. With the increased volume of models, a better understanding of the contextual model inventory and actionable contingency plans is needed.8

Even when organizations have extended their model governance to AI and machine learning, there are further aspects of the ongoing monitoring of AI and machine learning that should be considered beyond those currently enacted and discussed in previous sections.

We have explained in this chapter that, traditionally, organizations have standard model governance frameworks in place that include regular monitoring, reporting, and reviews for regulatory and nonregulatory risk models. All risk models are typically subject to internal and external validation, which requires regular tracking of model performance and operational metrics.

As new regulatory requirements necessitate adjustments to models and to the number of risk models over time, the sophistication of modeling methods is increasing and will continue to do so with the mainstream adoption of AI and machine learning. AI and machine learning are often dynamically updated (i.e., calibrated with new data with greater ease), which means that a slew of metrics will need tracking over time to ensure that the model is performing within the boundaries expected by business goals and compliance measures.

These are best explained when taking a holistic view of the “health” of an algorithm across the dimensions listed below and shown in Figure 7.3:

  • Input data and data quality.
  • Features and variable importance.
  • Model assumptions and limitations—made during model development, including limitations of the data used to engineer features.
  • Benchmarks applied and how these were derived.
  • Stability at the level of the population and characteristics (or engineered features).
  • Performance of the model over time, comparing performance to benchmark.
  • Decisions made, whether purely model-derived or made in combination with rule sets or other post-modeling processes.

Complementary to holistic model health, there are ways in which organizations can validate that the model is performing in line with expectations. These metrics can be automated and run at regular time intervals: annually, quarterly, monthly, weekly, daily, intraday, or near-real-time. A selection of metrics is included and described in more detail below: model decay, stability, robustness, interpretability, bias, and model adjustments.


Figure 7.3 Further considerations of AI and machine learning that should be taken when taking a more holistic view of model health, including measures that can identify models that require immediate attention before drop in their performance can be measured.

Model Decay

The established industry practices for model validation typically include the monitoring of the performance of the model, the stability of model inputs, and model calibration. These metrics are all still valid for AI and machine learning. In addition, for AI and machine learning, especially models that continuously update, it is necessary to continuously monitor the models' explainability and potential for bias. Importantly, the performance monitoring results should be summarized in a way that provides understanding to users of where the root causes of issues reside, so the performance results can be reviewed in context.

For example, as mentioned, given their complexity, AI and machine learning tend to pick up relationships in the data that may be temporary. To account for temporal fluctuations, the models must be monitored more frequently.

Model degradation is typically a consequence of data or concept drift:

  • Data drift occurs when the distributions of feature attributes change over time compared to the model development sample. This requires monitoring of the development sample data vs. production data—the complexity increases when a model is updated more frequently (a minimal drift check is sketched after this list).
  • Concept drift occurs when a change in market conditions or policy changes the relationship the model has learned. In the case of such a change, the development data typically needs to be relabeled and the model retrained. Concept drift can be measured by evaluating the historical relationships between the features that drive the model's predictions.
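
As a minimal illustration of the data-drift check described above, the sketch below compares each numeric feature's production distribution with the development sample using a two-sample Kolmogorov–Smirnov test from SciPy. The data frames, column names, and the 0.01 significance threshold are illustrative assumptions rather than prescribed values.

```python
import pandas as pd
from scipy.stats import ks_2samp

def data_drift_report(dev_df: pd.DataFrame, prod_df: pd.DataFrame,
                      alpha: float = 0.01) -> pd.DataFrame:
    """Compare each feature's production distribution to the development sample."""
    rows = []
    for col in dev_df.columns:
        stat, p_value = ks_2samp(dev_df[col].dropna(), prod_df[col].dropna())
        rows.append({"feature": col, "ks_statistic": stat,
                     "p_value": p_value, "drift_flag": p_value < alpha})
    return pd.DataFrame(rows).sort_values("ks_statistic", ascending=False)

# Hypothetical usage on a monthly monitoring run:
# report = data_drift_report(dev_sample[feature_cols], production_batch[feature_cols])
# print(report[report["drift_flag"]])
```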

Stability

The stability assessment of an AI or machine learning model and its benchmark is typically performed by assessing the stability of the data inputs, the population, and the features.

Population Drift

The Population Stability Index (PSI) is one way to measure whether the model outcomes are stable over time. It is an established metric for traditional statistical or econometric models. It is a divergence metric that measures the distance between two distributions (e.g., the current population and the model development sample). It is often interpreted using a rule of thumb but can also be interpreted using confidence intervals, which may be more appropriate for continuous monitoring of machine learning models.9
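
A minimal sketch of a PSI calculation follows, assuming NumPy and a numeric model score or outcome. The binning scheme (deciles of the development sample) and the commonly cited rule-of-thumb thresholds are illustrative assumptions, not a standard.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins: int = 10) -> float:
    """PSI between the development-sample distribution (expected) and the
    current-population distribution (actual), using decile bins derived
    from the development sample."""
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    # Clip both samples into the development range so edge bins catch outliers.
    expected = np.clip(expected, edges[0], edges[-1])
    actual = np.clip(actual, edges[0], edges[-1])
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid division by zero and log(0).
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Common rule of thumb (illustrative): < 0.1 stable, 0.1-0.25 monitor, > 0.25 investigate.
```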

Feature Drift

The population stability index can also be used to calculate divergence at the characteristic level, namely the characteristic stability index (CSI). This measures how the distributions of the model characteristics differ from the development sample.

Another approach to assessing the feature stability of AI or machine learning is bootstrapping or random perturbation. This can be done either by sub-sampling or by randomly perturbing the development data so as to create n datasets. The feature selection algorithm is applied to the n datasets, and m features are created for each. The stability of the features is measured by determining the similarity of the features selected across the n datasets.

Feature stability can also be assessed using t-distributed stochastic neighbor embedding (t-SNE).
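
The bootstrap approach described above can be sketched as follows. The choice of a random-forest importance ranking as the feature selector, the number of resampled datasets, and the Jaccard similarity measure are illustrative assumptions rather than a prescribed recipe.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

def feature_selection_stability(X, y, n_datasets: int = 20, top_m: int = 10,
                                random_state: int = 0) -> float:
    """Average pairwise Jaccard similarity of the top-m features selected
    across n bootstrap resamples of the development data."""
    rng = np.random.RandomState(random_state)
    selected = []
    for _ in range(n_datasets):
        X_b, y_b = resample(X, y, random_state=rng.randint(10**6))
        selector = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_b, y_b)
        top = np.argsort(selector.feature_importances_)[::-1][:top_m]
        selected.append(set(top))
    similarities = [len(a & b) / len(a | b) for a, b in combinations(selected, 2)]
    return float(np.mean(similarities))  # close to 1.0 indicates stable feature selection
```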

Robustness, Benchmarking, and Backtesting

The robustness of a model, whether machine learning–based or not, refers to the model's ability to generalize well to new data. Model robustness can be improved by regularization, early stopping, and pruning methods. Overfitting can also be evaluated by cross-validation. Robustness tests include benchmarking, backtesting, and stability assessments.

Traditionally, benchmarking is where the results of the model are compared to a benchmark. For example, the benchmark sample could stem from the initial period's implementation data. Backtesting is performed when the model logic is applied to historical data to evaluate how stable the predicted values are against the actual values.

Traditional models can be used as benchmarks for AI and machine learning. Benchmark models are typically developed under the same major assumptions as the AI and machine learning model. The robustness and performance of both the traditional and the AI or machine learning model can then be measured and compared for statistically significant differences.
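
The comparison can be sketched as below, where a logistic regression stands in for the traditional benchmark and a gradient-boosting machine for the more sophisticated challenger. The specific estimators, the cross-validation setup, and the AUC metric (which assumes a binary target) are illustrative assumptions.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def benchmark_comparison(X, y, cv: int = 5) -> dict:
    """Cross-validated AUC for an interpretable benchmark vs. a GBM challenger."""
    benchmark = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    challenger = GradientBoostingClassifier(random_state=0)
    benchmark_auc = cross_val_score(benchmark, X, y, cv=cv, scoring="roc_auc")
    challenger_auc = cross_val_score(challenger, X, y, cv=cv, scoring="roc_auc")
    return {
        "benchmark_auc": benchmark_auc.mean(),
        "challenger_auc": challenger_auc.mean(),
        "fold_differences": challenger_auc - benchmark_auc,  # test these for significance
    }
```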

Interpretability

Traditional models tend to use defined variable selection and model fitting methods that are directly explainable. For example, the parameter estimates of a generalized linear model (GLM) can be interpreted directly, reflecting the contributions of the features, and thus the model is interpretable from its parameter estimates. The parameters of a machine learning model like a gradient-boosting machine (GBM) cannot be interpreted directly, so such models need alternative measures of interpretability.

Variable Importance

This lack of direct interpretation is suboptimal for AI and machine learning, as a consistent method is needed to assess variable importance across a range of methods. A generalized way of assessing the variable importance of AI and machine learning is to fit a decision tree that approximates the model predictions and use it to identify and prioritize variables in order of importance. Another way is to create a parameter that extends the classical R2 method of linear regression and provides a description of the relationship between model predictions and input features.10 Figure 7.4 supplies an illustrative general sample of variable importance.
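
A minimal sketch of the surrogate-tree approach follows. The tree depth and the assumption of a binary classifier exposing predict_proba are illustrative choices, not requirements of the technique.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

def surrogate_variable_importance(black_box_model, X: pd.DataFrame,
                                  max_depth: int = 4) -> pd.Series:
    """Fit a shallow decision tree to the black-box predictions and read
    variable importance from the surrogate."""
    # Assumes a binary classifier; use .predict(X) instead for a regression model.
    predictions = black_box_model.predict_proba(X)[:, 1]
    surrogate = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, predictions)
    return pd.Series(surrogate.feature_importances_, index=X.columns).sort_values(ascending=False)
```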

Partial Dependence

Partial dependence (PD) plots are a model-agnostic explainability measure that can be used to interpret an AI or machine learning model at the global level. Model-agnostic measures like PD plots are applied to supervised AI and machine learning.


Figure 7.4 Variable importance: An example. Here, the train importance value corresponds to the contribution the variable makes to the success of the model—the higher the value, then the more plausible it is that the variable represents the true cause of prediction.

The PD plot (Figure 7.5) measures the contribution of each feature by measuring how the model prediction changes with changes in the feature values. This is done by perturbing the input data and averaging the effect of a feature across the entire dataset. A more detailed description of PD plots is provided in Chapter 4. As a descriptive example, a PD plot can show the increase in the prediction estimate with an increase in current payment status. The PD plot displays the relationship between the input feature (independent variable) and the target variable. PD plots are useful for displaying the “marginal effect” of a variable on the prediction outcome.
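
As a hedged sketch, scikit-learn's model-agnostic inspection tools can produce such a plot. The estimator, the feature data frame, and the feature name current_payment_status are hypothetical placeholders used for illustration.

```python
from sklearn.inspection import PartialDependenceDisplay

def plot_partial_dependence(model, X, feature: str = "current_payment_status"):
    """Global, model-agnostic PD plot for one feature of a fitted estimator.
    `feature` can be a column name (if X is a DataFrame) or a column index."""
    return PartialDependenceDisplay.from_estimator(model, X, features=[feature])
```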

Individual Conditional Expectation

Individual conditional expectation (ICE) plots (Figure 7.6) disaggregate the PD function for each observation or at the subgroup level. They are helpful for identifying trends, differences, and interactions.
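
ICE curves can be produced with the same scikit-learn call used in the PD sketch above by switching the kind argument; again, the estimator, data, and feature name are placeholders.

```python
from sklearn.inspection import PartialDependenceDisplay

def plot_ice(model, X, feature: str = "current_payment_status", n_curves: int = 200):
    """Overlay per-observation ICE curves on the average PD line for one feature."""
    return PartialDependenceDisplay.from_estimator(
        model, X, features=[feature], kind="both", subsample=n_curves
    )
```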


Figure 7.5 Partial dependence (PD) plot: An example. The independent variable x1 vs. the model outcome bar, after considering the average effect of other independent variables in the model.


Figure 7.6 Diagrammatical representation of the individual conditional expectation (ICE) plot. The ICE plot displays the relationship to the outcome model prediction.

Shapley Values

Shapley values are another model-agnostic measure of model explainability. The Shapley value calculates a variable's contribution by averaging its marginal contribution across all coalitions of variables. This enables Shapley values to account for variable interactions. For example, in Figure 7.7, variables are ranked in ascending order. The horizontal location displays whether the effect of that value is associated with a higher or lower prediction. In this example, the observation demarcated by local instance 8001441 has a predicted value of 0 in the model training data. The borrower's current payment plan status and limit balance have contributed to the prediction.
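
A hedged sketch using the shap package's kernel explainer (the same model-agnostic approach illustrated in Figure 7.7) follows. The fitted model, the data frames, and the background-sample size are assumptions for illustration only.

```python
import shap

def shapley_summary(model, X_background, X_to_explain, n_background: int = 100):
    """Model-agnostic Shapley values via the kernel explainer, summarized per feature."""
    background = shap.sample(X_background, n_background)  # keeps the estimation tractable
    explainer = shap.KernelExplainer(model.predict_proba, background)
    shap_values = explainer.shap_values(X_to_explain)
    shap.summary_plot(shap_values, X_to_explain)
    return shap_values
```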

Anomaly Detection

Ideally, anomaly detection systems for AI and machine learning should have the ability to collect useful features from high-dimensional data and find deviations from normal behavior. Autoencoders can be used to build anomaly detection systems for AI and machine learning, as they reduce the number of units in the hidden layers of a neural network, for example, to reduce the dimensionality of the data. The autoencoder functions as a feature extraction method. Here, the autoencoder learns the “normal” data points—anomalies are absent from the training data. When the autoencoder attempts to reconstruct anomalous data points, it will fail to do so accurately.11 This failure, known as the reconstruction error, can be used to calculate anomaly scores and label “unseen” data points after deployment of the AI and machine learning.
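
An illustrative autoencoder-based scoring sketch is shown below, assuming Keras; the layer sizes, training settings, and the 99th-percentile threshold rule are assumptions rather than prescriptions.

```python
import numpy as np
from tensorflow import keras

def build_autoencoder(n_features: int) -> keras.Model:
    """Small dense autoencoder; the bottleneck layer reduces dimensionality."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(4, activation="relu"),      # bottleneck
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(n_features, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def anomaly_scores(autoencoder: keras.Model, X: np.ndarray) -> np.ndarray:
    """Reconstruction error per record; larger values indicate more anomalous points."""
    reconstructed = autoencoder.predict(X, verbose=0)
    return np.mean((X - reconstructed) ** 2, axis=1)

# Illustrative flow: train on "normal" data only, then threshold the scores.
# ae = build_autoencoder(X_normal.shape[1])
# ae.fit(X_normal, X_normal, epochs=20, batch_size=256, verbose=0)
# scores = anomaly_scores(ae, X_new)
# anomalies = scores > np.quantile(scores, 0.99)
```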


Figure 7.7 Shapley value calculation using the kernel explainer: An example.

Bias

AI and machine learning are susceptible to either performance or results bias. We discuss the concept of performance bias in Chapter 5. Results bias refers to model output that is statistically different for one “group” compared to another, or compared to benchmarks determined from the model development data or another benchmark sample. For the purposes of monitoring bias in AI and machine learning, results bias is measured using a suitable computed metric (e.g., demographic parity), together with tolerance thresholds. Bias may be introduced indirectly for various reasons, including internal policy changes, external regulatory compliance, or because there is not a sufficiently long history of observational data available for one “group” compared to another. When differences are observed, a review can be triggered to identify the cause of the bias.
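
A minimal sketch of a results-bias check using demographic parity follows: compare the rate of favorable model outcomes across groups against a tolerance. The column names and the 0.05 tolerance are hypothetical values for illustration.

```python
import pandas as pd

def demographic_parity_gap(outcomes: pd.Series, groups: pd.Series) -> dict:
    """Rate of favorable outcomes (1 = favorable, 0 = unfavorable) per group,
    plus the largest gap between any two groups."""
    rates = outcomes.groupby(groups).mean()
    return {"rates_by_group": rates.to_dict(),
            "parity_gap": float(rates.max() - rates.min())}

# Illustrative monitoring rule: trigger a review if the gap exceeds a tolerance.
# result = demographic_parity_gap(scored["approved"], scored["group"])
# needs_review = result["parity_gap"] > 0.05
```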

Model Adjustments

AI and machine learning typically utilize advanced methods to optimize the objective function of the algorithms. AI and machine learning are often regularly retrained using new training data. Risk models need retraining for several reasons. For example:

  • External conditions. Factors such as updated interest rates, updated systems, and associated data can initiate the need to retrain models.
  • Strategy changes. If it becomes clear during the development period that the model is not capturing business changes or changes in business strategy, the models could require retraining.
  • Business performance. Changes in the use and performance of a model, for example, due to the COVID-19 pandemic, can trigger a model retraining effort.

COMPLIANCE CONSIDERATIONS

Regulatory bodies around the world are starting to propose regulations and compliance guidelines on all aspects of AI and machine learning, including data privacy usage, ethics guidelines, and specific model risk management frameworks. However, the depth and breadth of these regulations and compliance guidelines depends on the rate of adoption of AI and machine learning in jurisdictions. Selected compliance considerations to be aware of include:

  • General Data Protection Regulation (GDPR).
  • Equal Credit Opportunity Act (ECOA)—updated CFPB guidance for small business loans.
  • SR Letter 11-7—updated guidance for AI and machine learning.
  • European Commission—laying down harmonized rules on AI.
  • EU guidelines for trustworthy AI.

GDPR (General Data Protection Regulation)

This European law regulates data protection and privacy in the European Union and the European Economic Area. In 2019, the European Commission's High-Level Expert Group on AI released the “Ethics Guidelines for Trustworthy AI.” For AI to comply with the GDPR provisions, additional factors need to be carefully considered (see Table 7.1 for a summary of key questions).

ECOA (Equal Credit Opportunity Act)

The United States Federal Trade Commission (FTC) administers a wide variety of consumer protection laws, including those that prevent unfair methods of competition in commerce and prohibit unfair and deceptive acts or practices. The FTC also has a long history of using its authority to regulate private sector uses of personal information and algorithms that directly affect consumers. One of the ways that the FTC exercises such authority is by the Equal Credit Opportunity Act (ECOA). The ECOA helps to protect consumers in gaining equal access to credit by protecting them from discrimination based on protected variables such as race, color, sex, religion, age, and marital status.

Table 7.1 Key Questions When Complying with GDPR Guidelines and AI12

Guideline: Right to not be subject to automated decision-making
Explanation: Automated decision-making is prohibited only if the decision is based solely on automated processing and produces legal effects concerning the data subject or similarly significantly affects them. However, it is allowed if the processing is carried out with the data subject's explicit consent, or if the controller has put sufficient safeguards in place. In this scenario, the safeguards that need to be provided include: involvement of human intervention to analyze and address the system's purpose, constraints, requirements, and decisions in a clear and transparent manner; explaining the automated decision-making clearly in the privacy policy to notify clients before processing their data; and obtaining explicit consent, notifying the data subject that a decision is the result of an algorithm and that they are interacting with an AI agent such as a chatbot, robot, or other conversational system.

Guideline: Transparent processing
Explanation: Data subjects should be informed of the existence and purpose of the processing, especially when the data-processing activities involve automated decision-making. Meaningful information about the logic, significance, and envisaged consequences of such processing should be explicitly transparent to users.

Guideline: Right to erasure
Explanation: As a result of the data sharing and data openness needed to collect and collate large amounts of data for AI to be of use, it is hard for data controllers to ensure that a third-party data server implements the deletion operation or that the data required to be erased are deleted completely from other joint controllers or data processors.

Guideline: Data minimization
Explanation: De-identification (pseudonymization) allows more data to be used, processed, and analyzed in AI. Pseudonymization meets the GDPR principle of data minimization, unlike anonymization, which means that the data subject is no longer or not identifiable. The GDPR promotes pseudonymization as an appropriate safeguard for organizations to repurpose data without additional consent.

In 2020 and 2021, the FTC noted that if a company uses an algorithm that, either directly or indirectly through disparate impact, discriminates against a protected class with respect to credit decisions, then the FTC can challenge the practice under the ECOA. In addition, the updated guidelines in 2020 and 2021 supply insights into expectations for organizations using AI. Examples include:

  • Start from the beginning. Organizations need to determine whether training datasets include disparate treatment of protected groups.
  • Testing. Test algorithms before use and on a regular basis to make sure that they do not discriminate on the basis of above-mentioned protected variables.

    The fairness of algorithms can be assessed by considering these questions:

    • How representative is the dataset?
    • Does the data model account for biases?
    • How exact are the predictions based on big data?
    • Does the reliance on big data raise ethical or fairness concerns?
    • Does the algorithm do more harm than good? Organizations should ask themselves whether their AI and machine learning cause more harm than good. If they do, the FTC can consider them “unfair” under Section 5 of the FTC Act and subject to enforcement measures.

SR-Letter 11-7

The Board of Governors of the Federal Reserve System and the Office of the Comptroller of the Currency published SR Letter 11-7 in 2011, titled “Supervisory Guidance on Model Risk Management.” The letter has become the gold standard for model risk management and one of the most important statements of supervisory expectations on the management of model risk. Supervised banks and financial institutions in the United States are obligated to implement SR 11-7 principles in their model risk management frameworks.

Outside of the United States, a number of other supervisors have released regulatory expectations for model risk management, such as the European Central Bank (ECB), the European Banking Authority (EBA), and national supervisors like the Canadian Office of the Superintendent of Financial Institutions (OSFI) and the Australian Prudential Regulation Authority (APRA, under Prudential Standard CPS 220 Risk Management and the credit risk management Prudential Standard APS 220). In particular, the ECB incorporated SR 11-7 principles into its Targeted Review of Internal Models (TRIM) in 2015.

SR 11-7 is far reaching and outlines all the regulatory expectations on risk management, including defining what a model is, and the processes and internal controls that banks must implement for all “models,” irrespective of type. This helps to ensure that banks appropriately develop, implement, and use models in a controlled manner. The principles of SR 11-7 are critical, because all models, including AI and machine learning, risk sustaining losses due to inaccuracy or poor controls. The losses here can be far ranging, from inaccurate risk calculations, to costs involved with developing and implementing models, to the adverse reputational impacts of making unfair decisions.

Without a defined model risk management policy in place, the flexibility introduced by a range of model development applications can expose organizations to significant misuse and loss. In such a setting, there can also be an absence of appropriate controls and approvals, and subsequent errors. Effective model risk management helps supply robustness and reduce the aforementioned model risks.

EU Guidelines for Trustworthy AI

In 2021, the European Commission proposed legislation on the use of AI systems in Europe.13 The regulation follows a tiered, risk-based approach where AI systems are classified as high risk in line with the intended purpose of the AI system, such as biometric identification, law enforcement, and assessing the creditworthiness of natural persons. The legal requirements for high-risk AI systems pertain to data and data governance, transparency and auditability, human oversight, robustness, accuracy, and security, derived from the Ethics Guidelines of the High-Level Expert Group on Artificial Intelligence.14

FURTHER TAKEAWAY

Based on these risks, the wider use of AI and machine learning means that current model risk governance programs need to be extended. At the same time, it is important to understand the limitations of current model risk management programs that prevent a holistic view of model health across data, model inputs, benchmarks, stability, performance, and decisions.

It is important to note that any high-materiality model, irrespective of whether AI and machine learning are used, either directly or indirectly, always requires a high level of human oversight. Regulatory bodies around the world are either extending their current regulation and compliance for model risk management to accommodate AI and machine learning or have created voluntary guidelines for a range of aspects of AI and machine learning, including data privacy usage and ethics.

Today, there is no global consensus on the design and deployment of ethical AI frameworks. However, within developed regulations and guidelines, the same ethical principles relating to fairness, ethics, accountability, and transparency are used. The depth and breadth of ethical AI compliance guidelines and regulations largely depend on the maturity of AI and machine learning adoption in the respective jurisdictions. For example, regulators in Asia Pacific are continuing to address ethical AI frameworks to help ensure that fairness objectives are achieved (e.g., the Australian government recently released the “AI ethics framework,” and the Hong Kong Monetary Authority has released regulatory guidance in the form of the “RegTech Adoption Practice Guide” that includes prerequisites for an “AI governance framework”).

CONCLUDING REMARKS

With the use of AI and machine learning, risk departments should broaden their thinking and extend their performance monitoring frameworks so that transparency of AI and machine learning can be achieved without overly depending on manual code. By dynamically tracking a range of interconnected categories (including, but not limited to, the input data, model decay, and model robustness), risk departments can better understand potential AI and machine learning limitations and proactively identify issues that may lead to model failure. Regulatory bodies and central banks also expect risk departments to adhere to certain compliance measures when validating AI and machine learning. We have explained the main concepts in this chapter.

However, even in the absence of clearly defined regulatory expectations of model risk management as outlined in SR 11-7, model risk management should still be top of mind, as the model risks associated with AI and machine learning can cause quantifiable loss, from inaccurate risk calculations, to the costs involved with developing and implementing models, to adverse impacts of making unfair decisions. These risks are worsened when AI and machine learning function outside of a robust model risk management framework that governs the development, deployment, and use of the models.

ENDNOTES

  1. Artificial intelligence, https://www.oecd.org/going-digital/ai/principles/
  2. Personal Data Protection Commission, Model Artificial Intelligence Governance Framework, 2nd ed. (Singapore: SG:D, Infocomm Media Development Authority, and Personal Data Protection Commission, 2020), https://www.pdpc.gov.sg/-/media/files/pdpc/pdf-files/resource-for-organisation/ai/sgmodelaigovframework2.pdf
  3. European Commission, Proposal for a Regulation of the European Parliament and of the Council: Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts, EUR-Lex, Brussels (April 21, 2021), https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1623335154975&uri=CELEX%3A52021PC0206
  4. Elisa Jillson, Aiming for truth, fairness, and equity in your company's use of AI, Federal Trade Commission (April 19, 2021), https://www.ftc.gov/news-events/blogs/business-blog/2021/04/aiming-truth-fairness-equity-your-companys-use-ai
  5. Office of the Comptroller of the Currency, Safety and Soundness, Model Risk Management, Version 1.0 (August 2021), https://www.occ.treas.gov/publications-and-resources/publications/comptrollers-handbook/files/model-risk-management/pub-ch-model-risk.pdf
  6. Government of the United Kingdom, Algorithmic Transparency Standard (2021), https://www.gov.uk/government/collections/algorithmic-transparency-standard#:~:text=The%20Algorithmic%20Transparency%20Standard%20is,making%20in%20the%20public%20sector
  7. Patrice Alexander Ficklin, Tom Pahl, and Paul Watkins, Innovation Spotlight: Providing Adverse Action Notices When Using AI/ML Models (Washington, DC: Consumer Financial Protection Bureau, 2020).
  8. David Asermely, Whitepaper: Machine Learning Model Governance (Cary, NC: SAS Institute Inc., 2021).
  9. Bilal Yurdakul, Statistical Properties of Population Stability Index, PhD dissertation (Western Michigan University, ScholarWorks, April 2018), https://scholarworks.wmich.edu/dissertations/3208/
  10. Brian D. Williamson, Peter B. Gilbert, Marco Carone, and Noah Simon, “Nonparametric variable importance assessment using machine learning techniques,” Biometrics 77 (1) (March 2021): 9–22.
  11. Sabtain Ahmad, Kevin Stype-Rekowski, Sasho Nedelkoski, and Odej Kao, Autoencoder-based Condition Monitoring and Anomaly Detection Method for Rotating Machines (Cornell University, arXiv, 2021).
  12. Andrea Tang, “Making AI GDPR compliant,” ISACA Journal (5) (2019), https://www.isaca.org/resources/isaca-journal/issues/2019/volume-5/making-ai-gdpr-compliant
  13. European Commission, Proposal for a Regulation of the European Parliament.
  14. European Commission, Ethics guidelines for trustworthy AI (April 8, 2019), https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai