Introduction, Requirements, and Use Cases
Week 1: Tuesday 24th March
Introduction to AI model risk
- AI model risk as a new discipline
- Rapid adoption of AI in financial services
- Explainability and transparency requirements
- Principles of quantitative measurement and reporting of AI model risk based on rigorous statistical tests
- Changes compared to traditional risk management
- Conventional model risk management (MRM) vs. AI model management
- Periodic validation vs. continuous assurance
- Traditional backtesting vs. new validation techniques for LLMs
- The evolving requirements for formal reporting
- Conventional operational risk (OpRisk) vs. AI model risk
- Challenges of adopting OpRisk metrics for AI model risk
- Ethical and social implications of using AI in HR and other regulated contexts
- Conventional model stress testing vs. red-team testing
- Non-adversarial vs. adversarial stress testing approaches for AI model risk
- Prompt injection and jailbreak testing
- Robustness to input shifts and edge cases
- Reporting requirements for AI model risk
- Regulatory
- Transparency and explainability documentation
- Bias and fairness assessment reporting
- Model inventory and governance oversight
- Internal
- The importance of quantitative risk metrics based on rigorous statistical tests
- Continuous monitoring dashboards
- Drift and performance degradation tracking
- Use-case specific quantitative risk metrics and KPIs
AI workflow types
- Assistant workflows (multi-turn chat)
- Context management across multiple chats and conversation turns
- Maintaining consistency and coherence
- Handling clarification requests and corrections
- Generation workflows (text output)
- Structured vs. free-form text generation
- Template-based and conditional generation
- Quality and style consistency
- Comprehension workflows (text input, data output)
- Information extraction and structuring
- Classification and categorisation
- Data validation and quality checks
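As a concrete illustration of the comprehension items above, the sketch below validates a model's structured output before it enters downstream systems. It is a minimal sketch, not a reference implementation: the field names, allowed labels, and JSON shape are illustrative assumptions, and the model call itself is assumed to have already happened.

```python
import json

# Illustrative schema for a comprehension workflow: the model classifies a
# client message and extracts a numeric amount. Field names are hypothetical.
ALLOWED_LABELS = {"trade_confirmation", "market_quote", "other"}

def validate_extraction(raw_json: str) -> dict:
    """Parse and validate a model's structured output before downstream use."""
    record = json.loads(raw_json)  # raises ValueError on malformed JSON
    errors = []
    label = record.get("label")
    if label not in ALLOWED_LABELS:
        errors.append(f"unexpected label: {label!r}")
    amount = record.get("amount")
    if amount is not None and not isinstance(amount, (int, float)):
        errors.append(f"amount is not numeric: {amount!r}")
    if errors:
        raise ValueError("; ".join(errors))
    return record

# Example with an illustrative model response
print(validate_extraction('{"label": "market_quote", "amount": 101.25}'))
```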
Selected use cases
- Rating using numerical and category-based scales
- Numeric scales (e.g., 1-5, 1-10, 0-100)
- Category-based scales (e.g., poor/fair/good/excellent)
- Likert scales (degree of agreement)
- Binary classifications (pass/fail, yes/no)
- Ranking using pointwise, pairwise, setwise, and listwise approaches
- Reliability vs cost trade-offs
- Computational cost and latency considerations
- Bias mitigation requirements for each approach
- Complex document analysis using rulebooks
- Security prospectuses
- Extracting terms and conditions
- Identifying risk factors and disclosures
- Assessing the effect of legal caveats
- Regulatory requirements and guidelines
- Compliance checking against rulebooks
- Interpretation of ambiguous requirements
- Contracts
- Key clause identification
- Obligation and liability extraction
- RFP and RFI questionnaires and responses
- Requirement matching and scoring
- Gap analysis and compliance validation
- Data entry from free-form text
- Trade confirmations
- Structured field extraction (dates, amounts, counterparties)
- Validation against expected formats and ranges
- Free-form emails and chats with trades and market quotes
- Field recognition by context and position
- Handling ambiguous semantic structure
- Handling incomplete information
- Detecting template-generated inputs to improve reliability
- Pattern recognition for template-generated formats
- Leveraging template recognition for pre-approval and accuracy improvements
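A minimal sketch of the "validation against expected formats and ranges" step for trade-confirmation data entry. It assumes the fields have already been extracted into a dictionary; the field names, date format, counterparty list, and notional cap are illustrative assumptions rather than course-prescribed values.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical expectations for extracted trade-confirmation fields.
KNOWN_COUNTERPARTIES = {"BANK_A", "BANK_B", "BROKER_C"}  # illustrative list
MAX_NOTIONAL = Decimal("1e9")                            # illustrative cap

def validate_trade_fields(fields: dict) -> list[str]:
    """Return a list of validation issues for one extracted trade record."""
    issues = []
    try:
        datetime.strptime(fields.get("trade_date", ""), "%Y-%m-%d")
    except ValueError:
        issues.append(f"trade_date not in YYYY-MM-DD format: {fields.get('trade_date')!r}")
    try:
        notional = Decimal(str(fields.get("notional", "")))
        if not Decimal("0") < notional <= MAX_NOTIONAL:
            issues.append(f"notional outside expected range: {notional}")
    except InvalidOperation:
        issues.append(f"notional is not a number: {fields.get('notional')!r}")
    if fields.get("counterparty") not in KNOWN_COUNTERPARTIES:
        issues.append(f"unknown counterparty: {fields.get('counterparty')!r}")
    return issues

# Example with an illustrative extracted record
print(validate_trade_fields({"trade_date": "2025-03-24", "notional": "2500000", "counterparty": "BANK_A"}))
```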
Practical Exercise: Building and testing AI-based workflows
The participants will build and test several AI-based multistep workflows.
Note: No coding required. The exercise will be performed using an online playground.
Week 2: Tuesday 31st March
Quantitative Management of AI Model Risk
Measuring AI model risk
- Statistical analysis of multiple runs
- Sample size and statistical power vs. cost
- Specialized distribution metrics for AI – not just mean and variance
- Confidence intervals and risk reporting (see the sketch below)
- Techniques and challenges of run randomisation
- Temperature and sampling parameter control
- Seed randomisation and reproducibility
- Preamble randomisation to avoid memorisation and as a seed alternative
- Systematic vs random errors
- Aleatoric uncertainty (inherent randomness in LLMs)
- Epistemic uncertainty (model capability and knowledge limitations)
- Aleatoric-epistemic decomposition in AI error analysis
- Dealing with rare errors and thinking tangents
- Detection methods and reporting for low-frequency errors
- Fast vs. thinking model differences in error patterns
- Monitoring for unexpected reasoning paths
- Judge models
- Independent and comparative scoring
- Chain-of-thought prompting for judge models
- Judge model bias detection and mitigation
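The sketch below illustrates the multi-run statistical analysis and confidence-interval reporting referenced earlier in this list. A percentile bootstrap over scores from repeated randomised runs is one distribution-agnostic choice, not necessarily the only method covered; the run scores shown are illustrative.

```python
import random
import statistics

def bootstrap_ci(scores, confidence=0.95, n_boot=10_000, seed=0):
    """Percentile-bootstrap confidence interval for the mean score across runs."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        resample = [rng.choice(scores) for _ in scores]
        means.append(statistics.fmean(resample))
    means.sort()
    lo = means[int((1 - confidence) / 2 * n_boot)]
    hi = means[int((1 + confidence) / 2 * n_boot) - 1]
    return statistics.fmean(scores), (lo, hi)

# Illustrative scores from 20 randomised runs of the same workflow step
run_scores = [0.82, 0.79, 0.85, 0.80, 0.78, 0.84, 0.81, 0.83, 0.77, 0.86,
              0.80, 0.82, 0.79, 0.84, 0.81, 0.83, 0.80, 0.85, 0.78, 0.82]
mean, (lo, hi) = bootstrap_ci(run_scores)
print(f"mean={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```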
Quantitative metrics by workflow type
- Measuring rating stability
- Inter-run variance and consistency metrics
- Scale calibration and score distribution analysis
- Central tendency and range of scores
- Measuring ranking stability
- Rank correlation metrics (Kendall’s tau, Spearman’s rho), illustrated in the sketch below
- Position bias detection and mitigation
- Agreement rates across multiple runs
- Measuring reliability of decision graph navigation for complex document analysis
- Decision path consistency across runs
- Node- and graph-level accuracy metrics
- Error propagation through decision graph nodes
- Measuring reliability of data entry for multiple-choice, numerical and other field types
- Accuracy reporting by field type (multiple-choice, numerical, date)
- Detection and mitigation of hallucinated optional fields
- Detection and mitigation of deviations from the output format
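The sketch below illustrates the ranking-stability metrics referenced earlier in this list, computing average pairwise Kendall's tau and Spearman's rho across repeated runs. It assumes every run ranks the same candidate set and uses scipy.stats as one common implementation choice; the example ranks are illustrative.

```python
from itertools import combinations
from scipy.stats import kendalltau, spearmanr

def ranking_stability(rankings):
    """Average pairwise rank correlation across repeated runs.

    `rankings` is a list of runs; each run assigns a rank position to the
    same candidates, listed in the same candidate order.
    """
    taus, rhos = [], []
    for a, b in combinations(rankings, 2):
        tau, _ = kendalltau(a, b)
        rho, _ = spearmanr(a, b)
        taus.append(tau)
        rhos.append(rho)
    return sum(taus) / len(taus), sum(rhos) / len(rhos)

# Illustrative ranks for 5 candidates over 3 randomised runs
runs = [
    [1, 2, 3, 4, 5],
    [1, 3, 2, 4, 5],
    [2, 1, 3, 5, 4],
]
tau, rho = ranking_stability(runs)
print(f"mean Kendall tau={tau:.2f}, mean Spearman rho={rho:.2f}")
```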
Practical Exercise: Measuring AI model risk
The participants will perform quantitative measurement of AI model risk in the workflows they built.
Note: No coding required. The exercise will be performed using an online playground.
Week 3: Tuesday 7th April
Mitigation of Psychological Effects and Cognitive Biases in AI Models
Psychological effects
- Thinking fast and slow for AI
- System 1 (fast, intuitive) vs System 2 (slow, deliberate) thinking in LLMs
- Failures in cognitive load optimisation
- Switching between System 1 and System 2 in fast models vs. advanced/thinking models
- Chain-of-thought to engage System 2 reasoning
- Semantic illusions
- Failures in familiarity detection
- Misleading question structure
- Surface-level vs. deep comprehension testing
- Model susceptibility to deliberate semantic illusions
- Framing effects
- Positive vs. negative framing (e.g., rate of success vs rate of failure)
- Influencing risk-averse vs. risk-seeking behaviour in AI
- Stress testing with logically equivalent reformulations (see the sketch below)
- Priming effects
- Influence of unrelated context on responses
- Legal and compliance implications in HR and other regulated contexts
- Mitigation through context randomisation
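The sketch below illustrates the framing-effect stress test referenced earlier in this list: the same items are scored under logically equivalent positive and negative framings, and the mean framing gap is compared against a tolerance. Collecting the paired scores from the model is outside the sketch, and the tolerance is an illustrative assumption.

```python
import statistics

def framing_gap(positive_scores, negative_scores, tolerance=0.05):
    """Compare scores for logically equivalent positively / negatively framed prompts.

    The two lists are paired per item (same underlying question, reworded).
    Returns the mean gap and whether it exceeds an illustrative tolerance.
    """
    gaps = [p - n for p, n in zip(positive_scores, negative_scores, strict=True)]
    mean_gap = statistics.fmean(gaps)
    return mean_gap, abs(mean_gap) > tolerance

# Illustrative paired scores (e.g., "rate of success" vs. "rate of failure" phrasing,
# with the negative framing mapped back onto the same scale)
pos = [0.80, 0.75, 0.82, 0.78]
neg = [0.72, 0.70, 0.74, 0.71]
gap, flagged = framing_gap(pos, neg)
print(f"mean framing gap={gap:.3f}, exceeds tolerance: {flagged}")
```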
Cognitive biases
- Confirmation bias, sycophancy, desire to please
- Guessing and meeting user assumptions and expectations at the expense of accuracy
- Seeking information that confirms priors
- Advocating for the perceived user interests
- Informational anchoring
- Misinterpretation or over-reliance on initially presented information
- Numeric anchors affecting quantitative outputs
- Testing with varied anchor values
- Priming-induced anchoring
- When to expect priming-induced anchoring effects
- Detection and mitigation strategies
- Meeting legal and compliance requirements in HR and other regulated contexts
- Central tendency
- Avoiding extreme scores on rating scales in favor of midrange values
- Reduction in ranking stability due to varying degrees of central tendency across runs
- Few-shot and other methods to reduce and stabilize effects of central tendency
- Position bias
- Favoring first-presented or last-presented items when evaluating multiple candidates
- Position swap testing methodology and metrics
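A minimal sketch of the position-swap testing listed above for position bias: each pairwise comparison is judged in both presentation orders, and the rate of order-consistent verdicts is reported. The judge calls themselves are outside the sketch; the verdicts shown are illustrative.

```python
def position_swap_consistency(original_winners, swapped_winners):
    """Rate at which a pairwise judge picks the same item when A/B order is swapped.

    `original_winners[i]` and `swapped_winners[i]` each name the winning underlying
    candidate (not the presentation slot) for pair i, with the presentation order
    reversed in the second run.
    """
    agreements = sum(o == s for o, s in zip(original_winners, swapped_winners, strict=True))
    return agreements / len(original_winners)

# Illustrative verdicts for 6 candidate pairs judged in both orders
original = ["A", "B", "A", "A", "B", "A"]
swapped  = ["A", "B", "B", "A", "B", "A"]
print(f"order-consistency rate: {position_swap_consistency(original, swapped):.2f}")
```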
Practical Exercise: Identifying and mitigating cognitive biases
The participants will identify and mitigate cognitive biases affecting the workflows they built.
Note: No coding required. The exercise will be performed using an online playground.
Week 4: Tuesday 14th April
Improving Reliability of AI-Based Workflows
Key causes of uncertainty
- Aleatoric vs epistemic uncertainty
- Inherent data randomness vs. model knowledge limitations
- Measurement approaches and mitigation strategies for each type (see the sketch below)
- Hallucinations due to the lack of grounding
- Importance of grounding from external knowledge sources
- Assuming facts learned from common patterns in training data
- Detecting confident but incorrect responses
- Psychological effects and cognitive biases
- Impact on reliability and consistency metrics
- Systematic vs. random bias-induced error patterns
- Thinking tangents
- Unexpected reasoning paths in complex rules
- Prevention, detection and correction of undesirable thinking tangents
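The sketch below gives one way to make the aleatoric vs. epistemic distinction referenced earlier in this list concrete: within-prompt variance across randomised runs as an aleatoric proxy, and variance across semantically equivalent reformulations as an epistemic proxy. This decomposition is an illustrative framing rather than a formal identity, and the scores shown are illustrative.

```python
import statistics

def variance_decomposition(score_matrix):
    """Split score variability into within-prompt and between-prompt components.

    `score_matrix[i][j]` is the score of run j for reformulation i of the same task.
    The averaged within-prompt variance proxies aleatoric noise; the variance of
    per-reformulation means proxies epistemic sensitivity to phrasing.
    """
    within = statistics.fmean(statistics.pvariance(row) for row in score_matrix)
    between = statistics.pvariance([statistics.fmean(row) for row in score_matrix])
    return within, between

# Illustrative scores: 3 reformulations of one task, 4 randomised runs each
scores = [
    [0.81, 0.79, 0.83, 0.80],
    [0.70, 0.72, 0.69, 0.71],
    [0.78, 0.80, 0.77, 0.79],
]
aleatoric_proxy, epistemic_proxy = variance_decomposition(scores)
print(f"within-prompt variance={aleatoric_proxy:.4f}, between-prompt variance={epistemic_proxy:.4f}")
```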
Mitigation by prompt and workflow design
- Challenger models
- Using alternative models for validation
- Cross-model consistency checks
- Identifying model-specific biases and errors
- Effective grounding
- Using retrieval-augmented generation (RAG) effectively
- Best practices for using and creating model context protocol (MCP) servers
- Using conventional (non-MCP) knowledge bases
- Citation and web search source tracking
- Multistep workflows and decision graphs (rulebooks)
- Breaking complex tasks into manageable steps using rulebooks
- Conditional logic and branching paths
- Error detection and recovery mechanisms for complex rulebooks
- Dynamic few-shot
- Selecting relevant examples at runtime (see the sketch below)
- Similarity ranking-based example retrieval (reverse lookup)
- Identifying and addressing gaps in curated few-shot examples
- Corrective few-shot
- Learning from mistakes and failure cases
- Negative examples and counterexamples
- Iterative refinement and improvement
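The sketch below illustrates the dynamic few-shot selection referenced earlier in this list: the most similar curated examples are retrieved at runtime for inclusion in the prompt. A bag-of-words cosine similarity stands in for whatever embedding model a real deployment would use, and the example pool is illustrative.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for embedding-based similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def select_few_shot(query: str, example_pool: list[dict], k: int = 2) -> list[dict]:
    """Pick the k curated examples most similar to the incoming query."""
    return sorted(example_pool, key=lambda ex: cosine_similarity(query, ex["input"]), reverse=True)[:k]

# Illustrative curated examples for a trade data-entry workflow
pool = [
    {"input": "Please confirm we bought 5m EUR/USD at 1.0845", "output": {"side": "buy"}},
    {"input": "We sold 10m GBP/USD at 1.2710, value spot", "output": {"side": "sell"}},
    {"input": "Quote request: 2y USD swap, 25m notional", "output": {"side": "quote"}},
]
query = "Confirming purchase of 3m EUR/USD at 1.0832"
for ex in select_few_shot(query, pool):
    print(ex["input"])
```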
Mitigation by Monte Carlo
- Random sampling across multiple randomised AI model runs as a powerful way to improve AI workflow reliability
- Seed and prefix-based randomisation
- Achieving statistical confidence at a predefined threshold (e.g., 99% confidence)
- Voting across multiple runs for multiple-choice outputs
- Majority voting and consensus mechanisms
- Weighted voting based on confidence scores
- Handling approximate ties and low-confidence cases
- Crowdsourcing across multiple runs for numerical and other continuous-scale outputs
- Effective aggregation techniques in the presence of outliers
- Distribution-agnostic confidence intervals under epistemic and aleatoric uncertainty
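A minimal sketch of the Monte Carlo mitigations above: majority voting over multiple-choice outputs and median aggregation for numerical outputs across randomised runs. The margin-based tie handling and the choice of the median are illustrative options among those listed, not the only ones.

```python
import statistics
from collections import Counter

def majority_vote(answers, min_margin=2):
    """Majority vote across runs; flags low-confidence cases with a small winning margin."""
    counts = Counter(answers).most_common()
    winner, top = counts[0]
    runner_up = counts[1][1] if len(counts) > 1 else 0
    return winner, (top - runner_up) >= min_margin

def robust_aggregate(values):
    """Median aggregation for continuous outputs; resistant to outlier runs."""
    return statistics.median(values)

# Illustrative outputs from 7 randomised runs of the same workflow step
choice_runs = ["B", "B", "A", "B", "C", "B", "A"]
numeric_runs = [101.2, 100.9, 101.4, 150.0, 101.1, 100.8, 101.3]  # one outlier run
winner, confident = majority_vote(choice_runs)
print(f"voted answer={winner}, clear margin={confident}")
print(f"aggregated value={robust_aggregate(numeric_runs)}")
```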
Practical Exercise: Using voting and crowdsourcing to improve reliability
The participants will use voting and crowdsourcing to improve reliability of the workflows they built.
Note: No coding required. The exercise will be performed using an online playground.