Create Weighted Lifetime PD Model

This example shows how to use fitLifetimePDModel to create a PD model using weighted credit and macroeconomic data.

Load Data

Load the credit portfolio data.

load RetailCreditPanelData.mat

Join the two data components into a single data set.

data = join(data,dataMacro);
disp(head(data))

    ID    ScoreGroup    YOB    Default    Year     GDP     Market
    __    __________    ___    _______    ____    _____    ______

    1      Low Risk      1        0       1997     2.72      7.61
    1      Low Risk      2        0       1998     3.57     26.24
    1      Low Risk      3        0       1999     2.86      18.1
    1      Low Risk      4        0       2000     2.43      3.19
    1      Low Risk      5        0       2001     1.26    -10.51
    1      Low Risk      6        0       2002    -0.59    -22.95
    1      Low Risk      7        0       2003     0.63      2.78
    1      Low Risk      8        0       2004     1.85      9.48

Create Weights Variable

To create a weighted lifetime PD model, you need a weights variable. In this example, you create a weights variable by exponentially weighting recent data more heavily than older data. Give the most recent year (2004) a weight of 1, then shrink the weight for each preceding year by a factor of 0.96 relative to the year after. Display the data and weights.

% Get a list of years in the data set
Years = unique(data.Year);
n = size(Years,1);

% Initialize weights
YearWeights = zeros(n,1);
w = 1;

% The most recent year (2004) has a weight of 1, the weight for each preceding
% year is shrunk by a factor of .96 relative to the year after.
for i = n:-1:1
    YearWeights(i) = w;
    w = w*.96;
end

% Put the weights for each year in a table, so you can use join
YearWeights = table(Years, YearWeights,'VariableNames',{'Year','YearWeights'});
data = join(data,YearWeights,'Keys','Year');

% Show the weighted data
disp(head(data))

    ID    ScoreGroup    YOB    Default    Year     GDP     Market    YearWeights
    __    __________    ___    _______    ____    _____    ______    ___________

    1      Low Risk      1        0       1997     2.72      7.61      0.75145  
    1      Low Risk      2        0       1998     3.57     26.24      0.78276  
    1      Low Risk      3        0       1999     2.86      18.1      0.81537  
    1      Low Risk      4        0       2000     2.43      3.19      0.84935  
    1      Low Risk      5        0       2001     1.26    -10.51      0.88474  
    1      Low Risk      6        0       2002    -0.59    -22.95       0.9216  
    1      Low Risk      7        0       2003     0.63      2.78         0.96  
    1      Low Risk      8        0       2004     1.85      9.48            1

Partition Data

Partition the data into training and test sets.

nIDs = max(data.ID);
uniqueIDs = unique(data.ID);

rng('default'); % For reproducibility
c = cvpartition(nIDs,'HoldOut',0.4);

TrainIDInd = training(c);
TestIDInd = test(c);

TrainDataInd = ismember(data.ID,uniqueIDs(TrainIDInd));
TestDataInd = ismember(data.ID,uniqueIDs(TestIDInd));

Create a Lifetime PD Model

Select a ModelType for the lifetime PD model, then use fitLifetimePDModel to fit a weighted model using the WeightsVar name-value argument.

ModelType = "Probit"

ModelType = 
"Probit"

pdModel = fitLifetimePDModel(data(TrainDataInd,:),ModelType,...
    AgeVar="YOB", ...
    IDVar="ID", ...
    LoanVars="ScoreGroup", ...
    MacroVars={'GDP','Market'}, ...
    ResponseVar="Default",WeightsVar='YearWeights');
disp(pdModel)

  Probit with properties:

            ModelID: "Probit"
        Description: ""
    UnderlyingModel: [1×1 classreg.regr.CompactGeneralizedLinearModel]
              IDVar: "ID"
             AgeVar: "YOB"
           LoanVars: "ScoreGroup"
          MacroVars: ["GDP"    "Market"]
        ResponseVar: "Default"
         WeightsVar: "YearWeights"
       TimeInterval: 1

Display the underlying model.

disp(pdModel.UnderlyingModel)

Compact generalized linear regression model:
    probit(Default) ~ 1 + ScoreGroup + YOB + GDP + Market
    Distribution = Binomial

Estimated Coefficients:
                               Estimate        SE         tStat       pValue   
                              __________    _________    _______    ___________

    (Intercept)                  -1.6275     0.040249    -40.434              0
    ScoreGroup_Medium Risk      -0.26616     0.015304    -17.392     9.4854e-68
    ScoreGroup_Low Risk         -0.46622     0.017631    -26.443    4.3347e-154
    YOB                         -0.11399     0.005209    -21.884    3.7215e-106
    GDP                         -0.04152     0.015646    -2.6537      0.0079608
    Market                    -0.0029277    0.0011321    -2.5861      0.0097068


388097 observations, 388091 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 1.63e+03, p-value = 0

Validate Model

Use modelDiscrimination to view the area under ROC curve (AUROC) metric for different segments of the validation data. When ShowDetails = true, you have three extra columns in the DiscMeasure output: Segment, SegmentCount, and WeightedCount. Segment shows the segmentation variable value corresponding to the given row. SegmentCount gives the number of data points contained by the given segment, while WeightedCount shows the sum of the weights associated with the segment's data. The default weight for each row is 1, so if WeightsVar is not specified or doesn't exist in the validation data set, then WeightedCount is equal to SegmentCount.

DataSetChoice = "Testing";
if DataSetChoice=="Training"
    Ind = TrainDataInd;
else
    Ind = TestDataInd;
end

DiscMeasure = modelDiscrimination(pdModel,data(Ind,:),SegmentBy="ScoreGroup",ShowDetails=true)

DiscMeasure=3×4 table
                                       AUROC        Segment       SegmentCount    WeightedCount
                                      _______    _____________    ____________    _____________

    Probit, ScoreGroup=High Risk      0.64562    "High Risk"         84242            74228    
    Probit, ScoreGroup=Medium Risk    0.62503    "Medium Risk"       87397            77172    
    Probit, ScoreGroup=Low Risk       0.63367    "Low Risk"          86988            76910

disp(DiscMeasure)

                                       AUROC        Segment       SegmentCount    WeightedCount
                                      _______    _____________    ____________    _____________

    Probit, ScoreGroup=High Risk      0.64562    "High Risk"         84242            74228    
    Probit, ScoreGroup=Medium Risk    0.62503    "Medium Risk"       87397            77172    
    Probit, ScoreGroup=Low Risk       0.63367    "Low Risk"          86988            76910

Use modelDiscriminationPlot to visualize the ROC curve. The plotted curve accounts for the specified weights.

modelDiscriminationPlot(pdModel,data(Ind,:),SegmentBy="ScoreGroup")

Figure contains an axes object. The axes object with title ROC Segmented by ScoreGroup, xlabel Fraction of Non-Defaulters, ylabel Fraction of Defaulters contains 3 objects of type line. These objects represent Probit, High Risk, AUROC = 0.64562, Probit, Medium Risk, AUROC = 0.62503, Probit, Low Risk, AUROC = 0.63367.

Use modelCalibration to evaluate the model performance. The modelCalibration function requires a grouping variable and compares the observed weighted default rate in the group with the weighted average predicted PD for the group.

[CalMeasure, CalData] = modelCalibration(pdModel,data(Ind,:),{'YOB','ScoreGroup'});
disp(CalMeasure)

                                            RMSE   
                                          _________

    Probit, grouped by YOB, ScoreGroup    0.0011458

The CalData output also contains a WeightedCount column that is similar to DiscMeasure and shows the sum of the weights associated with the given group. The default weight is 1 for each row, so if WeightsVar is unspecified, or if the variable does not exist in the validation set, WeightedCount is equal to GroupCount.

disp(CalData)

     ModelID      YOB    ScoreGroup         PD        GroupCount    WeightedCount
    __________    ___    ___________    __________    __________    _____________

    "Observed"     1     High Risk        0.030861      13084           10220    
    "Observed"     1     Medium Risk      0.013521      12998           10154    
    "Observed"     1     Low Risk        0.0081327      12646          9879.8    
    "Observed"     2     High Risk        0.022938      12567           10224    
    "Observed"     2     Medium Risk      0.012437      12767           10391    
    "Observed"     2     Low Risk        0.0046497      12478           10156    
    "Observed"     3     High Risk        0.017818      12067           10223    
    "Observed"     3     Medium Risk     0.0093478      12520           10613    
    "Observed"     3     Low Risk        0.0058731      12386           10500    
    "Observed"     4     High Risk        0.018711      11798           10410    
    "Observed"     4     Medium Risk     0.0094983      12325           10881    
    "Observed"     4     Low Risk        0.0044163      12295           10857    
    "Observed"     5     High Risk        0.016317      11481           10551    
    "Observed"     5     Medium Risk     0.0080286      12120           11145    
    "Observed"     5     Low Risk        0.0041782      12217           11236    
    "Observed"     6     High Risk       0.0096414      11250           10770    
    "Observed"     6     Medium Risk     0.0054967      11996           11491    
    "Observed"     6     Low Risk        0.0031086      12138           11629    
    "Observed"     7     High Risk       0.0058197       7937          7773.6    
    "Observed"     7     Medium Risk     0.0032354       8334          8159.8    
    "Observed"     7     Low Risk        0.0015307       8459          8283.6    
    "Observed"     8     High Risk       0.0022178       4058            4058    
    "Observed"     8     Medium Risk     0.0009223       4337            4337    
    "Observed"     8     Low Risk       0.00068666       4369            4369    
    "Probit"       1     High Risk        0.027597      13084           10220    
    "Probit"       1     Medium Risk      0.014522      12998           10154    
    "Probit"       1     Low Risk         0.008584      12646          9879.8    
    "Probit"       2     High Risk        0.021447      12567           10224    
    "Probit"       2     Medium Risk      0.011013      12767           10391    
    "Probit"       2     Low Risk        0.0063911      12478           10156    
    "Probit"       3     High Risk        0.019195      12067           10223    
    "Probit"       3     Medium Risk     0.0097721      12520           10613    
    "Probit"       3     Low Risk          0.00563      12386           10500    
    "Probit"       4     High Risk        0.018073      11798           10410    
    "Probit"       4     Medium Risk     0.0091654      12325           10881    
    "Probit"       4     Low Risk        0.0052668      12295           10857    
    "Probit"       5     High Risk        0.014643      11481           10551    
    "Probit"       5     Medium Risk        0.0072      12120           11145    
    "Probit"       5     Low Risk        0.0040669      12217           11236    
    "Probit"       6     High Risk        0.010323      11250           10770    
    "Probit"       6     Medium Risk     0.0049299      11996           11491    
    "Probit"       6     Low Risk        0.0027131      12138           11629    
    "Probit"       7     High Risk       0.0063338       7937          7773.6    
    "Probit"       7     Medium Risk      0.002904       8334          8159.8    
    "Probit"       7     Low Risk        0.0015449       8459          8283.6    
    "Probit"       8     High Risk       0.0040971       4058            4058    
    "Probit"       8     Medium Risk     0.0018064       4337            4337    
    "Probit"       8     Low Risk       0.00093487       4369            4369

Use modelCalibrationPlot to visualize the observed weighted default rates compared to the predicted PD.

modelCalibrationPlot(pdModel,data(Ind,:),{'YOB','ScoreGroup'})

Figure contains an axes object. The axes object with title Scatter Grouped by YOB and ScoreGroup Probit, RMSE = 0.0011458, xlabel YOB, ylabel PD contains 6 objects of type line. One or more of the lines displays its values using only markers These objects represent High Risk, Observed, Medium Risk, Observed, Low Risk, Observed, High Risk, Probit, Medium Risk, Probit, Low Risk, Probit.