probdefault

Likelihood of default for a given data set

Description

pd = probdefault(sc) computes the probability of default for sc, using the data that was used to build the creditscorecard object.

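pd = probdefault(sc,data) computes the probability of default for a given data set, specified using the optional argument data.

By default, the data used to build the creditscorecard object is used. You can also supply input data, to which the same computation of probability of default is applied.

Examples

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).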

load CreditCardData
sc = creditscorecard(data,'IDVar','CustID')

sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]

Perform automatic binning using the default options. By default, autobinning uses the Monotone algorithm.

sc = autobinning(sc);

Fit the model.

sc = fitmodel(sc);

1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE        tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)     0.70239    0.064001    10.975    5.0538e-28
    CustAge         0.60833     0.24932      2.44      0.014687
    ResStatus         1.377     0.65272    2.1097      0.034888
    EmpStatus       0.88565       0.293    3.0227     0.0025055
    CustIncome      0.70164     0.21844    3.2121     0.0013179
    TmWBank          1.1074     0.23271    4.7589    1.9464e-06
    OtherCC          1.0883     0.52912    2.0569      0.039696
    AMBalance         1.045     0.32214    3.2439     0.0011792

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16

Compute the probability of default.

pd = probdefault(sc);
disp(pd(1:15))

    0.2503
    0.1878
    0.3173
    0.1711
    0.1895
    0.1307
    0.5218
    0.2848
    0.2612
    0.3047
    0.3418
    0.2237
    0.2793
    0.3615
    0.1653

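This example describes how missing data is handled when the 'BinMissingData' option is set to true, and the corresponding computation of probabilities of default.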

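  • Predictors that have missing data in the training set have an explicit <missing> bin with corresponding points in the final scorecard. These points are computed from the Weight-of-Evidence (WOE) value for the <missing> bin and the logistic model coefficients. For scoring purposes, these points are assigned to missing values and to out-of-range values, and the final score is mapped to a probability of default when using probdefault.

  • Predictors with no missing data in the training set have no <missing> bin, so no WOE can be estimated from the training data. By default, the points for missing and out-of-range values are set to NaN, which leads to a score of NaN when running score. For predictors that have no explicit <missing> bin, use the name-value argument 'Missing' in formatpoints to indicate how missing data should be treated for scoring purposes. The final score is then mapped to a probability of default when using probdefault.

Create a creditscorecard object using the CreditCardData.mat file to load dataMissing, a data set that contains missing values.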

load CreditCardData.mat
head(dataMissing,5)

ans=5×11 table
    CustID    CustAge    TmAtAddress     ResStatus     EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    ___________    _________    __________    _______    _______    _________    ________    ______

      1          53          62         <undefined>    Unknown        50000         55         Yes       1055.9        0.22        0   
      2          61          22         Home Owner     Employed       52000         25         Yes       1161.6        0.24        0   
      3          47          30         Tenant         Employed       37000         61         No        877.23        0.29        0   
      4         NaN          75         Home Owner     Employed       53000         20         Yes       157.37        0.08        0   
      5          68          56         Home Owner     Employed       53000         14         Yes       561.84        0.11        0   

Use creditscorecard with the name-value argument 'BinMissingData' set to true to place missing numeric or categorical data in a separate bin. Apply automatic binning.

sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);
disp(sc)

  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 1
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]

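Set a minimum value of zero for CustAge and CustIncome. With this setting, any negative age or income value becomes invalid, or "out of range". For scoring and probability of default computations, out-of-range values are given the same points as missing values.

sc = modifybins(sc,'CustAge','MinValue',0);
sc = modifybins(sc,'CustIncome','MinValue',0);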

Display bin information for the numeric predictor 'CustAge', which includes missing data in a separate bin labeled <missing>.

bi = bininfo(sc,'CustAge');
disp(bi)

         Bin         Good    Bad     Odds       WOE       InfoValue 
    _____________    ____    ___    ______    ________    __________

    {'[0,33)'   }     69      52    1.3269    -0.42156      0.018993
    {'[33,37)'  }     63      45       1.4    -0.36795      0.012839
    {'[37,40)'  }     72      47    1.5319     -0.2779     0.0079824
    {'[40,46)'  }    172      89    1.9326    -0.04556     0.0004549
    {'[46,48)'  }     59      25      2.36     0.15424     0.0016199
    {'[48,51)'  }     99      41    2.4146     0.17713     0.0035449
    {'[51,58)'  }    157      62    2.5323     0.22469     0.0088407
    {'[58,Inf]' }     93      25      3.72     0.60931      0.032198
    {'<missing>'}     19      11    1.7273    -0.15787    0.00063885
    {'Totals'   }    803     397    2.0227         NaN      0.087112

Display bin information for the categorical predictor 'ResStatus', which includes missing data in a separate bin labeled <missing>.

bi = bininfo(sc,'ResStatus');
disp(bi)

         Bin         Good    Bad     Odds       WOE       InfoValue 
    _____________    ____    ___    ______    ________    __________

          ⋮
    {'<missing>'}     27      13    2.0769    0.026469    2.3248e-05
    {'Totals'   }    803     397    2.0227         NaN     0.0092627

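The 'CustAge' and 'ResStatus' predictors have missing data (NaNs and <undefined>) in the training data. For missing data in these predictors, the binning process estimates WOE values of -0.15787 and 0.026469, respectively, as shown above.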
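The two WOE values quoted above can be checked by hand from the Good and Bad counts in the bin tables. The following is a minimal sketch, assuming the usual definition of WOE as the log of the ratio between the bin odds and the total odds. The variable names are illustrative and are not part of the toolbox API.

```matlab
% Counts copied from the <missing> rows and the Totals rows of the bin tables.
goodMissing = [19 27];      % CustAge, ResStatus
badMissing  = [11 13];
goodTotal   = 803;
badTotal    = 397;

% Assumed definition: WOE = log( (Good_i/Bad_i) / (GoodTotal/BadTotal) )
binOdds    = goodMissing ./ badMissing;
totalOdds  = goodTotal / badTotal;
woeMissing = log(binOdds ./ totalOdds);

% Compare with the WOE column: -0.15787 and 0.026469
fprintf('%.5f\n', woeMissing);
```

Because the WOE depends only on counts observed in the training data, a predictor with no missing observations has no counts from which to estimate it.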

For EmpStatus and CustIncome there is no explicit bin for missing values, because the training data has no missing values for these predictors.

bi = bininfo(sc,'EmpStatus');
disp(bi)

        Bin         Good    Bad     Odds       WOE       InfoValue
    ____________    ____    ___    ______    ________    _________

    {'Unknown' }    396     239    1.6569    -0.19947    0.021715 
    {'Employed'}    407     158    2.5759      0.2418    0.026323 
    {'Totals'  }    803     397    2.0227         NaN    0.048038 

bi = bininfo(sc,'CustIncome');
disp(bi)

           Bin            Good    Bad     Odds        WOE       InfoValue 
    _________________     ____    ___    ______    _________    __________

            ⋮
    {'[35000,40000)'}     193      98    1.9694    -0.026696    0.00017359
    {'[40000,42000)'}      68      34         2    -0.011271    1.0819e-05
    {'[42000,47000)'}     164      66    2.4848      0.20579     0.0078175
    {'[47000,Inf]'  }     183      56    3.2679      0.47972      0.041657
    {'Totals'       }     803     397    2.0227          NaN       0.12285

Use fitmodel to fit a logistic regression model using Weight of Evidence (WOE) data. fitmodel internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. fitmodel then fits a logistic regression model using a stepwise method (by default). Predictors that have missing data have an explicit <missing> bin, with a corresponding WOE value computed from the data. When you use fitmodel, the WOE value of the <missing> bin is applied when performing the WOE transformation.

[sc,mdl] = fitmodel(sc);

1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1442.8477, Chi2Stat = 4.4974731, PValue = 0.033944979
6. Adding ResStatus, Deviance = 1438.9783, Chi2Stat = 3.86941, PValue = 0.049173805

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE        tStat       pValue  
                   ________    ________    ______    __________

    (Intercept)     0.70229    0.063959     10.98    4.7498e-28
    CustAge         0.57421     0.25708    2.2335      0.025513
    ResStatus        1.3629     0.66952    2.0356       0.04179
    EmpStatus       0.88373      0.2929    3.0172      0.002551
    CustIncome      0.73535      0.2159     3.406    0.00065929
    TmWBank          1.1065     0.23267    4.7556    1.9783e-06
    OtherCC          1.0648     0.52826    2.0156      0.043841
    AMBalance        1.0446     0.32197    3.2443     0.0011775

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 88.5, p-value = 2.55e-16

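Scale the scorecard points by the "points, odds, and points to double the odds (PDO)" method, using the 'PointsOddsAndPDO' argument of formatpoints. Suppose that you want a score of 500 points to have odds of 2 (twice as likely to be good as to be bad) and that the odds double every 50 points (so that 550 points would have odds of 4).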
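To make the scaling target concrete, here is a small sketch of the arithmetic behind it. It assumes the conventional linear-in-log-odds form, Score = offset + factor*log(odds). The names factor, offset, and oddsAt are illustrative and not toolbox identifiers.

```matlab
% Target: 500 points correspond to odds of 2, and odds double every 50 points.
targetPoints = 500;
targetOdds   = 2;
pdo          = 50;

% Assumed scaling form: Score = offset + factor*log(odds)
factor = pdo / log(2);
offset = targetPoints - factor*log(targetOdds);

% Invert the scaling to recover the odds implied by a given score.
oddsAt = @(s) exp((s - offset) ./ factor);
fprintf('odds at 500: %.4f, odds at 550: %.4f\n', oddsAt(500), oddsAt(550));
```

Under this assumption the offset works out to 450 points, which is the score at which good and bad outcomes are equally likely.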

Display the scorecard, showing the scaled points for the predictors retained in the fitted model.

sc = formatpoints(sc,'PointsOddsAndPDO',[500 2 50]);
PointsInfo = displaypoints(sc)

PointsInfo=38×3 table
     Predictors           Bin          Points
    _____________    ______________    ______

    {'CustAge'  }    {'[0,33)'    }    54.062
    {'CustAge'  }    {'[33,37)'   }    56.282
    {'CustAge'  }    {'[37,40)'   }    60.012
    {'CustAge'  }    {'[40,46)'   }    69.636
    {'CustAge'  }    {'[46,48)'   }    77.912
    {'CustAge'  }    {'[48,51)'   }     78.86
    {'CustAge'  }    {'[51,58)'   }     80.83
    {'CustAge'  }    {'[58,Inf]'  }     96.76
    {'CustAge'  }    {'<missing>' }    64.984
    {'ResStatus'}    {'Tenant'    }    62.138
    {'ResStatus'}    {'Home Owner'}    73.248
    {'ResStatus'}    {'Other'     }    90.828
    {'ResStatus'}    {'<missing>' }    74.125
    {'EmpStatus'}    {'Unknown'   }    58.807
    {'EmpStatus'}    {'Employed'  }    86.937
    {'EmpStatus'}    {'<missing>' }       NaN
      ⋮

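Notice that the points for the <missing> bin for CustAge and ResStatus are explicitly shown (as 64.9836 and 74.1250, respectively). These points are computed from the WOE value for the <missing> bin and the logistic model coefficients.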
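As a rough check of how these two numbers arise, the sketch below recomputes them from the displayed coefficients and WOE values. It rests on two assumptions that this page does not state explicitly: that the scaled score has the form offset + factor*log(odds), and that the intercept and the offset are spread evenly over the seven predictors in the model. All variable names are illustrative.

```matlab
% Values copied from the fitmodel output and the bin tables (rounded as displayed).
intercept = 0.70229;
beta      = [0.57421 1.3629];       % CustAge, ResStatus coefficients
woe       = [-0.15787 0.026469];    % WOE of the <missing> bins
nPred     = 7;                      % predictors retained in the model

% Assumed scaling for 500 points, odds of 2, PDO of 50.
factor = 50 / log(2);
offset = 500 - factor*log(2);

% Assumption: intercept and offset are split evenly across the predictors.
pointsMissing = factor*(beta.*woe + intercept/nPred) + offset/nPred;

% Compare with 64.9836 and 74.1250
fprintf('%.4f\n', pointsMissing);
```

The small differences in the last digit come from using the rounded values shown in the output rather than the full-precision model coefficients.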

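For predictors that have no missing data in the training set, there is no explicit <missing> bin. By default, the points for missing data are set to NaN, and they lead to a score of NaN when running score. For predictors that have no explicit <missing> bin, use the name-value argument 'Missing' in formatpoints to indicate how missing data should be treated for scoring purposes.

For the purpose of illustration, take a few rows from the original data as test data and introduce some missing data. Also introduce some invalid, or out-of-range, values. For numeric data, values below the minimum (or above the maximum) allowed are considered invalid, such as a negative value for age (recall that 'MinValue' was earlier set to 0 for CustAge and CustIncome). For categorical data, invalid values are categories not explicitly included in the scorecard, for example, a residential status not previously mapped to scorecard categories, such as "House", or a meaningless string such as "abc123".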

tdata = dataMissing(11:18,mdl.PredictorNames); % Keep only the predictors retained in the model
% Set some missing values
tdata.CustAge(1) = NaN;
tdata.ResStatus(2) = '<undefined>';
tdata.EmpStatus(3) = '<undefined>';
tdata.CustIncome(4) = NaN;
% Set some invalid values
tdata.CustAge(5) = -100;
tdata.ResStatus(6) = 'House';
tdata.EmpStatus(7) = 'Freelancer';
tdata.CustIncome(8) = -1;
disp(tdata)

    CustAge    ResStatus     EmpStatus     CustIncome    TmWBank    OtherCC    AMBalance
    _______    __________    __________    __________    _______    _______    _________

       ⋮
     -100      Other         Employed        46000         16         Yes       162.21  
       33      House         Employed        36000         36         Yes       845.02  
       39      Tenant        Freelancer      34000         40         Yes       756.26  
       24      Home Owner    Employed           -1         19         Yes       449.61  

Score the new data and see how points are assigned for missing CustAge and ResStatus, because these predictors have an explicit bin with points for <missing>. However, for EmpStatus and CustIncome the score function sets the points to NaN. The corresponding probabilities of default are also set to NaN.

[Scores,Points] = score(sc,tdata);
disp(Scores)

  481.2231
  520.8353
       NaN
       NaN
  551.7922
  487.9588
       NaN
       NaN

disp(Points)

    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance
    _______    _________    _________    __________    _______    _______    _________

    64.984      62.138       58.807        67.893      61.858     75.622      89.922  
     78.86      74.125       58.807        82.439      61.061     75.622      89.922  
     96.76      73.248          NaN        96.969      51.132     50.914      89.922  
    69.636      90.828       58.807           NaN      61.858     50.914      89.922  
    64.984      90.828       86.937        82.439      61.061     75.622      89.922  
    56.282      74.125       86.937        70.107      61.858     75.622      63.028  
    60.012      62.138          NaN        67.893      61.858     75.622      63.028  
    54.062      73.248       86.937           NaN      61.061     75.622      89.922  

pd = probdefault(sc,tdata);
disp(pd)

    0.3934
    0.2725
       NaN
       NaN
    0.1961
    0.3714
       NaN
       NaN

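Use the name-value argument 'Missing' in formatpoints to choose how to assign points to missing values for predictors that do not have an explicit <missing> bin. In this example, use the 'MinPoints' option for the 'Missing' argument. The minimum points for EmpStatus in the scorecard displayed above are 58.8072, and for CustIncome the minimum points are 29.3753. All rows now have a score and a corresponding probability of default.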

sc = formatpoints(sc,'Missing','MinPoints');
[Scores,Points] = score(sc,tdata);
disp(Scores)

  481.2231
  520.8353
  517.7532
  451.3405
  551.7922
  487.9588
  449.3577
  470.2267

disp(Points)

    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance
    _______    _________    _________    __________    _______    _______    _________

    64.984      62.138       58.807        67.893      61.858     75.622      89.922  
     78.86      74.125       58.807        82.439      61.061     75.622      89.922  
     96.76      73.248       58.807        96.969      51.132     50.914      89.922  
    69.636      90.828       58.807        29.375      61.858     50.914      89.922  
    64.984      90.828       86.937        82.439      61.061     75.622      89.922  
    56.282      74.125       86.937        70.107      61.858     75.622      63.028  
    60.012      62.138       58.807        67.893      61.858     75.622      63.028  
    54.062      73.248       86.937        29.375      61.061     75.622      89.922  
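Each score is the sum of the per-predictor points in its row, which is why a single NaN point makes the whole score NaN. The short sketch below illustrates this with the third test row. The variable names are illustrative, and the replacement step only mimics the effect of 'MinPoints' for this one predictor; it does not call the toolbox.

```matlab
% Points for the third test row before 'Missing' is set (EmpStatus is NaN).
rowPoints   = [96.76 73.248 NaN 96.969 51.132 50.914 89.922];
scoreBefore = sum(rowPoints);           % NaN propagates through the sum

% Mimic 'MinPoints' for EmpStatus: use the lowest EmpStatus points in the scorecard.
empStatusPoints = [58.807 86.937];      % Unknown, Employed
rowPoints(3)    = min(empStatusPoints);
scoreAfter      = sum(rowPoints);

% scoreAfter should be close to the displayed score of 517.7532
fprintf('before: %g, after: %.3f\n', scoreBefore, scoreAfter);
```

Choosing the minimum points is the conservative option, since it treats a missing value as the least favorable category of that predictor.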
pd = probdefault(sc,tdata);
disp(pd)

    0.3934
    0.2725
    0.2810
    0.4954
    0.1961
    0.3714
    0.5022
    0.4304
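The mapping from the scaled scores to these probabilities can be reproduced by hand. This is a sketch under the assumption that the scaled score equals offset + factor*log(odds), with the scaling chosen so that 500 points correspond to odds of 2 and the odds double every 50 points. The variable names are illustrative.

```matlab
% Scaled scores as displayed after setting 'Missing' to 'MinPoints'.
scores = [481.2231 520.8353 517.7532 451.3405 551.7922 487.9588 449.3577 470.2267]';

% Assumed scaling for 500 points, odds of 2, PDO of 50.
factor = 50 / log(2);
offset = 500 - factor*log(2);

odds = exp((scores - offset) ./ factor);   % good-to-bad odds implied by each score
pd   = 1 ./ (1 + odds);                    % same as 1 - odds./(1 + odds)

disp(round(pd*1e4)/1e4)
```

Rows with scores near the 450-point offset have odds near 1 and therefore a probability of default near 0.5, as in the fourth and seventh rows.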

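Input Arguments

sc: Credit scorecard model, specified as a creditscorecard object. To create this object, use creditscorecard.

data: (Optional) Dataset to which the probability of default rules are applied, specified as a MATLAB® table, where each row corresponds to an individual observation. The data must contain a column for each of the predictors in the creditscorecard object.

Data Types: table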

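Output Arguments

pd: Probability of default, returned as a NumObs-by-1 numerical array of default probabilities.

More About

Default Probability

After the unscaled scores are computed (see Algorithms for Computing and Scaling Scores), the probability of the points being "Good" is given by the following formula:

ProbGood = 1./(1 + exp(-UnscaledScores))

Thus, the probability of default is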

pd = 1 - ProbGood
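A minimal numeric illustration of the two formulas above follows. The unscaled scores used here are made-up log-odds values chosen for easy checking, not output from a fitted scorecard.

```matlab
% Illustrative unscaled scores (log-odds of being "Good"): odds of 1, 2, and 4.
UnscaledScores = [0; log(2); log(4)];

ProbGood = 1 ./ (1 + exp(-UnscaledScores));
pd       = 1 - ProbGood;

disp([UnscaledScores ProbGood pd])
```

An unscaled score of 0 corresponds to even odds, so the probability of default is 0.5; doubling the odds to 2 lowers it to 1/3, and doubling again to 4 lowers it to 0.2.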

References

[1] Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS. lulu.com, 2011.

Introduced in R2015a