This example describes how points are assigned for missing data when the 'BinMissingData' option is set to true.
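Predictors that have missing data in the training set have an explicit <missing> bin with corresponding points in the final scorecard. These points are computed from the WOE value for the <missing> bin and the logistic model coefficients. For scoring purposes, these points are assigned to missing values and to out-of-range values.

Predictors with no missing data in the training set have no <missing> bin, so no WOE can be estimated from the training data. By default, the points for missing and out-of-range values are set to NaN, which leads to a score of NaN when running score. For predictors that have no explicit <missing> bin, use the name-value argument 'Missing' in formatpoints to indicate how missing data should be treated for scoring purposes.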
Create a creditscorecard object using the CreditCardData.mat file to load dataMissing, a data set that contains missing values.
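The loading step might look like the following minimal sketch. It assumes CreditCardData.mat is on the MATLAB path and defines the table dataMissing, as the text states; head is used here only to preview the first rows.

```matlab
% Load the sample data; the text states that the file provides dataMissing
load CreditCardData.mat

% Preview the first five rows, which include NaN and <undefined> entries
head(dataMissing,5)
```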
ans=5×11 table
    CustID    CustAge    TmAtAddress    ResStatus     EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    __________    _________    __________    _______    _______    _________    ________    ______
      ⋮
      4         NaN          75         Home Owner    Employed       53000         20         Yes       157.37        0.08        0
      5          68          56         Home Owner    Employed       53000         14         Yes       561.84        0.11        0
Use creditscorecard with the name-value argument 'BinMissingData' set to true to bin the missing numeric or categorical data in a separate bin. Apply automatic binning.
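A minimal sketch of this step, assuming the default autobinning settings. The 'IDVar' value is taken from the IDVar property in the object display shown in this example.

```matlab
load CreditCardData.mat

% Put missing values of each predictor in a separate <missing> bin
sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);

% Apply automatic binning with the default algorithm
sc = autobinning(sc);
disp(sc)
```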
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 1
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]
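Set a minimum value of zero for CustAge and CustIncome. With this, any negative age or income information becomes invalid or "out-of-range". For scoring purposes, out-of-range values are given the same points as missing values.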
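One way to set these limits is with modifybins and its 'MinValue' argument, which this example names later on. The setup lines are repeated so the snippet runs on its own.

```matlab
load CreditCardData.mat
sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);

% Values below zero are now treated as out-of-range for these predictors
sc = modifybins(sc,'CustAge','MinValue',0);
sc = modifybins(sc,'CustIncome','MinValue',0);
```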
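Display and plot bin information for the numeric predictor 'CustAge', which includes the missing data in a separate bin labeled <missing>.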
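A sketch of how the bin information might be retrieved and plotted, assuming the scorecard is built as described above. The same two calls with 'ResStatus' in place of 'CustAge' would cover the categorical predictor.

```matlab
load CreditCardData.mat
sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);
sc = modifybins(sc,'CustAge','MinValue',0);

% bi is a table with one row per bin, including the <missing> bin
bi = bininfo(sc,'CustAge');
disp(bi)

% Plot the Good/Bad counts and WOE per bin
plotbins(sc,'CustAge')
```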
         Bin         Good    Bad     Odds       WOE       InfoValue
    _____________    ____    ___    ______    ________    __________
      ⋮
    {'…46)'     }    172      89    1.9326    -0.04556     0.0004549
    {'[46,48)'  }     59      25      2.36     0.15424     0.0016199
    {'[48,51)'  }     99      41    2.4146     0.17713     0.0035449
    {'[51,58)'  }    157      62    2.5323     0.22469     0.0088407
    {'[58,Inf]' }     93      25      3.72     0.60931      0.032198
    {'<missing>'}     19      11    1.7273    -0.15787    0.00063885
    {'Totals'   }    803     397    2.0227         NaN      0.087112
Display and plot bin information for the categorical predictor 'ResStatus', which includes the missing data in a separate bin labeled <missing>.
         Bin          Good    Bad     Odds        WOE       InfoValue
    ______________    ____    ___    ______    _________    __________
    {'Tenant'    }    296     161    1.8385    -0.095463     0.0035249
    {'Home Owner'}    352     171    2.0585     0.017549    0.00013382
    {'Other'     }    128      52    2.4615      0.19637     0.0055808
    {'<missing>' }     27      13    2.0769     0.026469    2.3248e-05
    {'Totals'    }    803     397    2.0227          NaN     0.0092627
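For the 'CustAge' and 'ResStatus' predictors, there is missing data (NaNs and <undefined>) in the training data, and the binning process estimates WOE values of -0.15787 and 0.026469, respectively, for missing data in these predictors, as shown above.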
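These two WOE values can be checked by hand from the Good and Bad counts in the bin tables. The sketch below assumes the usual definition of WOE as the natural log of the bin odds divided by the overall odds; it uses only base MATLAB and the counts shown in this example.

```matlab
% Good/Bad counts of the <missing> bins for CustAge and ResStatus
goodMissing = [19 27];
badMissing  = [11 13];

% Totals row, identical for every predictor
goodTotal = 803;
badTotal  = 397;

% WOE = log(odds in the bin / overall odds)
oddsMissing = goodMissing ./ badMissing;
oddsTotal   = goodTotal / badTotal;
woeMissing  = log(oddsMissing ./ oddsTotal);

fprintf('WOE of <missing> bin, CustAge: %.5f, ResStatus: %.6f\n', ...
    woeMissing(1), woeMissing(2));
```

The results agree with the WOE column of the two bin tables to the displayed precision.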
For EmpStatus and CustIncome, there is no explicit bin for missing values because the training data has no missing values for these predictors.
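Whether a predictor has a <missing> bin can also be checked programmatically. This is a sketch that assumes the Bin column returned by bininfo is a cell array of character vectors, as the displayed tables suggest; the variable names are illustrative.

```matlab
load CreditCardData.mat
sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);

predictors = {'CustAge','ResStatus','EmpStatus','CustIncome'};
hasMissingBin = false(size(predictors));
for k = 1:numel(predictors)
    bi = bininfo(sc,predictors{k});
    % A predictor has a <missing> bin only if its training data has missing values
    hasMissingBin(k) = any(strcmp(bi.Bin,'<missing>'));
    fprintf('%-10s has <missing> bin: %d\n',predictors{k},hasMissingBin(k));
end
```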
        Bin         Good    Bad     Odds       WOE       InfoValue
    ____________    ____    ___    ______    ________    _________
    {'Unknown' }    396     239    1.6569    -0.19947     0.021715
    {'Employed'}    407     158    2.5759      0.2418     0.026323
    {'Totals'  }    803     397    2.0227         NaN     0.048038
           Bin           Good    Bad     Odds        WOE       InfoValue
    _________________    ____    ___    _______    _________    __________
    {'[0,29000)'    }     53      58    0.91379     -0.79457       0.06364
    {'[29000,33000)'}     74      49     1.5102     -0.29217     0.0091366
    {'[33000,35000)'}     68      36     1.8889     -0.06843    0.00041042
    {'[35000,40000)'}    193      98     1.9694    -0.026696    0.00017359
    {'[40000,42000)'}     68      34          2    -0.011271    1.0819e-05
    {'[42000,47000)'}    164      66     2.4848      0.20579     0.0078175
    {'[47000,Inf]'  }    183      56     3.2679      0.47972      0.041657
    {'Totals'       }    803     397     2.0227          NaN       0.12285
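Use fitmodel to fit a logistic regression model using Weight of Evidence (WOE) data. fitmodel internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. fitmodel then fits a logistic regression model using a stepwise method (by default). For predictors that have missing data, there is an explicit <missing> bin, with a corresponding WOE value computed from the data. When using fitmodel, the corresponding WOE value for the <missing> bin is applied when performing the WOE transformation.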
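A sketch of the fitting step with default options. The second output, named mdl here for illustration, is the fitted generalized linear model whose coefficients appear in the output.

```matlab
load CreditCardData.mat
sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);
sc = modifybins(sc,'CustAge','MinValue',0);
sc = modifybins(sc,'CustIncome','MinValue',0);

% Stepwise logistic regression on the WOE-transformed predictors
[sc,mdl] = fitmodel(sc);
```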
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1442.8477, Chi2Stat = 4.4974731, PValue = 0.033944979
6. Adding ResStatus, Deviance = 1438.9783, Chi2Stat = 3.86941, PValue = 0.049173805
7. Adding OtherCC, Deviance = 1434.9751, Chi2Stat = 4.0031966, PValue = 0.045414057

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate       SE       tStat       pValue
                   ________    ________    ______    __________
    (Intercept)    0.70229     0.063959     10.98    4.7498e-28
    CustAge        0.57421      0.25708    2.2335      0.025513
    ResStatus       1.3629      0.66952    2.0356       0.04179
    EmpStatus      0.88373       0.2929    3.0172      0.002551
    CustIncome     0.73535       0.2159     3.406    0.00065929
    TmWBank         1.1065      0.23267    4.7556    1.9783e-06
    OtherCC         1.0648      0.52826    2.0156      0.043846
    AMBalance       1.2497            …         …             …

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 88.5, p-value = 2.55e-16
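Scale the scorecard points by the "points, odds, and points to double the odds (PDO)" method using the 'PointsOddsAndPDO' argument of formatpoints. Suppose that you want a score of 500 points to correspond to odds of 2 (twice as likely to be good as to be bad) and that the odds double every 50 points (so that 550 points correspond to odds of 4).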
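The scaling implied by these three numbers can be worked out directly. The sketch below assumes the usual linear relation between score and log-odds, score = shift + slope*log(odds); the variable names are illustrative and only base MATLAB is used.

```matlab
targetPoints = 500;   % score that should correspond to targetOdds
targetOdds   = 2;     % good:bad odds at targetPoints
pdo          = 50;    % points needed to double the odds

% Doubling the odds adds log(2) to the log-odds, so slope*log(2) = pdo
slope = pdo / log(2);
shift = targetPoints - slope * log(targetOdds);

scoreAtOdds = @(odds) shift + slope * log(odds);
fprintf('shift = %.4f, slope = %.4f\n',shift,slope);
fprintf('score at odds 2: %.1f, at odds 4: %.1f\n', ...
    scoreAtOdds(2),scoreAtOdds(4));
```

With these inputs the shift is 450 points, and odds of 2 and 4 map to 500 and 550 points, matching the description above.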
Display the scorecard, showing the scaled points for the predictors retained in the fitted model.
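A sketch of the scaling and display steps, assuming the vector passed to 'PointsOddsAndPDO' is ordered as target points, target odds, and PDO. The 'Display','off' option is used here only to suppress the stepwise log.

```matlab
load CreditCardData.mat
sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);
sc = modifybins(sc,'CustAge','MinValue',0);
sc = modifybins(sc,'CustIncome','MinValue',0);
sc = fitmodel(sc,'Display','off');

% 500 points at odds of 2, odds double every 50 points
sc = formatpoints(sc,'PointsOddsAndPDO',[500 2 50]);

% One row per predictor bin, with the scaled points
PointsInfo = displaypoints(sc)
```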
PointsInfo=38×3 table
     Predictors           Bin          Points
    _____________    ______________    ______
      ⋮
    {'CustAge'  }    {'[48,51)'   }     78.86
    {'CustAge'  }    {'[51,58)'   }     80.83
    {'CustAge'  }    {'[58,Inf]'  }     96.76
    {'CustAge'  }    {'<missing>' }    64.984
    {'ResStatus'}    {'Tenant'    }    62.138
    {'ResStatus'}    {'Home Owner'}    73.248
    {'ResStatus'}    {'Other'     }    90.828
    {'ResStatus'}    {'<missing>' }    74.125
    {'EmpStatus'}    {'Unknown'   }    58.807
      ⋮
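Notice that the points for the <missing> bin for CustAge and ResStatus are explicitly shown (as 64.9836 and 74.1250, respectively). These points are computed from the WOE value for the <missing> bin and the logistic model coefficients.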
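The two values can be reproduced from numbers printed in this example. The sketch assumes the per-bin points take the form (shift + slope*b0)/p + slope*b*WOE, with the intercept b0 spread evenly over the p predictors in the model; this form is an assumption stated here, checked only against the displayed points. Coefficients and WOE values are copied from the output above, so the results match to rounding.

```matlab
% Scaling constants for 500 points at odds 2 with PDO of 50
slope = 50 / log(2);
shift = 500 - slope * log(2);

b0 = 0.70229;   % (Intercept) estimate
p  = 7;         % number of predictors in the fitted model

% CustAge and ResStatus: coefficient and WOE of the <missing> bin
b   = [0.57421 1.3629];
woe = [-0.15787 0.026469];

pointsMissing = (shift + slope*b0)/p + slope*(b .* woe);
fprintf('CustAge <missing>: %.3f, ResStatus <missing>: %.3f\n', ...
    pointsMissing(1),pointsMissing(2));
```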
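For predictors that have no missing data in the training set, there is no explicit <missing> bin. By default, the points for missing data are set to NaN, and they lead to a score of NaN when running score. For predictors that have no explicit <missing> bin, use the name-value argument 'Missing' in formatpoints to indicate how missing data should be treated for scoring purposes.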
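For the purpose of illustration, take a few rows from the original data as test data and introduce some missing data. Also introduce some invalid, or out-of-range, values. For numeric data, values below the minimum (or above the maximum) allowed are considered invalid, such as a negative value for age (recall that 'MinValue' was earlier set to 0 for CustAge and CustIncome). For categorical data, invalid values are categories not explicitly included in the scorecard, for example, a residential status not previously mapped to scorecard categories, such as "House", or a meaningless string such as "abc123".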
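The test data might be built as in the sketch below. The row range 11:18 is an assumption, not stated in the text; the column list is taken from the header of the test-data table in this example, and the modified entries follow its pattern of one missing and one invalid value per affected predictor.

```matlab
load CreditCardData.mat

% Predictors retained in the fitted model (from the test-data table header)
vars = {'CustAge','ResStatus','EmpStatus','CustIncome', ...
        'TmWBank','OtherCC','AMBalance'};
tdata = dataMissing(11:18,vars);   % row range is an assumption

% Introduce missing values
tdata.CustAge(1)    = NaN;
tdata.ResStatus(2)  = '<undefined>';
tdata.EmpStatus(3)  = '<undefined>';
tdata.CustIncome(4) = NaN;

% Introduce invalid (out-of-range) values
tdata.CustAge(5)    = -100;
tdata.ResStatus(6)  = 'House';
tdata.EmpStatus(7)  = 'Freelancer';
tdata.CustIncome(8) = -1;

disp(tdata)
```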
    CustAge     ResStatus      EmpStatus     CustIncome    TmWBank    OtherCC    AMBalance
    _______    ___________    ___________    __________    _______    _______    _________
      NaN      Tenant         Unknown          34000         44         Yes        119.8
       48      <undefined>    Unknown          44000         14         Yes       403.62
       65      Home Owner     <undefined>      48000          6         No        111.88
       44      Other          Unknown            NaN         35         No        436.41
     -100      Other          Employed         46000         16         Yes       162.21
       33      House          Employed         36000         36         Yes       845.02
       39      Tenant         Freelancer       34000         40         Yes       756.26
       24      Home Owner     Employed            -1         19         Yes       449.61
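Score the new data and see how points are assigned for missing CustAge and ResStatus, because there is an explicit bin with points for <missing>. However, for EmpStatus and CustIncome, the score function sets the points to NaN.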
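The contrast between the two kinds of predictors can be seen with a smaller test set than the one used in this example. The sketch below uses two arbitrarily chosen rows (an assumption) and makes one value missing in each.

```matlab
load CreditCardData.mat
sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);
sc = fitmodel(sc,'Display','off');
sc = formatpoints(sc,'PointsOddsAndPDO',[500 2 50]);

tdata = dataMissing(11:12,:);          % row choice is arbitrary
tdata.CustAge(1)   = NaN;              % predictor with a <missing> bin
tdata.EmpStatus(2) = '<undefined>';    % predictor without a <missing> bin

% Scores is a column vector; Points is a table with one column per predictor
[Scores,Points] = score(sc,tdata);
disp(Scores)
disp(Points)
```

The first row should receive a numeric score, while the second should be NaN because EmpStatus has no points defined for missing data.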
  481.2231
  520.8353
       NaN
       NaN
  551.7922
  487.9588
       NaN
       NaN
    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance
    _______    _________    _________    __________    _______    _______    _________
    64.984      62.138       58.807        67.893      61.858     75.622      89.922
     78.86      74.125       58.807        82.439      61.061     75.622      89.922
     96.76      73.248          NaN        96.969      51.132     50.914      89.922
    69.636      90.828       58.807           NaN      61.858     50.914      89.922
    64.984      90.828       86.937        82.439      61.061     75.622      89.922
    56.282      74.125       86.937        70.107      61.858     75.622      63.028
    60.012      62.138          NaN        67.893      61.858     75.622      63.028
    54.062      73.248       86.937           NaN      61.061     75.622      89.922
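Use the name-value argument 'Missing' in formatpoints to choose how to assign points to missing values for predictors that do not have an explicit <missing> bin. In this example, use the 'MinPoints' option for the 'Missing' argument. The minimum points for EmpStatus in the scorecard displayed above are 58.8072, and the minimum points for CustIncome are 29.3753.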
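A sketch of the 'MinPoints' setting on a small test set. The two rows are chosen arbitrarily (an assumption), with one missing and one invalid value in predictors that have no <missing> bin.

```matlab
load CreditCardData.mat
sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);
sc = modifybins(sc,'CustIncome','MinValue',0);
sc = fitmodel(sc,'Display','off');
sc = formatpoints(sc,'PointsOddsAndPDO',[500 2 50]);

tdata = dataMissing(11:12,:);          % row choice is arbitrary
tdata.EmpStatus(1)  = '<undefined>';   % missing categorical value
tdata.CustIncome(2) = -1;              % out-of-range numeric value

% Missing and out-of-range values now get the predictor's minimum points
sc = formatpoints(sc,'Missing','MinPoints');
[Scores,Points] = score(sc,tdata);
disp(Scores)
disp(Points)
```

With this setting no score should be NaN; the affected entries take the lowest points of their predictor instead.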
  481.2231
  520.8353
  517.7532
  451.3405
  551.7922
  487.9588
  449.3577
  470.2267
    CustAge    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance
    _______    _________    _________    __________    _______    _______    _________
    64.984      62.138       58.807        67.893      61.858     75.622      89.922
     78.86      74.125       58.807        82.439      61.061     75.622      89.922
     96.76      73.248       58.807        96.969      51.132     50.914      89.922
    69.636      90.828       58.807        29.375      61.858     50.914      89.922
    64.984      90.828       86.937        82.439      61.061     75.622      89.922
    56.282      74.125       86.937        70.107      61.858     75.622      63.028
    60.012      62.138       58.807        67.893      61.858     75.622      63.028
    54.062      73.248       86.937        29.375      61.061     75.622      89.922