R - Calibration of the posterior probabilities
I am currently working on the calibration of probabilities. I use a calibration approach called the rescaling algorithm; the source is http://lem.cnrs.fr/portals/2/actus/dp_201106.pdf (page 7).
The algorithm I wrote is:
    rescaling_fun = function(x, y, z) {
      p_korg = z                                        # yhat_test_prob$bad
      p_k_c1 = sum(as.numeric(y) - 1)/length(y)         # testset$bad
      p_kt_c1 = sum(as.numeric(x) - 1)/length(x)        # trainset$bad
      p_k_c0 = sum(abs(as.numeric(y) - 2))/length(y)
      p_kt_c0 = sum(abs(as.numeric(x) - 2))/length(x)
      p_new <- ((p_k_c1/p_kt_c1) * p_korg)/((p_k_c0/p_k_c0) * (1 - p_korg) + (p_k_c0/p_k_c1) * (p_korg))
      return(p_new)
    }
The input values are:

1. x - train_set$bad (actuals of the `train set`)
2. y - test_set$bad (actuals of the `test set`)
3. z - yhat_test_prob$bad (prediction on the `test set`)
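For reference, the rescaling that the function appears to target can be written out as below. This is an assumption based on the common prior-shift correction, not a quotation from page 7 of the paper. The symbols mirror the variable names in the code: p_korg is the original prediction, p_k_c1 and p_k_c0 are the class shares in the test set, and p_kt_c1 and p_kt_c0 are the class shares in the train set.

```latex
\[
p_{\text{new}} =
\frac{\dfrac{p_{k,c1}}{p_{kt,c1}}\; p_{\text{korg}}}
     {\dfrac{p_{k,c0}}{p_{kt,c0}}\,\bigl(1 - p_{\text{korg}}\bigr)
      + \dfrac{p_{k,c1}}{p_{kt,c1}}\; p_{\text{korg}}}
\]
```

In this form each class weight is the test share divided by the train share of the same class. As long as all four shares are positive, the result stays within [0, 1] for any p_korg in [0, 1].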
The problem: the resulting values are not within the range [0, 1]. Could you please help me solve this problem?
Your formulas to obtain the probabilities (`p_k_c1`, ...) need to be modified. For example, according to the paper, y is a binary variable (0, 1), and the formula `sum(y - 1)/length(y)` is negative: it converts the y values to -1 or 0 and then adds them up. I consider it should be `(sum(y) - 1)/length(y)`. Below is an example.
    set.seed(1237)
    y <- sample(0:1, 10, replace = TRUE)
    y
    [1] 0 1 0 0 0 1 1 0 1 1

    # must be negative because of sum(y - 1): y is 0 or 1
    sum(as.numeric(y) - 1)/length(y)
    [1] -0.5

    # modification
    (sum(as.numeric(y)) - 1)/length(y)
    [1] 0.4
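Whether `as.numeric(y) - 1` goes negative depends on how the labels are stored: on a two-level factor, `as.numeric()` returns the level codes 1 and 2, while on a plain 0/1 vector it returns 0 and 1. A minimal sketch that sidesteps the coding question by computing the class share as a mean is shown here. The names `class1_share` and `rescale_posterior` are hypothetical, the data are toy values, and the formula is the assumed prior-shift form (test share over train share for each class), not something confirmed against the paper.

```r
# Hypothetical helper: share of the positive class in a label vector.
# Accepts 0/1 numerics or a two-level factor (second level taken as positive).
class1_share <- function(labels) {
  if (is.factor(labels)) {
    mean(as.integer(labels) == 2L)
  } else {
    mean(labels == 1)
  }
}

# Hypothetical sketch of the rescaling, assuming the prior-shift form:
# each class weight is (share in test set) / (share in train set).
rescale_posterior <- function(train_labels, test_labels, p_orig) {
  prior_train <- class1_share(train_labels)
  prior_test  <- class1_share(test_labels)
  w1 <- prior_test / prior_train              # weight for class 1
  w0 <- (1 - prior_test) / (1 - prior_train)  # weight for class 0
  (w1 * p_orig) / (w0 * (1 - p_orig) + w1 * p_orig)
}

# Toy data, invented for illustration only.
train_bad <- c(0, 0, 0, 1, 1, 0, 0, 1, 0, 0)  # 30% positives
test_bad  <- c(0, 1, 1, 0, 1, 1, 0, 1, 0, 0)  # 50% positives
p_hat     <- c(0.05, 0.30, 0.50, 0.90)        # predicted probabilities

p_new <- rescale_posterior(train_bad, test_bad, p_hat)
stopifnot(all(p_new >= 0 & p_new <= 1))       # rescaled values stay in [0, 1]
```

Because both weights are positive whenever each class occurs in both sets, the numerator can never exceed the denominator, which is what keeps the output in range.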