Monotone (or isotone) regression via the isotone
package (also give scam
and gbm
var.monotone
a look, which should have the advantage of also being low complexity).
We will use the vtreat
package custom coder interface, which will supply cross-validated significance calculations and out-of sample interpolation (allowing us to apply the monotone transforms to new data). For a more substantial application of vtreat
custom coding please see the partial pooling application.
suppressPackageStartupMessages(library("ggplot2")) library("seplyr") source("isotone.R") # set up example data set.seed(23525) d <- data.frame(x = 10*runif(200)) d$yIdeal <- d$x^2 d$yObserved <- d$yIdeal + 25*rnorm(nrow(d)) d$isTrain <- runif(nrow(d))<=0.5 ggplot(data=d, aes(x=x)) + geom_line(aes(y=yIdeal), color='blue', linetype=2) + geom_point(aes(y=yObserved, color=isTrain, shape=isTrain)) + ylab('y') + ggtitle("ideal and observed responses as functions of x", subtitle = "dashed curve: ideal (pre-noise) values") customCoders = list('n.NonDecreasingV.num' = solveNonDecreasing, 'n.NonIncreasingV.num' = solveNonIncreasing) treatments <- vtreat::designTreatmentsN(d[d$isTrain, , drop=FALSE], 'x', 'yObserved', customCoders = customCoders, verbose = FALSE) print(treatments$scoreFrame[, c('varName', 'rsq', 'sig', 'needsSplit'), drop=FALSE]) dTreated <- vtreat::prepare(treatments, d) d$soln <- dTreated$x_NonDecreasingV dTrain <- d[d$isTrain, , drop=FALSE] # good inference on train sum((dTrain$yIdeal - dTrain$soln)^2) sum((dTrain$yIdeal - dTrain$yObserved)^2) dTest <- d[!d$isTrain, , drop=FALSE] # good performance on test sum((dTest$yIdeal - dTest$soln)^2) sum((dTest$yIdeal - dTest$yObserved)^2) ggplot(data=d, aes(x=x)) + geom_line(aes(y=yIdeal), color='blue', linetype=2) + geom_point(aes(y=yObserved, color=isTrain, shape=isTrain)) + geom_line(aes(x=x, y=soln), color='darkgreen') + ylab('y') + ggtitle("ideal and observed responses as functions of x", subtitle = "solid path: isotone fit")
The above formulation is kind of exciting. You get one degree of freedom per data-row (a very large number), but a simple constraint system (that the produced predictions must follow the x-order constraints) is enough to produce reasonable fits. This reminiscent of the maximum entropy formulation of logistic regression, and is evidence one is working with a sort of dual-formulation of a smaller primal problem.
Some notes on smoother implementations can be found here.
We can also easily adapt to classification and to categorical inputs.
source("isotone.R") # set up example data set.seed(23525) d <- data.frame(x = 10*runif(200)) d$yIdeal <- -d$x^2 d$yObserved <- d$yIdeal + 25*rnorm(nrow(d)) d$isTrain <- runif(nrow(d))<=0.5 threshold <- -50 d$yIdeal <- d$yIdeal >= threshold d$yObserved <- d$yObserved >= threshold # could also build link-space versions customCoders = list('c.NonDecreasingV.num' = solveNonDecreasing, 'c.NonIncreasingV.num' = solveNonIncreasing) treatments <- vtreat::designTreatmentsC(d[d$isTrain, , drop=FALSE], 'x', 'yObserved', TRUE, customCoders = customCoders, verbose = FALSE) # examining variables print(treatments$scoreFrame[, c('varName', 'rsq', 'sig', 'needsSplit'), drop=FALSE]) # copy fit over to original data frame dTreated <- vtreat::prepare(treatments, d) d$soln <- dTreated$x_NonIncreasingV d$prediction <- d$soln>0.5 dTrain <- d[d$isTrain, , drop=FALSE] # good inference on train sigr::wrapChiSqTest(dTrain, 'soln', 'yIdeal') table(dTrain$prediction, dTrain$yIdeal) sigr::wrapFisherTest(dTrain, 'prediction', 'yIdeal') table(dTrain$yObserved, dTrain$yIdeal) sigr::wrapFisherTest(dTrain, 'yObserved', 'yIdeal') dTest <- d[!d$isTrain, , drop=FALSE] # good performance on test sigr::wrapChiSqTest(dTest, 'soln', 'yIdeal') table(dTest$prediction, dTest$yIdeal) sigr::wrapFisherTest(dTest, 'prediction', 'yIdeal') table(dTest$yObserved, dTest$yIdeal) sigr::wrapFisherTest(dTest, 'yObserved', 'yIdeal')
One application we have used the monotone methodology with good success is: calibrating regressions and classifiers.
This is an idea that came up in discussion with Jeremy Howard. Jeremy, while President and Chief Scientist of Kaggle, decided AUC was a good "early to see if you have something" metric for running contests. That leaves the question: what is a principled mechanical way to convert a score with a good AUC to a score with correct probabilities (a good deviance). Kaggle participants are famous for good solutions. But here is our solution: a arbitrary isotone transform.
That is: we take a model that does well on the AUC
measure (meaning it is good at ranking or reproducing order
relations) and build the best model with the same order structure with respect to a more stringent measure (such as sum of squared errors, or deviance). Often this step is ignored or done by binning or some other method- but for systems that are not natively in probability units (such as margin based systems such as support vector machines) this isotone calibration or
polish step can be an improvement (assuming one is careful about nested model bias issues).
We can try- that. Suppose we forgot to set type="response"
on a logistic regression and we didn't know the link function is the sigmoid (so we can't directly apply the correction).
source("isotone.R") # set up example data set.seed(23525) d <- data.frame(x = 10*runif(200)) d$yIdeal <- -d$x^2 d$yObserved <- d$yIdeal + 25*rnorm(nrow(d)) d$isTrain <- runif(nrow(d))<=0.5 threshold <- -50 d$yIdeal <- d$yIdeal >= threshold d$yObserved <- d$yObserved >= threshold model <- glm(yObserved ~ x, data = d[d$isTrain, , drop=FALSE], family = binomial) sigr::wrapChiSqTest(model) d$rawScore <- predict(model, newdata = d) # oops forgot type='response' head(d) # fix it with isotone regression customCoders = list('c.NonDecreasingV.num' = solveNonDecreasing) treatments <- vtreat::designTreatmentsC(d[d$isTrain, , drop=FALSE], 'rawScore', 'yObserved', TRUE, customCoders = customCoders, verbose = FALSE) print(treatments$scoreFrame[, c('varName', 'rsq', 'sig', 'needsSplit'), drop=FALSE]) predictionCollar <- 2/(sum(d$isTrain)-1)
Notice in the score frame vtreat
's cross validation based scoring correctly
indentifies that the isotone encoding is over-fitting and not in fact better than
the sigmoid link function.
# copy fit over to original data frame dTreated <- vtreat::prepare(treatments, d) d$adjScore <- dTreated$rawScore_NonDecreasingV d$adjScore <- pmax(predictionCollar, pmin(1-predictionCollar, d$adjScore)) d$adjPred <- d$adjScore>=0.5 # and the correct link d$linkScore <- predict(model, newdata = d, type = 'response') d$linkPred <- d$linkScore>=0.5 head(d) dTest <- d[!d$isTrain, , drop=FALSE]
sigr::wrapChiSqTest(dTest, 'adjScore', 'yObserved') dTest %.>% group_by_se(., "yObserved") %.>% summarize_nse(., minAdj := min(adjScore), maxAdj := max(adjScore)) WVPlots::DoubleDensityPlot(dTest, 'adjScore', 'yObserved', "adjusted prediction against observations")
sigr::wrapChiSqTest(dTest, 'linkScore', 'yObserved') dTest %.>% group_by_se(., "yObserved") %.>% summarize_nse(., minAdj := min(linkScore), maxAdj := max(linkScore)) WVPlots::DoubleDensityPlot(dTest, 'linkScore', 'yObserved', "link prediction against observations")
sigr::wrapChiSqTest(dTest, 'adjScore', 'yIdeal') dTest %.>% group_by_se(., "yIdeal") %.>% summarize_nse(., minAdj := min(adjScore), maxAdj := max(adjScore)) WVPlots::DoubleDensityPlot(dTest, 'adjScore', 'yIdeal', "adjusted prediction against ideal (unobserved) concept")
sigr::wrapChiSqTest(dTest, 'linkScore', 'yIdeal') dTest %.>% group_by_se(., "yIdeal") %.>% summarize_nse(., minAdj := min(linkScore), maxAdj := max(linkScore)) WVPlots::DoubleDensityPlot(dTest, 'linkScore', 'yIdeal', "link prediction against ideal (unobserved) concept")
And we see the adjusted prediction is pretty good, even with the nested model bias issue. In fact even though it shows sings of over-fit on the testing set, it outperforms the original links score in recovering the (unobserved) original concept. This is because the inductive bias we introduced (monotone solution) was something true for the concept (the mapping of link scores to probabilities) but not a property of the noise model; so the transform prefers signal.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.