
Isotone Coding in vtreat

John Mount, Win-Vector LLC 2018-12-03

Monotone (or isotone) regression via the isotone package (the scam package and gbm's var.monotone option are also worth a look; they should have the additional advantage of being low complexity).

We will use the vtreat package custom coder interface, which will supply cross-validated significance calculations and out-of-sample interpolation (allowing us to apply the monotone transforms to new data). For a more substantial application of vtreat custom coding, please see the partial pooling application.
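As a rough sketch of the underlying idea (using only base R; this is not the implementation in isotone.R, and fit_nondecreasing is a hypothetical helper): fit a non-decreasing step function with stats::isoreg, then extend it to new data by piecewise-linear interpolation with stats::approxfun.

# sketch only: hypothetical helper, not part of vtreat or isotone.R
fit_nondecreasing <- function(x, y) {
  fit <- stats::isoreg(x, y)   # pool-adjacent-violators, non-decreasing least squares fit
  # fit$yf holds fitted values in increasing-x order; interpolate between the
  # fitted points, clamping (rule = 2) to the end values outside the training range
  stats::approxfun(sort(x), fit$yf, rule = 2)
}
# (a non-increasing fit can be obtained by negating y before fitting and negating the result)

xs <- c(1, 3, 2, 5, 4)
ys <- c(0.2, 0.1, 0.5, 0.9, 0.4)
f <- fit_nondecreasing(xs, ys)
f(c(2.5, 6))  # interpolates inside the training range, clamps outside it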

Regression

suppressPackageStartupMessages(library("ggplot2"))
library("seplyr")
## Loading required package: wrapr
source("isotone.R")

# set up example data
set.seed(23525)
d <- data.frame(x = 10*runif(200))
d$yIdeal <- d$x^2
d$yObserved <- d$yIdeal + 25*rnorm(nrow(d))
d$isTrain <- runif(nrow(d))<=0.5

ggplot(data=d, aes(x=x)) + 
  geom_line(aes(y=yIdeal), color='blue', linetype=2) + 
  geom_point(aes(y=yObserved, color=isTrain, shape=isTrain)) +
  ylab('y') +
  ggtitle("ideal and observed responses as functions of x",
          subtitle = "dashed curve: ideal (pre-noise) values")

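# register the custom coders; the 'n.' prefix marks coders for the numeric-outcome
# design (designTreatmentsN); the classification section below uses the 'c.' prefix
# with designTreatmentsC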
customCoders = list('n.NonDecreasingV.num' = solveNonDecreasing,
                    'n.NonIncreasingV.num' = solveNonIncreasing)
treatments <- vtreat::designTreatmentsN(d[d$isTrain, , drop=FALSE], 
                                        'x', 'yObserved', 
                                        customCoders = customCoders,
                                        verbose = FALSE)
print(treatments$scoreFrame[, c('varName', 'rsq', 'sig', 'needsSplit'), drop=FALSE])
##            varName       rsq          sig needsSplit
## 1 x_NonDecreasingV 0.5809673 9.905695e-18       TRUE
## 2          x_clean 0.6129088 3.320652e-19      FALSE
dTreated <- vtreat::prepare(treatments, d)
d$soln <- dTreated$x_NonDecreasingV

dTrain <- d[d$isTrain, , drop=FALSE]

# good inference on train
sum((dTrain$yIdeal - dTrain$soln)^2)
## [1] 7460.391
sum((dTrain$yIdeal - dTrain$yObserved)^2)
## [1] 60855.41
dTest <- d[!d$isTrain, , drop=FALSE]

# good performance on test
sum((dTest$yIdeal - dTest$soln)^2)
## [1] 14315.25
sum((dTest$yIdeal - dTest$yObserved)^2)
## [1] 75152.66
ggplot(data=d, aes(x=x)) + 
  geom_line(aes(y=yIdeal), color='blue', linetype=2) + 
  geom_point(aes(y=yObserved, color=isTrain, shape=isTrain)) +
  geom_line(aes(x=x, y=soln), color='darkgreen') +
  ylab('y') +
  ggtitle("ideal and observed responses as functions of x",
          subtitle = "solid path: isotone fit")

The above formulation is kind of exciting. You get one degree of freedom per data row (a very large number), but a simple constraint system (that the produced predictions must follow the x-order constraints) is enough to produce reasonable fits. This is reminiscent of the maximum entropy formulation of logistic regression, and is evidence that one is working with a sort of dual formulation of a smaller primal problem.
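Concretely, the non-decreasing coder is (up to weights and implementation details) solving a constrained least squares problem with one fitted value per training row:

minimize over f_1, ..., f_n:   sum_i (y_i - f_i)^2
subject to:                    f_i <= f_j  whenever  x_i <= x_j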

Some notes on smoother implementations can be found here.

Classification

We can also easily adapt to classification and to categorical inputs.

source("isotone.R")

# set up example data
set.seed(23525)
d <- data.frame(x = 10*runif(200))
d$yIdeal <- -d$x^2
d$yObserved <- d$yIdeal + 25*rnorm(nrow(d))
d$isTrain <- runif(nrow(d))<=0.5
threshold <- -50
d$yIdeal <- d$yIdeal >= threshold
d$yObserved <- d$yObserved >= threshold


# could also build link-space versions
customCoders = list('c.NonDecreasingV.num' = solveNonDecreasing,
                    'c.NonIncreasingV.num' = solveNonIncreasing)
treatments <- vtreat::designTreatmentsC(d[d$isTrain, , drop=FALSE], 
                                        'x', 'yObserved', TRUE,
                                        customCoders = customCoders,
                                        verbose = FALSE)
# examining variables
print(treatments$scoreFrame[, c('varName', 'rsq', 'sig', 'needsSplit'), drop=FALSE])
##            varName       rsq          sig needsSplit
## 1 x_NonIncreasingV 0.3370146 2.856172e-10       TRUE
## 2          x_clean 0.3522417 1.138713e-10      FALSE
# copy fit over to original data frame
dTreated <- vtreat::prepare(treatments, d)
d$soln <- dTreated$x_NonIncreasingV
d$prediction <- d$soln>0.5

dTrain <- d[d$isTrain, , drop=FALSE]

# good inference on train
sigr::wrapChiSqTest(dTrain, 'soln', 'yIdeal')
## [1] "Chi-Square Test summary: pseudo-R2=0.688 (X2(1,N=87)=77.12, p<1e-05)."
table(dTrain$prediction, dTrain$yIdeal)
##        
##         FALSE TRUE
##   FALSE    28    0
##   TRUE      2   57
sigr::wrapFisherTest(dTrain, 'prediction', 'yIdeal')
## [1] "Fisher's Exact Test for Count Data: (odds.ratio=Inf, p<1e-05)."
table(dTrain$yObserved, dTrain$yIdeal)
##        
##         FALSE TRUE
##   FALSE    25   11
##   TRUE      5   46
sigr::wrapFisherTest(dTrain, 'yObserved', 'yIdeal')
## [1] "Fisher's Exact Test for Count Data: (odds.ratio=19.91, p<1e-05)."
dTest <- d[!d$isTrain, , drop=FALSE]

# good performance on test
sigr::wrapChiSqTest(dTest, 'soln', 'yIdeal')
## [1] "Chi-Square Test summary: pseudo-R2=0.7233 (X2(1,N=113)=105.3, p<1e-05)."
table(dTest$prediction, dTest$yIdeal)
##        
##         FALSE TRUE
##   FALSE    38    0
##   TRUE      1   74
sigr::wrapFisherTest(dTest, 'prediction', 'yIdeal')
## [1] "Fisher's Exact Test for Count Data: (odds.ratio=Inf, p<1e-05)."
table(dTest$yObserved, dTest$yIdeal)
##        
##         FALSE TRUE
##   FALSE    24   11
##   TRUE     15   63
sigr::wrapFisherTest(dTest, 'yObserved', 'yIdeal')
## [1] "Fisher's Exact Test for Count Data: (odds.ratio=8.936, p<1e-05)."

Model calibration/polish

One application where we have used the monotone methodology with good success is calibrating regressions and classifiers.

This is an idea that came up in discussion with Jeremy Howard. Jeremy, while President and Chief Scientist of Kaggle, decided AUC was a good "early to see if you have something" metric for running contests. That leaves the question: what is a principled mechanical way to convert a score with a good AUC into a score with correct probabilities (a good deviance)? Kaggle participants are famous for good solutions. Here is our solution: an arbitrary isotone transform.

That is: we take a model that does well on the AUC measure (meaning it is good at ranking, or reproducing order relations) and build the best model with the same order structure with respect to a more stringent measure (such as sum of squared errors, or deviance). Often this step is ignored, or done by binning or some other method; but for systems whose scores are not natively in probability units (such as margin-based systems like support vector machines) this isotone calibration or polish step can be an improvement (assuming one is careful about nested model bias issues).
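As a rough sketch of this calibration step (using base R rather than the vtreat machinery demonstrated below; calibrate_scores is a hypothetical helper): fit a non-decreasing map from raw model scores to the observed 0/1 outcomes on held-out calibration data, then apply that map to new scores.

# sketch only: isotonic calibration of classifier scores with base R
calibrate_scores <- function(raw_score, outcome01) {
  # outcome01: outcomes coded as 0/1; ideally from data not used to fit the
  # original model, to limit nested model bias
  fit <- stats::isoreg(raw_score, outcome01)
  stats::approxfun(sort(raw_score), fit$yf, rule = 2)
}
# cal <- calibrate_scores(calibration_scores, calibration_outcomes)
# calibrated_probabilities <- cal(new_scores)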

We can try that. Suppose we forgot to set type="response" on a logistic regression prediction and we didn't know the link function is the sigmoid (so we can't directly apply the correction ourselves).

source("isotone.R")

# set up example data
set.seed(23525)
d <- data.frame(x = 10*runif(200))
d$yIdeal <- -d$x^2
d$yObserved <- d$yIdeal + 25*rnorm(nrow(d))
d$isTrain <- runif(nrow(d))<=0.5
threshold <- -50
d$yIdeal <- d$yIdeal >= threshold
d$yObserved <- d$yObserved >= threshold



model <- glm(yObserved ~ x, data = d[d$isTrain, , drop=FALSE], family = binomial)
sigr::wrapChiSqTest(model)
## [1] "Chi-Square Test summary: pseudo-R2=0.3522 (X2(1,N=87)=41.57, p<1e-05)."
d$rawScore <- predict(model, newdata = d)  # oops forgot type='response'
head(d)
##          x yIdeal yObserved isTrain   rawScore
## 1 3.615347   TRUE      TRUE   FALSE  2.0180394
## 2 6.811326   TRUE      TRUE    TRUE -0.1834170
## 3 7.173432  FALSE      TRUE   FALSE -0.4328434
## 4 9.732597  FALSE     FALSE    TRUE -2.1956494
## 5 3.726201   TRUE      TRUE   FALSE  1.9416808
## 6 2.197222   TRUE      TRUE   FALSE  2.9948735
# fix it with isotone regression
customCoders = list('c.NonDecreasingV.num' = solveNonDecreasing)
treatments <- vtreat::designTreatmentsC(d[d$isTrain, , drop=FALSE], 
                                        'rawScore', 'yObserved', TRUE,
                                        customCoders = customCoders,
                                        verbose = FALSE)
print(treatments$scoreFrame[, c('varName', 'rsq', 'sig', 'needsSplit'), drop=FALSE])
##                   varName       rsq          sig needsSplit
## 1 rawScore_NonDecreasingV 0.3370146 2.856172e-10       TRUE
## 2          rawScore_clean 0.3522417 1.138713e-10      FALSE
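# collar used below to keep adjusted scores strictly inside (0, 1);
# its width is chosen relative to the training set size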
predictionCollar <- 2/(sum(d$isTrain)-1)

Notice in the score frame that vtreat's cross-validation based scoring correctly identifies that the isotone encoding is over-fitting and not in fact better than the sigmoid link function.

# copy fit over to original data frame
dTreated <- vtreat::prepare(treatments, d)
d$adjScore <- dTreated$rawScore_NonDecreasingV
d$adjScore <- pmax(predictionCollar, pmin(1-predictionCollar, d$adjScore))
d$adjPred <- d$adjScore>=0.5

# and the correct link
d$linkScore <-  predict(model, newdata = d, type = 'response')
d$linkPred <- d$linkScore>=0.5
head(d)
##          x yIdeal yObserved isTrain   rawScore   adjScore adjPred
## 1 3.615347   TRUE      TRUE   FALSE  2.0180394 0.91666667    TRUE
## 2 6.811326   TRUE      TRUE    TRUE -0.1834170 0.61111111    TRUE
## 3 7.173432  FALSE      TRUE   FALSE -0.4328434 0.28571429   FALSE
## 4 9.732597  FALSE     FALSE    TRUE -2.1956494 0.02325581   FALSE
## 5 3.726201   TRUE      TRUE   FALSE  1.9416808 0.91666667    TRUE
## 6 2.197222   TRUE      TRUE   FALSE  2.9948735 0.91666667    TRUE
##   linkScore linkPred
## 1 0.8826781     TRUE
## 2 0.4542739    FALSE
## 3 0.3934475    FALSE
## 4 0.1001419    FALSE
## 5 0.8745367     TRUE
## 6 0.9523420     TRUE
dTest <- d[!d$isTrain, , drop=FALSE]

Adjusted Score versus observed outcomes

sigr::wrapChiSqTest(dTest, 'adjScore', 'yObserved')
## [1] "Chi-Square Test summary: pseudo-R2=0.1356 (X2(1,N=113)=18.96, p=1.335e-05)."
dTest %.>% 
  group_by_se(., "yObserved") %.>%
  summarize_nse(., minAdj := min(adjScore), maxAdj := max(adjScore))
## # A tibble: 2 x 3
##   yObserved minAdj maxAdj
##   <lgl>      <dbl>  <dbl>
## 1 FALSE     0.0233  0.977
## 2 TRUE      0.0233  0.977
WVPlots::DoubleDensityPlot(dTest, 'adjScore', 'yObserved',
                           "adjusted prediction against observations")

Link Score versus observed outcomes

sigr::wrapChiSqTest(dTest, 'linkScore', 'yObserved')
## [1] "Chi-Square Test summary: pseudo-R2=0.2027 (X2(1,N=113)=28.36, p<1e-05)."
dTest %.>% 
  group_by_se(., "yObserved") %.>%
  summarize_nse(., minAdj := min(linkScore), maxAdj := max(linkScore))
## # A tibble: 2 x 3
##   yObserved minAdj maxAdj
##   <lgl>      <dbl>  <dbl>
## 1 FALSE     0.0860  0.989
## 2 TRUE      0.132   0.989
WVPlots::DoubleDensityPlot(dTest, 'linkScore', 'yObserved',
                           "link prediction against observations")

Adjusted Score versus Ideal (Unobserved) Concept

sigr::wrapChiSqTest(dTest, 'adjScore', 'yIdeal')
## [1] "Chi-Square Test summary: pseudo-R2=0.7135 (X2(1,N=113)=103.9, p<1e-05)."
dTest %.>% 
  group_by_se(., "yIdeal") %.>%
  summarize_nse(., minAdj := min(adjScore), maxAdj := max(adjScore))
## # A tibble: 2 x 3
##   yIdeal minAdj maxAdj
##   <lgl>   <dbl>  <dbl>
## 1 FALSE  0.0233  0.521
## 2 TRUE   0.611   0.977
WVPlots::DoubleDensityPlot(dTest, 'adjScore', 'yIdeal',
                           "adjusted prediction against ideal (unobserved) concept")

Link Score versus Ideal (Unobserved) Concept

sigr::wrapChiSqTest(dTest, 'linkScore', 'yIdeal')
## [1] "Chi-Square Test summary: pseudo-R2=0.6388 (X2(1,N=113)=93.02, p<1e-05)."
dTest %.>% 
  group_by_se(., "yIdeal") %.>%
  summarize_nse(., minAdj := min(linkScore), maxAdj := max(linkScore))
## # A tibble: 2 x 3
##   yIdeal minAdj maxAdj
##   <lgl>   <dbl>  <dbl>
## 1 FALSE  0.0860  0.401
## 2 TRUE   0.418   0.989
WVPlots::DoubleDensityPlot(dTest, 'linkScore', 'yIdeal',
                           "link prediction against ideal (unobserved) concept")

Conclusion

And we see the adjusted prediction is pretty good, even with the nested model bias issue. In fact, even though it shows signs of over-fitting on the test set, it outperforms the original link score in recovering the (unobserved) original concept. This is because the inductive bias we introduced (a monotone solution) is true of the concept (the mapping of link scores to probabilities) but not a property of the noise model, so the transform prefers signal to noise.


