mcmcRocPrcGen: ROC and Precision-Recall Curves using Bayesian MCMC estimates...

View source: R/mcmcRocPrcGen.R

mcmcRocPrcGenR Documentation

ROC and Precision-Recall Curves using Bayesian MCMC estimates generalized

Description

This function generates ROC and Precision-Recall curves after fitting a Bayesian logit or probit regression. For fast calculation for from an "rjags" object use mcmcRocPrc

Usage

mcmcRocPrcGen(
  modelmatrix,
  mcmcout,
  modelframe,
  curves = FALSE,
  link = "logit",
  fullsims = FALSE
)

Arguments

modelmatrix

model matrix, including intercept (if the intercept is among the parameters estimated in the model). Create with model.matrix(formula, data). Note: the order of columns in the model matrix must correspond to the order of columns in the matrix of posterior draws in the mcmcout argument. See the mcmcout argument for more and Beger (2016) for background.

mcmcout

posterior distributions of all logit coefficients, in matrix form. This can be created from rstan, MCMCpack, R2jags, etc. and transformed into a matrix using the function as.mcmc() from the coda package for jags class objects, as.matrix() from base R for mcmc, mcmc.list, stanreg, and stanfit class objects, and object$sims.matrix for bugs class objects. Note: the order of columns in this matrix must correspond to the order of columns in the model matrix. One can do this by examining the posterior distribution matrix and sorting the variables in the order of this matrix when creating the model matrix. A useful function for sorting column names containing both characters and numbers as you create the matrix of posterior distributions is mixedsort() from the gtools package.

modelframe

model frame in matrix form. Can be created using as.matrix(model.frame(formula, data))

curves

logical indicator of whether or not to return values to plot the ROC or Precision-Recall curves. If set to FALSE (default), results are returned as a list without the extra values.

link

type of generalized linear model; a character vector set to "logit" (default) or "probit".

fullsims

logical indicator of whether full object (based on all MCMC draws rather than their average) will be returned. Default is FALSE. Note: If TRUE is chosen, the function takes notably longer to execute.

Details

This function generates ROC and precision-recall curves after fitting a Bayesian logit or probit model.

Value

This function returns a list with 4 elements:

  • area_under_roc: area under ROC curve (scalar)

  • area_under_prc: area under precision-recall curve (scalar)

  • prc_dat: data to plot precision-recall curve (data frame)

  • roc_dat: data to plot ROC curve (data frame)

References

Beger, Andreas. 2016. “Precision-Recall Curves.” Available at SSRN: https://ssrn.com/Abstract=2765419. http://dx.doi.org/10.2139/ssrn.2765419.

Examples



if (interactive()) {
# simulating data

set.seed(123456)
b0 <- 0.2 # true value for the intercept
b1 <- 0.5 # true value for first beta
b2 <- 0.7 # true value for second beta
n <- 500 # sample size
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Z <- b0 + b1 * X1 + b2 * X2
pr <- 1 / (1 + exp(-Z)) # inv logit function
Y <- rbinom(n, 1, pr) 
df <- data.frame(cbind(X1, X2, Y))

# formatting the data for jags
datjags <- as.list(df)
datjags$N <- length(datjags$Y)

# creating jags model
model <- function()  {
  
  for(i in 1:N){
    Y[i] ~ dbern(p[i])  ## Bernoulli distribution of y_i
    logit(p[i]) <- mu[i]    ## Logit link function
    mu[i] <- b[1] + 
      b[2] * X1[i] + 
      b[3] * X2[i]
  }
  
  for(j in 1:3){
    b[j] ~ dnorm(0, 0.001) ## Use a coefficient vector for simplicity
  }
  
}

params <- c("b")
inits1 <- list("b" = rep(0, 3))
inits2 <- list("b" = rep(0, 3))
inits <- list(inits1, inits2)

## fitting the model with R2jags
set.seed(123)
fit <- R2jags::jags(data = datjags, inits = inits, 
                    parameters.to.save = params, n.chains = 2, n.iter = 2000, 
                    n.burnin = 1000, model.file = model)

# processing the data
mm <- model.matrix(Y ~ X1 + X2, data = df)
xframe <- as.matrix(model.frame(Y ~ X1 + X2, data = df))
mcmc <- coda::as.mcmc(fit)
mcmc_mat <- as.matrix(mcmc)[, 1:ncol(xframe)]

# using mcmcRocPrcGen
fit_sum <- mcmcRocPrcGen(modelmatrix = mm,
                      modelframe = xframe,
                      mcmcout = mcmc_mat,
                      curves = TRUE,
                      fullsims = FALSE)
}




ShanaScogin/BayesPostEst documentation built on May 20, 2022, 6:36 p.m.