ParScreenVars: Screen a data set for important variables in parallel

View source: R/ScreeningMethods.R

ParScreenVars    R Documentation

Screen a data set for important variables in parallel

Description

Screens the columns of a predictor matrix, in parallel, for variables that appear to differ between the two classes of a binary response.

Usage

ParScreenVars(
  datasetX,
  datasetY,
  method = "SIS",
  ncores = 1,
  cutoff = NULL,
  train1ids = NULL,
  trainsize1 = NULL,
  trainsize2 = NULL,
  train2ids = NULL,
  seed = NULL,
  Ginv = NULL,
  c = NULL,
  leveltot = NULL,
  PTscale = TRUE
)

Arguments

datasetX

A matrix of predictor values, with one row per observation and one column per variable.

datasetY

A vector containing the class of each observation (each row of datasetX). Currently only binary responses are supported.

method

A string containing the type of screening to do. Can be "SIS", "KS", "CVBF" or "PT"

ncores

An integer giving the number of cores to use for parallel computation.

cutoff

A real number that corresponds either to an alpha value for testing or a cutoff value on how large the Bayes factor needs to be to conclude a difference exists.

train1ids

A vector of ids that correspond to which observations to use for the training set for the first data set

trainsize1

Size of the training set for one of the classes for CVBF

trainsize2

Size of the training set for the other one of the classes for CVBF

train2ids

A vector of ids that correspond to which observations to use for the training set for the second data set

seed

A seed for CVBF-based screening. Use it to reproduce results instead of supplying train1ids and train2ids.

Ginv

A quantile function used to construct the Polya tree (for example, qnorm).

c

Tuning parameter for the Polya tree; it controls how influential the prior is.

leveltot

Depth of the Polya tree to construct when Polya tree based screening (method = "PT") is chosen.

PTscale

A logical value (TRUE or FALSE). Should columns be standardized before Polya tree based screening? The default, TRUE, standardizes them as recommended by the authors.
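
Details

For method = "PT", the arguments Ginv and leveltot describe how the Polya tree is built. As a rough illustration only, assuming the standard construction in which the level-m partition of the real line is cut at the quantiles Ginv(k / 2^m), the sketch below computes those cut points and counts how many observations fall in each set. The helper pt_breaks is hypothetical and not part of the package, and the package's internal construction may differ.

```r
# Hypothetical helper (not part of BayesScreening): cut points of the
# level-m dyadic partition induced by a quantile function Ginv.
pt_breaks <- function(m, Ginv = qnorm) {
  Ginv(seq(0, 1, length.out = 2^m + 1))
}

set.seed(1)
x <- as.numeric(scale(rnorm(200)))   # a standardized column, as with PTscale = TRUE

breaks <- pt_breaks(3)               # 2^3 + 1 cut points, running from -Inf to Inf
counts <- table(cut(x, breaks))      # number of observations in each partition set
length(counts)                       # 2^3 = 8 sets at level 3
```

Each additional level halves every set in probability under Ginv, so a tree of depth leveltot has 2^leveltot sets at its finest level.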

Value

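A list describing the variables that are judged to be important. In the examples, the element varspicked holds the selected variables, and for method = "PT" the element logBFlist is also available.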
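
To make the idea of screening for important variables concrete, here is a minimal sketch of marginal screening with a two-sample Kolmogorov-Smirnov test, run column by column with base R and the parallel package. It is an illustration, not the package's implementation: ks_screen is a hypothetical helper, and treating the cutoff as an alpha level on the p-value is an assumption based on the description of cutoff.

```r
library(parallel)

# Hypothetical helper (not part of BayesScreening): return the indices of the
# columns whose two-sample KS p-value falls below alpha.
ks_screen <- function(X, y, alpha = 0.05, ncores = 1) {
  classes <- sort(unique(y))
  stopifnot(length(classes) == 2)            # binary responses only
  pvals <- unlist(mclapply(seq_len(ncol(X)), function(j) {
    ks.test(X[y == classes[1], j], X[y == classes[2], j])$p.value
  }, mc.cores = ncores))
  which(pvals < alpha)
}

set.seed(1)
n <- 100
y <- rep(c(0, 1), each = n)                  # toy binary labels
X <- matrix(rnorm(2 * n * 20), ncol = 20)    # 20 noise columns
X[y == 1, 1:2] <- X[y == 1, 1:2] + 3         # shift columns 1 and 2 in class 1

# mclapply() does not support mc.cores > 1 on Windows
ncores <- if (.Platform$OS.type == "windows") 1 else 2
picked <- ks_screen(X, y, alpha = 0.05, ncores = ncores)
```

Columns 1 and 2 are strongly shifted and end up in picked; because each remaining column is tested at level alpha, an unshifted column can occasionally be selected as well.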

Examples

library(parallel)  # provides detectCores()

data(gisettetrainlabs)
data(gisettetrainpreds)

# Use half of the available cores, but always at least one whole core
nworkers = max(1, detectCores() %/% 2, na.rm = TRUE)

# All calls below screen only the first 500 predictors
ImpVarsSIS1 = ParScreenVars(datasetX = gisettetrainpreds[, 1:500], datasetY = gisettetrainlabs[,1], method = "SIS", ncores = nworkers)
length(ImpVarsSIS1$varspicked)

ImpVarsKS1 = ParScreenVars(datasetX = gisettetrainpreds[, 1:500], datasetY = gisettetrainlabs[,1], method = "KS", ncores = nworkers)
length(ImpVarsKS1$varspicked)

ImpVarsPT1 = ParScreenVars(datasetX = gisettetrainpreds[, 1:500], datasetY = gisettetrainlabs[,1], method = "PT", ncores = nworkers, c = 1, leveltot = 12, Ginv = qnorm, PTscale = TRUE)
length(ImpVarsPT1$varspicked)
hist(ImpVarsPT1$logBFlist)

ImpVarsCVBF1 = ParScreenVars(datasetX = gisettetrainpreds[, 1:500], datasetY = gisettetrainlabs[,1], method = "CVBF", ncores = nworkers, trainsize1 = 2960, trainsize2 = 2960, seed = 200)
length(ImpVarsCVBF1$varspicked)
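
The CVBF call above fixes seed to make the training split reproducible. An alternative is to pass explicit ids through train1ids and train2ids. The sketch below draws such ids with base R. Here draw_train_ids is a hypothetical helper, and whether the package expects ids indexed within each class (as here) or within the full data set is an assumption to check against the source.

```r
# Hypothetical helper (not part of BayesScreening): sample `size` training ids,
# without replacement, from the observations belonging to one class.
draw_train_ids <- function(y, class, size) {
  sample(sum(y == class), size)
}

set.seed(200)
y <- rep(c(-1, 1), each = 10)          # toy binary labels, 10 per class
ids1 <- draw_train_ids(y, -1, 6)       # 6 ids among the 10 observations of class -1
ids2 <- draw_train_ids(y, 1, 6)        # 6 ids among the 10 observations of class 1
```

Storing the drawn ids alongside the results records exactly which observations were used for training, independent of the random number generator state.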


naveedmerchant/BayesScreening documentation built on June 13, 2024, 7:56 a.m.