bc_simulate: Simulate Data for Binary Classification


Description

Simulates a matrix of independent variables and a binary dependent variable that is predicted by a subset of the independent variables.
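The Details section states that the data are simulated from a logistic regression, so the data-generating model can plausibly be written as below, with beta_0 = intercept and exactly K_sig of the K slopes non-zero. This is a hedged reading of the documentation, not a statement of the package's code: the distribution of the predictors is not documented, y_i = 1 stands for whichever level of levels the package treats as the modeled event, and the bound on |beta_k| assumes param constrains magnitudes (the rng_seed description implies that signs are drawn separately).

```latex
\Pr(y_i = 1 \mid \mathbf{x}_i)
  = \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \sum_{k=1}^{K} \beta_k x_{ik}\right)\right]},
  \qquad i = 1, \dots, n,
\qquad
\#\{k : \beta_k \neq 0\} = K_{\mathrm{sig}},
\qquad
\mathrm{param}_1 \le |\beta_k| \le \mathrm{param}_2 \ \text{for } \beta_k \neq 0.
```

Because the intercept is on the log-odds scale, the default intercept = -0.6 corresponds to a probability of 1/(1 + e^{0.6}), roughly 0.354, when every predictor equals zero.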

Usage

bc_simulate(n, K, K_sig, correlated = F, intercept = -0.6,
  param = c(0.5, 1.5), levels = c("Yes", "No"), outliers = NULL,
  rng_seed = list(feature = NULL, coef = NULL, cor = NULL))

is.bc_simulate(x)

## S3 method for class 'bc_simulate'
subset(x, y = T)

## S3 method for class 'bc_simulate'
levels(x)

## S3 method for class 'bc_simulate'
dimnames(x)

## S3 method for class 'bc_simulate'
coef(x, int = FALSE)

## S3 method for class 'bc_simulate'
as.integer(x)

## S3 method for class 'bc_simulate'
print(x)

## S3 method for class 'bc_simulate'
features(x)

## S3 method for class 'bc_simulate'
size(x, y = T)

Arguments

n

The number of subjects to simulate.

K

The number of predictors to include.

K_sig

The number of predictors with non-zero coefficients, i.e., the predictors that actually influence the dependent variable.

correlated

Logical; if TRUE, a correlation matrix is generated at random so that the predictors are correlated with one another. Alternatively, a user-defined correlation matrix can be supplied instead.

intercept

The value of the model intercept on the log-odds scale.

param

A vector giving the lower and upper bounds between which the non-zero predictor coefficients should fall.

levels

The labels to use for the two levels of the factor describing the binary dependent variable.

outliers

An optional vector giving the proportion of outliers, followed by the lower and upper bounds between which each outlier should fall.

rng_seed

An optional list of seeds controlling the RNG state for 1) selecting which predictors have non-zero coefficients (feature), 2) drawing the signs and then the values of the coefficients (coef), and 3) generating the correlation matrix (cor).
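The K_sig, param and rng_seed arguments suggest a coefficient-drawing step along the following lines. This is a hypothetical base-R sketch, not the package's code: the helpers with_seed() and draw_coefficients() are made-up names, and the sketch assumes param bounds the absolute value of each non-zero coefficient, with the sign drawn first as the rng_seed description implies.

```r
# Hypothetical helper: evaluate `expr` under `seed` if one is given, then
# restore the caller's RNG state so other stages are unaffected.
with_seed <- function(seed, expr) {
  if (is.null(seed)) return(expr)      # no seed: draw from the current stream
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) old <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  on.exit(if (had_seed) assign(".Random.seed", old, envir = globalenv())
          else rm(".Random.seed", envir = globalenv()))
  set.seed(seed)
  expr                                 # the promise is forced only after seeding
}

# Hypothetical sketch: pick K_sig of K predictors and give them non-zero
# coefficients whose absolute values fall between param[1] and param[2].
draw_coefficients <- function(K, K_sig, param = c(0.5, 1.5),
                              rng_seed = list(feature = NULL, coef = NULL)) {
  stopifnot(K_sig <= K, length(param) == 2, param[1] <= param[2])
  sig <- with_seed(rng_seed$feature, sort(sample.int(K, K_sig)))
  vals <- with_seed(rng_seed$coef, {
    signs <- sample(c(-1, 1), K_sig, replace = TRUE)       # signs first ...
    signs * runif(K_sig, min = param[1], max = param[2])   # ... then magnitudes
  })
  beta <- numeric(K)                   # predictors outside `sig` stay at zero
  beta[sig] <- vals
  beta
}

b1 <- draw_coefficients(8, 4, rng_seed = list(feature = 1, coef = 2))
b2 <- draw_coefficients(8, 4, rng_seed = list(feature = 1, coef = 2))
```

Passing NULL for a seed simply draws from the current RNG stream, so only the stages given a fixed seed are reproducible; restoring the RNG state afterwards keeps a seeded stage from changing the draws of the stages that follow it.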
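The correlated and outliers arguments concern how the predictor matrix is generated. The sketch below is one plausible base-R approach, not the package's implementation: it assumes standard-normal predictors, builds a random correlation matrix by rescaling t(A) %*% A with cov2cor(), imposes it through a Cholesky factor, and treats outliers as c(proportion, lower, upper) applied to randomly chosen cells of the matrix. The function names random_cor() and simulate_X() are hypothetical.

```r
# Hypothetical: random K x K correlation matrix (positive definite almost surely)
random_cor <- function(K) {
  A <- matrix(rnorm(K * K), nrow = K)
  cov2cor(crossprod(A))                # t(A) %*% A, rescaled to a unit diagonal
}

# Hypothetical: n draws of K predictors. `correlated` may be FALSE, TRUE, or a
# user-supplied correlation matrix, mirroring the argument description.
simulate_X <- function(n, K, correlated = FALSE, outliers = NULL) {
  X <- matrix(rnorm(n * K), nrow = n)  # independent N(0, 1) columns
  if (is.matrix(correlated) || isTRUE(correlated)) {
    R <- if (is.matrix(correlated)) correlated else random_cor(K)
    X <- X %*% chol(R)                 # rows now have correlation matrix R
  }
  if (!is.null(outliers)) {
    n_out <- round(outliers[1] * length(X))   # proportion -> number of cells
    idx <- sample.int(length(X), n_out)       # distinct cells, column-major
    X[idx] <- runif(n_out, min = outliers[2], max = outliers[3])
  }
  colnames(X) <- paste0("X", seq_len(K))
  X
}

set.seed(42)
R <- random_cor(8)
X <- simulate_X(100, 8, correlated = R, outliers = c(0.05, 4, 6))
```

The Cholesky step works because chol(R) returns an upper-triangular U with t(U) %*% U equal to R, so rows of independent standard normals multiplied by U have covariance R.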

Details

The method subset extracts either the simulated dependent variable (y = TRUE, the default) or the simulated matrix of predictors (y = FALSE). Both variables can also be extracted directly from the returned list (see Examples). The method coef extracts the parameters of the logistic regression used to simulate the data. The method as.integer converts the dependent variable into binary integer values. The method features extracts the labels of the predictors with non-zero coefficients.
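The accessors described above can be mimicked with ordinary S3 methods. The following is a hypothetical sketch on a made-up class, toy_bc, rather than the package's source: the list element names (X, y, beta, intercept), the reading of int = TRUE as "also return the intercept", and the 0/1 coding in as.integer() are all assumptions. Only subset(), coef() and as.integer() are existing generics; features() has to be declared here because neither base R nor stats provides it.

```r
# Hypothetical constructor for a toy class standing in for 'bc_simulate'
new_toy_bc <- function(X, y, beta, intercept) {
  colnames(X) <- paste0("X", seq_len(ncol(X)))   # label the predictors
  structure(list(X = X, y = y, beta = beta, intercept = intercept),
            class = "toy_bc")
}

# subset(): the dependent variable by default, the predictors if y = FALSE
subset.toy_bc <- function(x, y = TRUE, ...) if (y) x$y else x$X

# coef(): slopes named by predictor; int = TRUE also returns the intercept
# (an assumed reading of the `int` argument)
coef.toy_bc <- function(object, int = FALSE, ...) {
  b <- setNames(object$beta, colnames(object$X))
  if (int) c("(Intercept)" = object$intercept, b) else b
}

# as.integer(): 1 for the first factor level, 0 otherwise (assumed coding)
as.integer.toy_bc <- function(x, ...) as.integer(x$y == levels(x$y)[1])

# features(): a new generic returning labels of the non-zero predictors
features <- function(x, ...) UseMethod("features")
features.toy_bc <- function(x, ...) colnames(x$X)[x$beta != 0]

set.seed(7)
sim <- new_toy_bc(
  X = matrix(rnorm(20), nrow = 5),
  y = factor(c("Yes", "No", "No", "Yes", "No"), levels = c("Yes", "No")),
  beta = c(0, 1.2, 0, -0.7),
  intercept = -0.6
)
```

A dedicated as.integer() method is useful because calling as.integer() directly on a factor returns its internal level codes (1 and 2 for a two-level factor), not 0/1 values.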

Value

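An R object of class 'bc_simulate': a list whose elements include the simulated dependent variable (y) and the matrix of predictors (X).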

Examples

sim <- bc_simulate(100, 8, 4)
# Extract the dependent variable (two equivalent ways)
y <- subset(sim)
y <- sim$y
# Extract the matrix of predictors (two equivalent ways)
X <- subset(sim, y = FALSE)
X <- sim$X

rettopnivek/binclass documentation built on May 13, 2019, 4:46 p.m.