parse_dCVnet_input: parse_dCVnet_input

View source: R/dCVnet_utilities.R

parse_dCVnet_input    R Documentation

parse_dCVnet_input

Description

Collate an outcome (y) and a predictor matrix (x) into a standardised object ready for dCVnet functions. Optionally, x can be a data.frame, and a one-sided formula (f) can be provided to allow interactions, transformations and expansions using R formula notation.

Usage

parse_dCVnet_input(
  data,
  y,
  family,
  f = "~.",
  offset = NULL,
  yname = "y",
  passNA = FALSE
)

Arguments

data

a data.frame containing variables needed for the formula (f).

y

the outcome: a numeric vector, a factor (for binomial / multinomial), or a matrix (for cox / mgaussian). For factors, see the Factor Outcomes section below.

family

the model family (see glmnet)

f

a one-sided formula. The right-hand side (RHS) must refer to columns in data and may include interactions, transformations or expansions (such as poly or log). The formula must include an intercept.

offset

optional model offset (see glmnet)

yname

an optional label for the outcome / y variable.

passNA

should NA values in data be excluded (FALSE) or passed through (TRUE)?
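To make the f argument more concrete, the sketch below shows how a one-sided formula expands a data.frame into a numeric design matrix. It uses base R's model.matrix() purely as an illustration; whether dCVnet builds its predictor matrix in exactly this way is an assumption, and the toy data and column names are hypothetical.

```r
# Illustration with base R only; this is not dCVnet's internal code.
# Hypothetical toy data: the column names are made up for this sketch.
toy <- data.frame(
  age  = c(21, 35, 48, 60),
  dose = c(1, 2, 4, 8),
  sex  = factor(c("f", "m", "f", "m"))
)

# The default-style formula "~." uses every column of the data as-is.
x_all <- model.matrix(as.formula("~ ."), data = toy)

# A richer one-sided formula: an interaction (sex * log(dose)),
# a transformation (log) and an expansion (poly).
f <- ~ sex * log(dose) + poly(age, 2)
x_expanded <- model.matrix(f, data = toy)

# Both matrices keep an "(Intercept)" column, as the formula requires.
colnames(x_all)
colnames(x_expanded)
```

The factor column sex is dummy-coded against its first level, and the interaction and polynomial terms each become their own columns.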

Value

a list containing

  • y - outcome

  • x_mat - predictor matrix, including any expansions and interaction terms specified in f

  • yname - a variable name for the y-variable

  • family - the model family
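A minimal sketch of a call, using only the arguments listed under Usage. The toy data, the outcome labels and the yname value are hypothetical. The call is guarded so the snippet does nothing if the dCVnet package is not installed, and it assumes the function is exported from the package namespace.

```r
# Hypothetical toy inputs for illustration.
set.seed(1)
toy <- data.frame(x1 = rnorm(20), x2 = rnorm(20))

# Factor outcome with levels in alphabetical order:
# "control" is the reference, "patient" the 'positive' category.
outcome <- factor(rep(c("control", "patient"), each = 10))

# Only attempt the call when dCVnet is available.
if (requireNamespace("dCVnet", quietly = TRUE)) {
  parsed <- dCVnet::parse_dCVnet_input(
    data   = toy,
    y      = outcome,
    family = "binomial",
    f      = "~ x1 * x2",  # main effects plus their interaction
    yname  = "group"
  )
  # Inspect the top-level elements of the returned list.
  str(parsed, max.level = 1)
}
```

According to the Value section, the result should contain y, x_mat, yname and family.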

Factor Outcomes

For categorical families (binomial, multinomial) input can be:

  • numeric (integer): c(0,1,2)

  • factor: factor(1:3, labels = c("A", "B", "C"))

  • character: c("A", "B", "C")

  • other

These are treated differently.

Numeric data is used as provided. Character data will be coerced to a factor: factor(x, levels = sort(unique(x))). Factor data will be used as provided, but must have levels in alphabetical order.

In all cases the reference category must be ordered first. For the binomial family, this means the 'positive' category is second.
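The rules above can be checked before calling the function. The sketch below uses base R only. The helper levels_are_alphabetical is hypothetical (it is not part of dCVnet), and the outcome labels are invented for illustration.

```r
# Character outcome: coerced as described above, with sorted levels.
y_chr <- c("relapse", "remission", "relapse", "remission")
y_fac <- factor(y_chr, levels = sort(unique(y_chr)))
levels(y_fac)  # first level is the reference category

# Hypothetical helper (not in dCVnet): TRUE when the factor's levels
# are already in sorted order.
levels_are_alphabetical <- function(y) {
  identical(levels(y), sort(levels(y)))
}

# A factor whose levels were set by hand in non-alphabetical order.
bad <- factor(c("yes", "no", "yes"), levels = c("yes", "no"))
levels_are_alphabetical(bad)

# Rebuilding from the character values restores sorted levels.
good <- factor(as.character(bad))
levels_are_alphabetical(good)
```

Note that sort() follows the current locale, so labels that mix upper and lower case may order differently across systems.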

Why alphabetical? Previously bugs arose due to different handling of factor levels between functions called by dCVnet. These appear to be resolved in the latest versions of the packages, but this restriction will stay until I can verify.

Notes

Sparse matrices are not supported by dCVnet.
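If predictors are held in a sparse matrix, one possible workaround is to convert them to a dense data.frame first. This is a sketch rather than a recommendation from the package author. It assumes the Matrix package is available and that the dense copy fits in memory; the values and column names are hypothetical.

```r
# Runs only if the Matrix package is installed.
if (requireNamespace("Matrix", quietly = TRUE)) {
  library(Matrix)

  # Hypothetical 4 x 3 sparse predictor matrix with three non-zero cells.
  sp <- sparseMatrix(
    i = c(1, 3, 4), j = c(1, 2, 3), x = c(2.5, 1, 4),
    dims = c(4, 3),
    dimnames = list(NULL, c("x1", "x2", "x3"))
  )

  # Densify, then convert to a data.frame suitable for the data argument.
  dense_df <- as.data.frame(as.matrix(sp))
  str(dense_df)
}
```

For large, very sparse inputs the dense copy can be much bigger than the original, so it is worth checking dimensions before converting.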


AndrewLawrence/dCVnet documentation built on Sept. 24, 2024, 5:24 a.m.