compute_vibrations: Vibrations

Description

Run vibrations for all features across all datasets.

Usage

compute_vibrations(
  bound_data,
  primary_variable,
  constant_adjusters = NULL,
  model_type = "glm",
  features_of_interest,
  max_vibration_num = 10000,
  proportion_cutoff = 1,
  cores = 1,
  max_vars_in_model = 20,
  family = gaussian(),
  ids = NULL,
  strata = NULL,
  weights = NULL,
  nest = NULL
)

Arguments

bound_data

Dataframe of tibbles containing all independent and dependent dataframes for all datasets.

primary_variable

The column name from the independent_variables tibble containing the key variable you want to associate with disease in your first round of modeling (prior to vibration). For example, if you are fundamentally interested in identifying how well age can predict height, you would set this value to a string naming the column in that dataframe that holds "age."

constant_adjusters

A character vector (or a single string) of column names in your dataset to include in every vibration. (default = NULL)

model_type

Specifies the regression type: "glm", "survey", or "negative_binomial". Survey regression requires additional parameters (at least weights, nest, strata, and ids). A model family (e.g. gaussian()) can be supplied through the family argument to this function.

features_of_interest

Feature to vibrate over.

max_vibration_num

Maximum number of vibrations allowed for a single dependent variable. Lowering this value reduces runtime because fewer models are fit. (default = 10,000)

proportion_cutoff

Float between 0 and 1. Filter out dependent features that are this proportion of zeros or more (default = 1, so no filtering will be done.)
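
As a rough illustration of what a zero-proportion filter looks like, the base R sketch below computes the fraction of zeros per feature and drops features at or above a cutoff. The toy data frame, the variable names, and the exact comparison at the boundary are assumptions made for illustration. This is not the package's internal code, and the stated default (1 means no filtering) suggests the package may treat the boundary differently.

```r
# Hypothetical toy data: three dependent features measured on four samples.
features <- data.frame(
  f1 = c(0, 0, 0, 1),  # 75% zeros
  f2 = c(0, 2, 3, 1),  # 25% zeros
  f3 = c(0, 0, 0, 0)   # 100% zeros
)

# Proportion of zeros in each column.
zero_prop <- colMeans(features == 0)

# Assumed rule: drop features whose zero proportion is at or above the cutoff.
cutoff <- 0.75
kept <- names(zero_prop)[zero_prop < cutoff]
kept
```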

cores

Number of threads. (default = 1)

max_vars_in_model

Maximum number of variables allowed in a single fit in vibrations. In case an individual has many hundreds of metadata features, this prevents models from being fit with excessive numbers of variables. Modifying this parameter will change runtime for large datasets. For example, just computing all possible models for 100 variables is extremely slow. (default = 20)
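
To give a sense of scale, the sketch below counts how many distinct adjuster subsets exist for 100 candidate variables, with and without a cap of 20 variables per model. It only counts subsets with base R's choose(). Whether the cap also counts the primary variable, and how the package enumerates or samples models, are not stated here, so those details are left as assumptions.

```r
n_adjusters <- 100   # hypothetical number of candidate adjusters
max_vars    <- 20    # value of max_vars_in_model (assumed to cap adjusters only)

# Every possible subset of the adjusters, including the empty set.
n_all_models <- 2^n_adjusters

# Subsets containing at most max_vars adjusters.
n_capped_models <- sum(choose(n_adjusters, 0:max_vars))

# Both counts are far beyond the default max_vibration_num of 10,000.
format(c(all = n_all_models, capped = n_capped_models), scientific = TRUE)
```

Even with the cap, the number of candidate models dwarfs max_vibration_num, which is why both limits affect runtime.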

family

GLM family (default = gaussian()). For help see help(glm) or help(family).
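
For readers unfamiliar with family objects, here is a minimal sketch using stats::glm directly on the built-in mtcars dataset. It shows only what a family object is and how it changes the fitted model. The formulas and variable roles (wt standing in for a primary variable, hp for an adjuster) are illustrative assumptions, not quantvoe internals.

```r
# Gaussian family: ordinary linear regression of mpg on wt, adjusted for hp.
fit_gaussian <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian())

# Binomial family: logistic regression for a 0/1 outcome (am = transmission type).
fit_binomial <- glm(am ~ wt, data = mtcars, family = binomial())

# Estimate, standard error, and p-value for the variable of interest.
coef(summary(fit_gaussian))["wt", ]
```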

ids

Name of column in dataframe specifying cluster ids from largest level to smallest level. Only relevant for survey data. (Default = NULL).

strata

Name of column in dataframe with strata. Only relevant for survey data. (Default = NULL).

weights

Name of column containing sampling weights. Only relevant for survey data. (Default = NULL).

nest

If TRUE, relabel cluster ids to enforce nesting within strata. Only relevant for survey data. (Default = NULL).


chiragjp/quantvoe documentation built on Oct. 11, 2021, 1:46 a.m.