View source: R/compute_vibrations.R
Run vibrations for all features in a dataset
dataset_vibration(
  subframe,
  primary_variable,
  constant_adjusters,
  model_type,
  features_of_interest,
  max_vibration_num,
  proportion_cutoff,
  cores,
  max_vars_in_model,
  family,
  ids,
  strata,
  weights,
  nest
)
subframe |
A list of length 2: data frames containing a single dataset's independent and dependent data. |
primary_variable |
The column name in the independent_variables tibble of the key variable you want to associate with disease in your first round of modeling (prior to vibration). For example, if you are fundamentally interested in how well age predicts height, set this to the name of the column in that data frame that holds age. |
constant_adjusters |
A character vector (or a single string) of column names in your dataset to include in every vibration (default = NULL). |
model_type |
Specifies the regression type: "glm", "survey", or "negative_binomial". Survey regression requires additional parameters (at least weights, nest, strata, and ids). Any model family (e.g. gaussian()) or any other parameter can be passed as an additional argument to this function. |
features_of_interest |
Feature to vibrate over. |
max_vibration_num |
Maximum number of vibrations (default=50000). |
proportion_cutoff |
Float between 0 and 1. Filter out dependent features whose proportion of zeros is at or above this value (default = 1, so no filtering done). |
cores |
Number of threads. |
max_vars_in_model |
Maximum number of variables allowed in a single fit during vibrations. When a dataset has many hundreds of metadata features, this prevents models from being fit with excessive numbers of variables. Changing this parameter affects runtime on large datasets; for example, computing all possible models for 100 variables is extremely slow. (default = 20) |
family |
GLM family (default = gaussian()). For help see help(glm) or help(family). |
ids |
Name of column in dataframe specifying cluster ids from largest level to smallest level. Only relevant for survey data. (Default = NULL). |
strata |
Name of column in dataframe with strata. Only relevant for survey data. (Default = NULL). |
weights |
Name of column containing sampling weights. |
nest |
If TRUE, relabel cluster ids to enforce nesting within strata. |
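As a rough illustration of how primary_variable, constant_adjusters, and family relate to one another, the sketch below builds a single model formula and fits it with base R glm(). The toy data frame and the column names age, sex, and height are made up for the example, and this is only an assumption about the kind of model a single vibration fits, not the package's internal code.

```r
# Hypothetical toy data; column names are illustrative only.
toy <- data.frame(
  age    = c(5, 8, 11, 14, 17, 20),
  sex    = c(0, 1, 0, 1, 0, 1),
  height = c(110, 128, 143, 160, 172, 175)
)

primary_variable   <- "age"   # the variable of primary interest
constant_adjusters <- "sex"   # included in every model

# Build height ~ age + sex from the column names.
model_formula <- reformulate(c(primary_variable, constant_adjusters),
                             response = "height")

# Fit with the documented default family, gaussian().
fit <- glm(model_formula, data = toy, family = gaussian())

# Estimated association between the primary variable and the outcome.
coef(fit)[primary_variable]
```

A vibration would then presumably refit models like this one with different sets of additional adjusters and compare the estimates for the primary variable.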
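The proportion_cutoff filter described above can be sketched in base R as follows. The data frame, its column names, and the cutoff of 0.75 are hypothetical, and the code only mirrors the documented rule (drop features whose share of zeros is at or above the cutoff); the package's own filtering code may differ.

```r
# Hypothetical dependent data: one row per sample, one column per feature.
dependent <- data.frame(
  sampleID  = c("s1", "s2", "s3", "s4"),
  feature_a = c(0, 0, 0, 1),   # 75% zeros
  feature_b = c(2, 0, 5, 1),   # 25% zeros
  feature_c = c(0, 0, 0, 0)    # 100% zeros
)

proportion_cutoff <- 0.75      # illustrative value, not the default

feature_cols <- setdiff(names(dependent), "sampleID")

# Fraction of zero entries in each feature column.
zero_fraction <- vapply(dependent[feature_cols],
                        function(x) mean(x == 0),
                        numeric(1))

# Keep only features strictly below the cutoff.
kept <- names(zero_fraction)[zero_fraction < proportion_cutoff]
kept
```

With these toy values only feature_b survives, because feature_a sits exactly at the cutoff and feature_c is all zeros.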
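To see why max_vars_in_model matters for runtime, the following arithmetic compares the number of possible adjuster subsets with and without a cap on model size. It assumes every subset of adjusters counts as one candidate model and that the cap applies to the number of adjusters; the package may count variables differently.

```r
n_adjusters       <- 100   # hypothetical number of candidate adjusters
max_vars_in_model <- 20    # documented default

# Every subset of the adjusters is one possible model: 2^n in total.
all_subsets <- 2^n_adjusters

# With a cap, only subsets of size 0..max_vars_in_model remain.
capped_subsets <- sum(choose(n_adjusters, 0:max_vars_in_model))

format(all_subsets, scientific = TRUE, digits = 3)
format(capped_subsets, scientific = TRUE, digits = 3)
```

Both counts are far too large to enumerate, which is consistent with the documentation also exposing max_vibration_num as a separate limit on the number of vibrations.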
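The ids, strata, weights, and nest arguments share their names with arguments of svydesign() in the survey package, so a survey-type fit may look roughly like the sketch below. This is an assumption about what model_type = "survey" corresponds to, not a description of the package's internals. The toy data and column names are invented, and svydesign() takes formulas where dataset_vibration is documented to take column names.

```r
library(survey)  # assumes the survey package is installed

# Hypothetical survey data: cluster ids restart within each stratum.
toy <- data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  cluster = c(1, 1, 2, 2, 1, 1, 2, 2),
  wt      = c(10, 10, 12, 12, 8, 8, 9, 9),
  age     = c(5, 8, 11, 14, 17, 20, 23, 26),
  height  = c(110, 128, 143, 160, 172, 175, 176, 174)
)

# nest = TRUE relabels cluster ids so that cluster 1 in stratum 1
# is treated as distinct from cluster 1 in stratum 2.
design <- svydesign(ids = ~cluster, strata = ~stratum, weights = ~wt,
                    data = toy, nest = TRUE)

# Design-based regression of height on age.
fit <- svyglm(height ~ age, design = design, family = gaussian())
coef(fit)
```

Because the cluster ids repeat across strata in this toy data, building the design without nest = TRUE would be rejected as not nested within strata.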