regression: performs the specified regression on the data
In Benjamin-Vincent-Lab/binfotron: Binfotron Bioinformatics Analysis Tools Suite

regression

R Documentation

performs the specified regression on the data

Description

The purpose of regression is to perform a regression on the data across the range of independent and dependent variables provided. If m

Usage

regression(
  input_dt,
  indep_list,
  base_file_name = "regression_output",
  clear_readme = TRUE,
  combined_group_name = "All",
  dep_vars = NULL,
  dep_var_families = NULL,
  event_clm = "OS_e",
  fdr_method = NULL,
  inclusion_list = list(),
  model_comparison_list = NULL,
  model_function = function(dep_var = "", indep_vars = "NULL") {
    paste0("glm(", dep_var, " ~ ",
           paste0("`", indep_vars, "`", collapse = " + "),
           ", data = model_dt)")
  },
  my_grouping = NULL,
  output_dir = ".",
  sample_clm = get_default_sample_key(),
  save_models = FALSE,
  time_clm = "OS_d",
  write_files = TRUE,
  include_dep_var_in_prediction_name = FALSE
)
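
The default `model_function` in the usage above returns the model call as a character string. The following is a minimal sketch of what that string looks like and how it could be evaluated; the `eval(parse(...))` step and the use of `mtcars` as `model_dt` are assumptions for illustration, since how `regression` evaluates the string internally is not documented here.

```r
# Default model_function, copied from the usage above.
model_function <- function(dep_var = "", indep_vars = "NULL") {
  paste0(
    "glm(", dep_var, " ~ ",
    paste0("`", indep_vars, "`", collapse = " + "),
    ", data = model_dt)"
  )
}

# Build the call string for one dependent and two independent variables.
model_string <- model_function(dep_var = "mpg", indep_vars = c("wt", "hp"))
print(model_string)

# Assumption: the string is evaluated where an object named `model_dt` exists.
model_dt <- mtcars
fit <- eval(parse(text = model_string))
print(coef(fit))
```

The backticks around each independent variable let column names that are not syntactic R names appear in the formula, which is presumably why the default adds them.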

Arguments

`input_dt`	A data.table that includes every column needed for the analysis: `names(indep_list)`, `dep_vars` (for glm), `names(inclusion_list)`, `event_clm` and `time_clm` (for coxph), `unique(unlist(model_comparison_list))`, and `my_grouping`.
`indep_list`	Required (no default). Named list of column names to use as the independent variables. The names of the list are used to name the output stats. Example: `my_indep_list = list(TRA_Chao1 = c("TRA_Chao1"), TRB_Chao1 = c("TRB_Chao1"), TCR_Chao1 = c("TRA_Chao1", "TRB_Chao1"))`
`base_file_name`	Character string to prefix the names of the output files.
`combined_group_name`	Character string used to name the combined groups category.
`dep_vars`	Character vector containing the column names of the dependent variables. Example: `my_dep_vars = c("Age", "SNV_Log2_Neoantigens", "Indel_Log2_Neoantigens")`
`dep_var_families`	Character vector containing the names of the families to add to the model. This is not used for coxph, since the dependent variable for coxph is survival. Values should be of the form `'Gamma("identity")'` or `'gaussian'` and can include anything accepted by `glm`.
`event_clm`	For coxph. The name of the column from which to draw the event information. The column should contain only the integers 1 and 0. If specified, this column needs to be present in `input_dt`.
`fdr_method`	Deprecated. Multiple PValue columns made this overly complicated. Just use `binfotron::calc_fdr` separately. See `stats::p.adjust.methods`.
`inclusion_list`	List to specify the samples that should be kept. For example, `list(pathology_T_stage = c('T1', 'T2', 'T3'), is_asian = c(TRUE))` would drop samples in which the value of the column named 'pathology_T_stage' is not 'T1', 'T2', or 'T3'. Samples must also have 'is_asian' equal to `TRUE`. The names of this list should be column names in `input_dt`.
`model_comparison_list`	Optional named list of column names that should be used for a full and reduced model comparison. Every group of columns on this list will be run against each dep_vars and indep_list combination. The reduced model will only include the items on the list. The full model will include the independent variable(s) as well. The names of this list are used to name the model comparisons. The values of this list should be column names in `input_dt`. Example: `model_comparison_list = list(Age = c("Age"), Tissue = c("Tissue"), Combined = c("Age", "Tissue"))`
`model_function`	A function that returns the model call as a string. It is important to set `data = model_dt` in the function. Do not set the glm family; it will be added based on `dep_var_families`. glm example: `model_function = function(dep_var = "", indep_vars = "NULL"){ paste0("glm(", dep_var, " ~ ", paste0(indep_vars, collapse = " + "), ", data = model_dt)") }`. coxph example: `model_function = function(dep_var = "", indep_vars = "NULL"){ paste0("coxph(Surv(", time_clm, ", ", event_clm, ") ~ ", paste0(indep_vars, collapse = " + "), ", data = model_dt)") }`
`my_grouping`	The name of the column to use to split the data into groups. If specified, this column needs to be present in `input_dt`.
`output_dir`	Path to the output directory. The parent directory to the path must exist.
`sample_clm`	String to indicate the name of the column for sample names. Only used to output predictions.
`save_models`	Boolean indicating whether to save the models in individual rds files named <base_file_name>_<group_name>_<indep_list_name>.
`time_clm`	For coxph. The name of the column from which to draw the time information. If specified, this column needs to be present in `input_dt`.
`write_files`	Boolean on whether you would like to write the output files.
`fdr_by_columns`	Deprecated. Multiple PValue columns made this overly complicated. Just use `binfotron::calc_fdr` separately.
`fdr_by_columns_for_model_comp`	Deprecated. Multiple PValue columns made this overly complicated. Just use `binfotron::calc_fdr` separately.
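
The filtering described for `inclusion_list` can be sketched in base R as follows. This illustrates the documented behavior only and is not the package's implementation; the sample table and its values are hypothetical, and a plain data.frame stands in for the data.table.

```r
# Hypothetical sample table; column names follow the inclusion_list example.
input_df <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  pathology_T_stage = c("T1", "T4", "T2", "T3"),
  is_asian = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

inclusion_list <- list(
  pathology_T_stage = c("T1", "T2", "T3"),
  is_asian = c(TRUE)
)

# Keep a row only if every listed column holds one of its allowed values.
keep <- rep(TRUE, nrow(input_df))
for (clm in names(inclusion_list)) {
  keep <- keep & input_df[[clm]] %in% inclusion_list[[clm]]
}
filtered_df <- input_df[keep, ]
print(filtered_df$sample)
```

Here sample s2 is dropped for its T stage and s3 for `is_asian`, so only rows that satisfy every entry of the list remain.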

Details

This function fits each model with either `glm` or `coxph`, depending on the call returned by `model_function`.
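
As a rough illustration of the full versus reduced comparison described for `model_comparison_list`, the sketch below fits two glm models and compares them. The choice of `anova` with an F test is an assumption, as this page does not state which test `regression` uses; `mtcars` and its columns are stand-ins for real data.

```r
model_dt <- mtcars

# Reduced model: only the comparison column (`cyl` stands in for e.g. Age).
reduced_fit <- glm(mpg ~ cyl, data = model_dt, family = gaussian)

# Full model: the comparison column plus the independent variable (`wt`).
full_fit <- glm(mpg ~ cyl + wt, data = model_dt, family = gaussian)

# Assumption: compare the nested models with an F test.
comparison <- anova(reduced_fit, full_fit, test = "F")
print(comparison)
```

A small p-value in the second row suggests the independent variable explains variation beyond what the comparison columns already capture.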

Value

List containing several outputs:

stats - data.table with the results of the model output
model_comp - data.table with the full vs reduced model comparisons if model_comparison_list is provided
readme - An output of the comparisons made.
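
Because the FDR arguments are deprecated, p-values in `stats` need to be adjusted in a separate step. The signature of `binfotron::calc_fdr` is not shown on this page, so the sketch below uses `stats::p.adjust` as a stand-in; the table and its column names (`Indep_Var`, `PValue`, `FDR`) are hypothetical.

```r
# Hypothetical stand-in for the `stats` output; column names are illustrative.
stats_df <- data.frame(
  Indep_Var = c("TRA_Chao1", "TRB_Chao1", "TCR_Chao1"),
  PValue = c(0.01, 0.04, 0.03)
)

# Benjamini-Hochberg adjustment, one of stats::p.adjust.methods.
stats_df$FDR <- p.adjust(stats_df$PValue, method = "BH")
print(stats_df)
```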