regsearch: An exhaustive search regression built on base R
In guslipkin/dewey: An R library for a variety of things

View source: R/regsearch.R

regsearch

R Documentation

Description

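An exhaustive search regression built on base R. A `glm` is fitted for each combination of the supplied independent variables, using between `minvar` and `maxvar` variables at a time.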

Usage

regsearch(
  data,
  dependent,
  independent,
  minvar = 1,
  maxvar,
  family,
  topN = 0,
  interactions = FALSE,
  multi = FALSE,
  ...
)

Arguments

`data`	A 'data.frame' that contains a dependent variable and the independent variables.
`dependent`	The name of the dependent variable for the regression. This must match a column name from 'data'.
`independent`	A vector of independent variables to be used. These must match the column names from 'data'. The vector can also include interaction terms built from column names in 'data', which lets you request specific interaction terms rather than every possible interaction, as 'interactions = TRUE' does.
`minvar`	(Optional) The minimum number of independent variables to be used in the regression. Defaults to 1.
`maxvar`	The maximum number of independent variables to be used in the regression. Must be less than or equal to the number of independent variables. If interaction terms are used, each interaction term counts as one independent variable.
`family`	The type of regression, given as a family that is passed to `glm`. See `glm` for more information.
`topN`	(Optional) The number of top results to be printed upon run completion. Defaults to 0.
`interactions`	(Optional) A logical value indicating whether every possible interaction term should be used. Defaults to 'FALSE'.
`multi`	(Optional) A logical value indicating whether multithreading should be used. Defaults to 'FALSE'. Using multithreading is highly recommended.
`...`	(Optional) Further arguments to be passed to `glm`.
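
To get a feel for how much work an exhaustive search implies, the sketch below lists every candidate formula for a given `minvar` and `maxvar` using base R's `combn`. `enumerate_formulas` is a hypothetical helper written for illustration, not a function in dewey. It assumes only that the search covers every subset of the independent variables; how `regsearch` builds its formulas internally may differ.

```r
# Hypothetical helper (not part of dewey): build one formula string for
# every subset of `independent` whose size is between minvar and maxvar.
enumerate_formulas <- function(dependent, independent, minvar = 1,
                               maxvar = length(independent)) {
  stopifnot(minvar >= 1, minvar <= maxvar, maxvar <= length(independent))
  unlist(lapply(minvar:maxvar, function(k) {
    # combn() applies FUN to each subset of size k and returns the results
    combn(independent, k, FUN = function(vars) {
      paste(dependent, "~", paste(vars, collapse = " + "))
    })
  }))
}

vars <- c("ind_1", "ind_2", "ind_3", "ind_4")
formulas <- enumerate_formulas("dependent", vars, minvar = 1, maxvar = 4)

length(formulas)                  # number of candidate models
sum(choose(length(vars), 1:4))    # same count from binomial coefficients
```

With four variables and `maxvar = 4` this gives 15 candidate formulas. The count grows quickly as variables or interaction terms are added, which is a plausible reason for the recommendation to set `multi = TRUE`.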

Value

Returns a 'data.table' with one row for each regression run. The table is sorted in descending order of rSquare divided by the mean p-value, a ratio that generally pushes the better regressions to the top of the list. It contains the following columns:

`'formula'`	The regression formula used.
`'aic'`	The Akaike information criterion (AIC) for the regression.
`'rSquare'`	The calculated R-squared for the regression.
`'warn'`	Currently unused.
`independent`	One column per independent variable or interaction term, containing the p-value for that term in a given regression.
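
The sort order described above can be sketched with base R alone. This is only an illustration under stated assumptions: `summarise_fit` is a hypothetical helper, the deviance-based pseudo R-squared is a stand-in because this page does not say how `regsearch` calculates 'rSquare', and the mean p-value here excludes the intercept, which may not match the package.

```r
set.seed(1)  # only so the sketch is repeatable
dt <- data.frame(dependent = sample(c(0, 1), 100, replace = TRUE),
                 ind_1 = runif(100, 0, 1),
                 ind_2 = runif(100, 0, 1))

# Hypothetical helper (not part of dewey): fit one model and summarise it.
summarise_fit <- function(formula_text, data) {
  fit <- glm(as.formula(formula_text), data = data, family = "binomial")
  coefs <- summary(fit)$coefficients
  # Assumption: average the p-values of the non-intercept terms only
  p <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|z|)"]
  data.frame(formula = formula_text,
             aic = AIC(fit),
             # Assumption: deviance-based pseudo R-squared as a stand-in
             rSquare = 1 - fit$deviance / fit$null.deviance,
             meanP = mean(p))
}

candidates <- c("dependent ~ ind_1",
                "dependent ~ ind_2",
                "dependent ~ ind_1 + ind_2")
results <- do.call(rbind, lapply(candidates, summarise_fit, data = dt))

# Rank by rSquare divided by the mean p-value, largest first
results$score <- results$rSquare / results$meanP
results <- results[order(results$score, decreasing = TRUE), ]
results
```

A high R-squared combined with small p-values yields a large score, so models whose terms are both explanatory and significant sort first.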

Examples

# Creating dummy data
dt <- data.frame("dependent" = sample(c(0, 1), 100, replace = TRUE),
                 "ind_1" = runif(100, 0, 1),
                 "ind_2" = runif(100, 0, 1),
                 "ind_3" = runif(100, 0, 1),
                 "ind_4" = runif(100, 0, 1))

# Without interaction terms or multithreading
## Print the top two results
regsearch(dt, "dependent", c("ind_1", "ind_2", "ind_3", "ind_4"),
          minvar = 1, maxvar = 4, family = "binomial", topN = 2)
## Print no top results
regsearch(dt, "dependent", c("ind_1", "ind_2", "ind_3", "ind_4"),
          minvar = 1, maxvar = 4, family = "binomial", topN = 0,
          interactions = FALSE, multi = FALSE)

# With interaction terms and multithreading
regsearch(dt, "dependent", c("ind_1", "ind_2", "ind_3", "ind_4"),
          minvar = 1, maxvar = 4, family = "binomial", topN = 0,
          interactions = TRUE, multi = TRUE)

guslipkin/dewey documentation built on March 16, 2023, 8:19 a.m.