regsearch: An exhaustive search regression built on base R

View source: R/regsearch.R

regsearch    R Documentation

An exhaustive search regression built on base R

Description

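Runs an exhaustive search regression built on base R: a 'glm' is fitted for combinations of the supplied independent variables, using between 'minvar' and 'maxvar' terms, and a table summarising each regression is returned.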

Usage

regsearch(
  data,
  dependent,
  independent,
  minvar = 1,
  maxvar,
  family,
  topN = 0,
  interactions = FALSE,
  multi = FALSE,
  ...
)

Arguments

data

A 'data.frame' that contains a dependent variable and the independent variables.

dependent

The dependent variable for the regression.

independent

A vector of independent variables to be used. These must match the column names from 'data'. These can also include interaction terms made from column names from 'data'. This allows for specific interaction terms to be used, rather than every possible interaction as is done with 'interactions = TRUE'.

minvar

(Optional) The minimum number of independent variables to be used in the regression. Defaults to 1.

maxvar

The maximum number of independent variables to be used in the regression. Must be less than or equal to the number of independent variables. If interaction terms are used, each interaction term counts as one independent variable.

family

The type of regression, for example "binomial". Passed to 'glm' as its 'family' argument. See 'glm' for more information.

topN

(Optional) The number of top results to be printed upon run completion. Defaults to 0.

interactions

(Optional) A boolean indicating whether or not interaction terms should be used. Defaults to 'FALSE'.

multi

(Optional) A boolean indicating whether or not multithreading should be used. Defaults to 'FALSE'. It is highly recommended to use multithreading.

...

(Optional) Additional arguments to be passed to 'glm'.
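
The number of regressions grows quickly with 'minvar' and 'maxvar', which is worth checking before a run. The sketch below counts the candidate models under the assumption that the search fits one model for every combination of terms of each allowed size. 'count_models' is a hypothetical helper for illustration, not part of the package.

```r
# Hypothetical helper: number of candidate models when every combination
# of between `minvar` and `maxvar` terms (out of `n_terms`) is fitted.
count_models <- function(n_terms, minvar = 1, maxvar = n_terms) {
  stopifnot(minvar >= 1, minvar <= maxvar, maxvar <= n_terms)
  sum(choose(n_terms, minvar:maxvar))
}

count_models(4, 1, 4)    # 15
count_models(10, 1, 10)  # 1023
count_models(20, 1, 5)   # 21699
```

With all sizes allowed, the count is 2^n - 1, so lowering 'maxvar' is the main way to keep the search tractable.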

Value

Returns a 'data.table' of information on the regressions run. The table is sorted in descending order by 'rSquare' divided by the mean p-value, which tends to push higher-quality regressions to the top of the list. It contains the following columns:

'formula'

The regression formula used.

'aic'

The AIC (Akaike information criterion) for the regression.

'rSquare'

The calculated R-squared for the regression.

'warn'

Currently unused.

independent

One column per independent variable or interaction term. Each column contains the p-value for that term in a given regression.
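
To make the returned columns and the sort order concrete, here is a minimal base R sketch of the same idea. It is not the package's implementation. It assumes every combination of variables is fitted with 'glm', uses 1 - deviance / null deviance as a stand-in for 'rSquare' (the package's actual calculation is not documented here), and omits the per-variable p-value columns. The names 'fit_one', 'combos' and 'score' are hypothetical.

```r
set.seed(1)  # only to make this sketch repeatable
dt <- data.frame(
  dependent = sample(c(0, 1), 100, replace = TRUE),
  ind_1 = runif(100), ind_2 = runif(100), ind_3 = runif(100)
)
vars <- c("ind_1", "ind_2", "ind_3")

# Every combination of 1 to length(vars) variables, as a flat list
combos <- unlist(
  lapply(seq_along(vars), function(k) utils::combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one regression and summarise it as a one-row data.frame
fit_one <- function(terms) {
  f   <- reformulate(terms, response = "dependent")
  fit <- glm(f, data = dt, family = "binomial")
  p   <- coef(summary(fit))[terms, "Pr(>|z|)"]       # p-value of each term
  r2  <- 1 - fit$deviance / fit$null.deviance        # assumed pseudo R-squared
  data.frame(
    formula = paste(deparse(f), collapse = " "),
    aic     = AIC(fit),
    rSquare = r2,
    score   = r2 / mean(p)                           # ranking criterion
  )
}

results <- do.call(rbind, lapply(combos, fit_one))
results <- results[order(results$score, decreasing = TRUE), ]
head(results, 2)
```

Dividing by the mean p-value rewards models whose terms are individually significant, so a model with a slightly lower fit but much smaller p-values can rank above a better-fitting one.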

Examples

# Creating dummy data
dt <- data.frame("dependent" = sample(c(0, 1), 100, replace = TRUE),
                 "ind_1" = runif(100, 0, 1),
                 "ind_2" = runif(100, 0, 1),
                 "ind_3" = runif(100, 0, 1),
                 "ind_4" = runif(100, 0, 1))

# Without interaction terms and multithreading
## Two top results
regsearch(dt, "dependent", c("ind_1", "ind_2", "ind_3", "ind_4"),
          minvar = 1, maxvar = 4, family = "binomial", topN = 2)
## No top results
regsearch(dt, "dependent", c("ind_1", "ind_2", "ind_3", "ind_4"),
          minvar = 1, maxvar = 4, family = "binomial", topN = 0)

# With interaction terms and multithreading
# (named arguments, so that TRUE is not matched positionally to topN)
regsearch(dt, "dependent", c("ind_1", "ind_2", "ind_3", "ind_4"),
          minvar = 1, maxvar = 4, family = "binomial",
          interactions = TRUE, multi = TRUE)

guslipkin/dewey documentation built on March 16, 2023, 8:19 a.m.