Forward-backward variable selection: picking variables to maximize explained variance.

Description

Attempts to find the best set of explanatory variables for a single variable in a data set. Alternates between adding the next best variable to the set and removing the variable (if any) whose exclusion maximizes the overall score.

Usage

fbvs(dataSet,one,maxv,linear)

Arguments

dataSet

the n x m data frame representing n observations of m variables.

one

a string specifying the name of one variable in the dataset, for which the best explanatory set is required. Defaults to the name of the last variable in the dataset.

maxv

an integer limiting the maximum number of variables in the explanatory set. Defaults to m-1.

linear

a logical flag. If TRUE, fbvs uses a linear model to estimate R^2, instead of using matie to estimate the association measure A, when running the selection algorithm. Defaults to FALSE.
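
The documented defaults for one and maxv can be written out in base R for a small data frame. This is only an illustration of what the defaults resolve to; the data frame toy and its column names are made up, and how fbvs computes these values internally is not shown on this page.

```r
# Toy data frame with m = 3 variables; names are illustrative only.
toy <- data.frame(x1 = c(1, 2, 3, 4),
                  x2 = c(2, 1, 4, 3),
                  y  = c(1.1, 1.9, 3.2, 3.8))

# Documented default for `one`: the name of the last variable.
default_one <- names(toy)[ncol(toy)]

# Documented default for `maxv`: m - 1, i.e. every other variable may be used.
default_maxv <- ncol(toy) - 1
```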

Details

Variable names are only added to the explanatory set if their inclusion results in an increase in the association measure.
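
The add/remove loop described above can be sketched in base R. This is not matie's implementation: the helper names score_set and fb_select_sketch are hypothetical, and the score is the adjusted R^2 of a linear model, an assumption swapped in here because plain R^2 never decreases when a variable is added, so it would never reject a candidate or favour a removal.

```r
# Hypothetical helper (not part of matie): adjusted R^2 of target ~ vars.
score_set <- function(data, target, vars) {
  if (length(vars) == 0) return(0)
  fit <- lm(reformulate(vars, response = target), data = data)
  summary(fit)$adj.r.squared
}

# Hypothetical sketch of forward-backward selection; assumes syntactic
# column names and no missing values.
fb_select_sketch <- function(data, target = names(data)[ncol(data)],
                             maxv = ncol(data) - 1) {
  candidates <- setdiff(names(data), target)
  best <- character(0)
  best_score <- 0
  repeat {
    improved <- FALSE
    # Forward step: add the unused variable that gives the highest score,
    # but only if it strictly improves on the current score.
    unused <- setdiff(candidates, best)
    if (length(best) < maxv && length(unused) > 0) {
      scores <- vapply(unused,
                       function(v) score_set(data, target, c(best, v)),
                       numeric(1))
      if (max(scores) > best_score) {
        best <- c(best, unused[which.max(scores)])
        best_score <- max(scores)
        improved <- TRUE
      }
    }
    # Backward step: drop the variable whose removal gives the highest
    # score, but only if that score beats the current one.
    if (length(best) > 1) {
      scores <- vapply(best,
                       function(v) score_set(data, target, setdiff(best, v)),
                       numeric(1))
      if (max(scores) > best_score) {
        best <- setdiff(best, best[which.max(scores)])
        best_score <- max(scores)
        improved <- TRUE
      }
    }
    if (!improved) break
  }
  list(one = target, best = best, Rsq = best_score)
}

# Usage on a built-in dataset: explain mpg with at most three variables.
res <- fb_select_sketch(mtcars, target = "mpg", maxv = 3)
```

Every accepted step strictly increases the score, so the loop terminates. The returned list mirrors the item names documented under Value, although here Rsq holds an adjusted R^2.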

Value

Returns a list containing the following items:

one

the name of the one variable that requires the explanatory set

best

the best set of explanatory variables

Rsq

an estimate for R^2 provided by the best set of explanatory variables

Note

The data set can be of any dimension.

Author(s)

Ben Murrell, Dan Murrell & Hugh Murrell.

References

Discovering general multidimensional associations, http://arxiv.org/abs/1303.1828

See Also

ma, agram

Examples

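    # find the best explanatory set for Salary in the baseball dataset
    library(matie)
    data(baseballData)
    fbvs(baseballData, one = "Salary")
    # same selection, scored with a linear model (R^2) instead of matie's A
    fbvs(baseballData, one = "Salary", linear = TRUE)

    # limit the explanatory set to at most two variables
    fbvs(baseballData, one = "Salary", maxv = 2)
    fbvs(baseballData, one = "Salary", maxv = 2, linear = TRUE)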