sisal: Sequential Input Selection Algorithm (SISAL)

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/sisal.R

Description

Identifies relevant inputs using a backward selection type algorithm with optional branching. Choices are made by assessing linear models estimated with ordinary least squares or ridge regression in a cross-validation setting.

Usage

1
2
3
4
5
6
7
sisal(X, y, Mtimes = 100, kfold = 10, hbranches = 1,
      max.width = hbranches^2, q = 0.165, standardize = TRUE,
      pruning.criterion = c("round robin", "random nodes",
                            "random edges", "greedy"),
      pruning.keep.best = TRUE, pruning.reverse = FALSE,
      verbose = 1, use.ridge = FALSE,
      max.warn = getOption("nwarnings"), sp = -1, ...)

Arguments

X

a numeric matrix where each column is a predictor (independent variable) and each row is an observation (data point)

y

a numeric vector containing a sample of the response (dependent) variable, in the same order as the rows of X

Mtimes

the number of times the cross-validation is repeated, i.e. the number of predictions made for each data point. An integral value (numeric or integer).

kfold

the number of approximately equally sized parts used for partitioning the data on each cross-validation round. An integral value (numeric or integer).

hbranches

the number of branches to take when removing a variable from the model. In Tikka and Hollmén (2008), the algorithm always removes the “weakest” variable (hbranches equals 1, also the default here). By using a value larger than 1, the algorithm creates branches in the search graph by removing each of the hbranches “weakest” variables, one at a time. The number of branches created is naturally limited by the number of variables remaining in the model at that point. See also max.width.

max.width

the maximum number of nodes with a given number of variables allowed in the search graph. The same limit is used for all search levels. An integral value (numeric or integer). See pruning.criterion and pruning.keep.best.

q

a numeric value between 0 and 0.5 (endpoints excluded) defining the quantiles 1-q and q. The difference of these sample quantiles is used as the width of the sampling distribution (a measure of uncertainty) of each coefficient in a linear model. The default value 0.165 is the same as used by Tikka and Hollmén (2008). In case of a normally distributed parameter, the width is approximately twice the standard deviation (one standard deviation on both sides of the mean).

standardize

a logical flag. If TRUE, standardizes the data to zero mean and unit variance. If FALSE, uses original data. This affects the scale of the results. If use.ridge is TRUE, this should be set to TRUE or the search graph and the sets of selected variables could be affected.

pruning.criterion

a character string. Options are "round robin", "random nodes", "random edges" and "greedy". Abbreviations are allowed. This affects how the search tree is pruned if the number of nodes to explore is about to exceed max.width. One of the following methods is used to select max.width nodes for the next level of search.

If "round robin", the nodes of the current level (i variables) take turns selecting nodes for the next level (i-1 variables). The turns are taken in order of increasing validation error. Each parent node chooses children according to the order described in ‘Details’. If a duplicate choice would be made, the turn is skipped.

If "random nodes", random nodes are selected with uniform probability.

If "random edges", random nodes are selected, with the probability of a node directly proportional to the number of edges leading to it.

If "greedy", a method similar to "round robin" is used, but with the (virtual) looping order of parents and children swapped. Whereas the outer loop in "round robin" operates over children and the inner loop over parents, the outer loop in "greedy" operates over parents and the inner loop over children. That is, a "greedy" parent node selects all its children before passing on the turn to the next parent.

pruning.keep.best

a logical flag. If TRUE, the nodes that would also be present in the hbranches = 1 case are immune to pruning. If FALSE, the result may underperform the original Tikka and Hollmén (2008) solution in terms of (the lowest) validation error as function of the number of inputs.

pruning.reverse

a logical flag. If TRUE, all the methods described in pruning.criterion except "random nodes" use reverse orders or inverse probabilities. The default is FALSE.

verbose

a numeric or integer verbosity level from 0 (no output) to 5 (all possible diagnostics).

use.ridge

a logical flag. If TRUE, the function uses ridge regression with automatic selection of the regularization (smoothing) parameter.

max.warn

a numeric value giving the maximum number of warnings to store in the returned object. If more warnings are given, their total number is still recorded in the object.

sp

a numeric value passed to magic if use.ridge is TRUE. Initial value of the regularization parameter. If negative (the default), initialization is automatic.

...

additional arguments passed to magic if use.ridge is TRUE. It is an error to supply arguments named "S" or "off".

Details

When choosing which variable to drop from the model, the importance of a variable is measured by looking at two variables derived from the sampling distribution of its coefficient in the linear models of the repeated cross-validation runs:

  1. absolute value of the median and

  2. width of the distribution (see q).

The importance of an input variable is the ratio of the median to the width: hbranches variables with the smallest ratios are dropped, one variable in each branch. See max.width and pruning.criterion.

The main results of the function are described here. More details are available in ‘Value’.

The function returns two sets of inputs variables:

L.v

set corresponding to the smallest validation error.

L.f

smallest set where validation error is close to the smallest error. The margin is the standard deviation of the training error measured in the node of the smallest validation error.

The mean of mean squared errors in the training and validation sets are also returned (E.tr, E.v). For the training set, the standard deviation of MSEs (s.tr) is also returned. The length of these vectors is the number of variables in X. The i:th element in each of the vectors corresponds to the best model with i input variables, where goodness is measured by the mean MSE in the validation set.

Linear models fitted to the whole data set are also returned. Both ordinary least square regression (lm.L.f, lm.L.v, lm.full) and ridge regression models (magic.L.f, magic.L.v, magic.full) are computed, irrespective of the use.ridge setting. Both fitting methods are used for the L.f set of variables, the L.v set and the full set (all variables).

Value

A list with class "sisal". The items are:

L.f

a numeric vector containing indices to columns of X. See ‘Details’.

L.v

a numeric index vector like L.f. See ‘Details’.

E.tr

a numeric vector of length d + 1. See ‘Details’.

s.tr

a numeric vector of length d + 1. See ‘Details’.

E.v

a numeric vector of length d + 1. See ‘Details’.

L.f.nobranch

a numeric vector or NULL. Like L.f but for the “no branching” solution. NULL if branching is not used or if some elements of branching.useful are missing.

L.v.nobranch

like L.f.nobranch but related to L.v.

E.tr.nobranch

a numeric vector or NULL. Like E.tr but for the “no branching” solution. NULL when branching.useful is NULL. An element is missing when the corresponding element of branching.useful is missing.

s.tr.nobranch

like E.tr.nobranch but related to s.tr.

E.v.nobranch

like E.tr.nobranch but related to E.v.

n.evaluated

a numeric vector of length d + 1. The number of nodes evaluated for each model size, indexed by the number of variables used plus one.

edges

a list of directed edges between nodes in the search graph. There is an edge from node A to node B if and only if B was a candidate for a new node to be evaluated, resulting from removing one variable in A. The i:th element of the list contains edges directed away from the node represented by the i:th element of vertices. Each element is a list with one element, "edges", which is a numeric vector of indices to vertices, pointing to the nodes towards which the edges are directed. There are no edges directed away from pruned nodes or nodes representing a single variable.

vertices

a character vector the same size as edges. Contains the names of the nodes in the search graph. Each name contains the indices of the variables included in the set in question, separated by dots.

vertices.logical

a logical matrix containing an alternative representation of vertices. Number of rows is the length of vertices and number of columns is d. The i:th column indicates whether the i:th input variable is present in a given node. The row index and the index to vertices are equivalent.

vertex.data

A data.frame with information about each node in the search graph (missing information means pruned node). The rows correspond to items in vertices. The columns are:

E.tr

mean of MSEs, training.

s.tr

standard deviation (n-1) of MSEs, training.

E.v

mean of MSEs, validation.

E.v.level.rank

rank of the node among all the evaluated (non-pruned) nodes with the same number of variables, in terms of validation error. Smallest error is rank 1.

n.rank.deficient

number of rank deficient linear models. This problem arises when the number of input variables is large compared to the number of observations and use.ridge is FALSE.

n.NA.models

number of models that could not be estimated due to lack of any samples

n.inputs

number of input variables used in the model represented by the node.

min.branches

the smallest branching factor large enough for producing the node. This is a number k between 1 and hbranches. The value for the root node (all input variables) is 1. The value for other nodes is the minimum of the set of values suggested by its parents. The value suggested by an individual parent is the min.branches value of the parent itself or the ranking of the child in terms of increasing importance of the removed variable (see ‘Details’), whichever is larger. For example, when pruning.keep.best is TRUE, the hbranches = 1 search path can be followed by looking for nodes where min.branches is 1.

var.names

names of the variables (column names of X).

n

number of observations in the (X, y) data.

d

number of variables (columns) in X.

n.missing

number of samples where either y or all variables of X are missing.

n.clean

number of complete samples in the data set X, y.

lm.L.f

lm model fitted to L.f variables.

lm.L.v

lm model fitted to L.v variables.

lm.full

lm model fitted to all variables.

magic.L.f

magic model fitted to L.f variables.

magic.L.v

magic model fitted to L.v variables.

magic.full

magic model fitted to all variables.

mean.y

mean of y.

sd.y

standard deviation (denominator n - 1) of y.

zeroRange.y

a logical value indicating whether all non-missing elements of y are equal, with some numeric tolerance.

mean.X

column means of X.

sd.X

standard deviation (denominator n - 1) of each column in X.

zeroRange.X

a logical vector. Like zeroRange.y but for each column of X.

constant.X

a logical vector where the i:th value indicates whether the i:th column of X has a (nearly) constant, non-zero value (NA values allowed).

params

a named list containing the values used for most of the parameter-like formal arguments of the function, and also anything in .... The names are the names of the parameters.

pairwise.points

a numeric square matrix with d rows and columns. The count in row i, column j indicates the number of times that variable i was better than variable j. See ‘Details’ in plotSelected.sisal.

pairwise.wins

a logical square matrix with d rows and columns. A TRUE value in row i, column j indicates that i is more important than variable j. Derived from pairwise.points.

pairwise.preferences

a numeric vector with d elements. Number of wins minus number of losses (when another variable wins) per variable. Derived from pairwise.wins.

pairwise.rank

an integer vector of ranks according to Copeland's pairwise aggregation method. Element number i is the rank of variable (column) number i in X. Derived from pairwise.preferences. See ‘Details’ in plotSelected.sisal.

path.length

a numeric vector of path lengths. Consider a path starting from the full model and continuing through incrementally smaller models, each with the smallest validation error among the nodes with that number of variables. However, the path is broken at each point where the model with one less variable cannot be constructed by removing one variable from the bigger model (is not nested). The vector contains the lengths of the pieces. Its length is the number of breaks plus one.

nested.path

a numeric vector containing the indices (column numbers) of the input variables in their removal order on the “nested path”. The first element is the index of the variable that was removed first. The remaining variable is the last element. If the path does not exist, this is NULL. See ‘Details’ in plotSelected.sisal.

nested.rank

an integer vector of ranks determined by nested.path. Element number i is the rank of variable (column) number i in X. NULL if nested.path is NULL. See ‘Details’ in plotSelected.sisal.

branching.useful

If branching is enabled (hbranches > 1), this is a logical vector of length d. If the i:th element is TRUE, branching improved the best model with i variables in terms of validation error. The result is NA if a comparison is not possible (may happen if pruning.keep.best is FALSE). If branching is not used, this is NULL.

warnings

warnings stored. A list of objects that evaluate to a character string.

n.warn

number of warnings produced. May be higher than number of warnings stored.

Author(s)

Mikko Korpela

References

Tikka, J. and Hollmén, J. (2008) Sequential input selection algorithm for long-term prediction of time series. Neurocomputing, 71(13–15):2604–2615.

See Also

See magic for information about the algorithm used for estimating the regularization parameter and the corresponding linear model when use.magic is TRUE.

See summary.sisal for how to extract information from the returned object.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
library(stats)
set.seed(123)
X <- cbind(sine=sin((1:100)/5),
           linear=seq(from=-1, to=1, length.out=100),
           matrix(rnorm(800), 100, 8,
                  dimnames=list(NULL, paste("random", 1:8, sep="."))))
y <- drop(X %*% c(3, 10, 1, rep(0, 7)) + rnorm(100))
foo <- sisal(X, y, Mtimes=10, kfold=5)
print(foo)           # selected inputs "L.v" are same as
summary(foo$lm.full) # significant coefficients of full model

Example output

pruning.criterion=round robin, Mtimes=10, kfold=5, hbranches=1, max.width=1, 
q=0.165, standardize=TRUE, pruning.keep.best=TRUE, pruning.reverse=FALSE, 
use.ridge=FALSE, sp=-1
There are 10 variables and 100 samples.
There are no missing values.
================================================================================sisal (no branching, OLS regression) results

Parameters:
pruning.criterion=round robin, Mtimes=10, kfold=5, hbranches=1, max.width=1, 
q=0.165, standardize=TRUE, pruning.keep.best=TRUE, pruning.reverse=FALSE, 
use.ridge=FALSE, sp=-1

Data dimensions:
inputs: 10, samples: 100

Input names are present:
 [1] "sine"     "linear"   "random.1" "random.2" "random.3" "random.4"
 [7] "random.5" "random.6" "random.7" "random.8"

Selected inputs (4), model with smallest validation error (L.v):
    sine   linear random.1 random.7 
       1        2        3        9 

Selected inputs (3), least complex model with error inside threshold (L.f):
    sine   linear random.1 
       1        2        3 

L.f is a subset of L.v

Removal order of input variables (ends with remaining input):
random.8 random.2 random.6 random.5 random.3 random.4 random.7 random.1 
      10        4        8        7        5        6        9        3 
    sine   linear 
       1        2 
Ranking based on the removal order:
    sine   linear random.1 random.2 random.3 random.4 random.5 random.6 
       2        1        3        9        6        5        7        8 
random.7 random.8 
       4       10 

Call:
lm(formula = lmForm, data = as.data.frame(X.complete), na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.35531 -0.10909 -0.01372  0.10552  0.43282 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.545e-16  1.692e-02   0.000    1.000    
sine         3.525e-01  1.754e-02  20.095  < 2e-16 ***
linear       9.542e-01  1.778e-02  53.681  < 2e-16 ***
random.1     1.325e-01  1.796e-02   7.378 8.06e-11 ***
random.2    -8.952e-03  1.775e-02  -0.504    0.615    
random.3    -1.431e-02  1.745e-02  -0.820    0.415    
random.4    -2.832e-02  1.754e-02  -1.614    0.110    
random.5    -1.487e-02  1.857e-02  -0.801    0.425    
random.6     1.238e-02  1.803e-02   0.687    0.494    
random.7    -2.192e-02  1.719e-02  -1.275    0.206    
random.8     1.265e-04  1.816e-02   0.007    0.994    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1692 on 89 degrees of freedom
Multiple R-squared:  0.9743,	Adjusted R-squared:  0.9714 
F-statistic: 336.8 on 10 and 89 DF,  p-value: < 2.2e-16

sisal documentation built on Feb. 16, 2020, 1:07 a.m.