simpart: Simple Partition


Description

simpart partitions a d-dimensional sample space into two orthogonal subspaces: a simpledim-dimensional nearly null space and a (d-simpledim)-dimensional model space. It provides an orthonormal basis for each subspace. The nearly null space basis is defined in terms of a simplicity measure and is ordered from most simple to least simple. The model space basis is made up of the leading eigenvectors of the covariance matrix and is ordered by proportion of variance explained.

Returns the result as an object of class simpart.

Usage

simpart(y, simpledim, ...)

## S3 method for class 'formula'
simpart(formula, simpledim, data = NULL, ...)

## Default S3 method:
simpart(y, simpledim, measure = c('first', 'second', 'periodic'),
        x = seq(d), cov=FALSE, reverse=rep(FALSE, d), na.action, ...)

Arguments

formula

a formula with no response variable, referring only to numeric variables.

y

a matrix or data frame that specifies the data, or a covariance matrix. A data matrix has d columns; a covariance matrix is d x d.

simpledim

the dimension of the nearly null space of the covariance matrix. It is equal to d minus the dimension of the model space.

measure

a function that calculates a simplicity measure of a vector, based on a non-negative definite symmetric matrix Lambda. There are three built-in simplicity measures, specified by 'first', 'second', or 'periodic', which correspond to first divided difference, second divided difference, and periodic simplicity respectively. The argument measure can also take a user-specified function.

data

an optional data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).

x

a vector of independent variable values (for functional data), length equal to d, the number of columns of y. If not supplied, a sequence from 1 to d is used.

cov

a logical value. If true, then y is assumed to be a d x d covariance matrix. If false, y is assumed to be an n x d data matrix which simpart uses to calculate a d x d covariance matrix.

reverse

a logical vector of length d. If the i-th element is true, the i-th basis vector is "reversed" by multiplication by -1. Basis vectors are arranged with the model basis first, then the simplicity basis. If the length of reverse is less than d, the remaining entries of reverse are assumed to be false, and the corresponding basis vectors remain unchanged.

na.action

specify how missing data should be treated.

...

arguments passed to or from other methods. When the formula method is used, cov or reverse may be specified here. If "periodic" is chosen as the measure, period is specified as a numeric value. If measure is a user-specified function, its arguments are passed here.

Details

simpart is a generic function with "formula" and "default" methods.

simpart implements a method described in Gaydos et al (2013).

When cov=FALSE, the covariance matrix is calculated using the data matrix y. The calculation uses divisor n, the number of rows of y.
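The divisor matters when comparing against base R, because stats::cov() divides by n - 1 rather than n. The following is a minimal base-R sketch of the difference, using a made-up toy matrix; it illustrates the arithmetic only and is not taken from the prinsimp source.

```r
# Toy data (hypothetical): n = 5 observations, d = 3 variables
set.seed(42)
y <- matrix(rnorm(5 * 3), nrow = 5)
n <- nrow(y)

# Covariance with divisor n, as described above for cov=FALSE
yc  <- sweep(y, 2, colMeans(y))   # centre each column
G_n <- crossprod(yc) / n

# stats::cov() uses divisor n - 1, so the two differ by a factor (n - 1) / n
G_unbiased <- cov(y)
all.equal(G_n, G_unbiased * (n - 1) / n)
```

If a covariance estimate with a different divisor is wanted, the Usage section suggests it can be computed separately and passed as y with cov=TRUE.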

Value

simpart returns a list with class "simpart" containing the following components:

model

a d x (d-simpledim) matrix with columns containing the basis of the model space, that is, containing the first (d-simpledim) eigenvectors of the covariance matrix. Basis vectors are arranged in descending order of eigenvalue, that is, in descending order of the proportion of variance explained.

simple

a d x simpledim matrix with columns containing the simplicity basis of the nearly null space. Basis vectors are arranged in descending order of simplicity.

variance

list of three components:

model

variances associated with the vectors in the model basis.

simple

variances associated with the vectors in the simplicity basis of the nearly null space.

full

variances associated with eigenvectors of the covariance matrix, that is, its eigenvalues.

simplicity

list of three components:

model

simplicity values of the vectors in the model basis.

simple

simplicity values of the vectors in the simplicity basis of the nearly null space.

full

simplicity values of the simplicity basis when simpledim=d.

call

the matched call

measure

the simplicity measure used: "first", "second", "periodic", or a user-specified measure function.

varperc

the percent of variance explained by the corresponding basis vector, as a list of two components:

model

percent of variance explained by the vectors in the model basis.

simple

percent of variance explained by the vectors in the simplicity basis of the nearly null space.

scores

if y is the data matrix, the scores on the basis vector loadings.
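To make the relationship between these components concrete, here is a rough base-R sketch of one way such a partition could be computed. It is not the prinsimp implementation. The toy data, the form of Lambda (a first-difference measure on equally spaced x), the rotation of the nearly null space by an eigendecomposition, and the formulas v'Gv and v'Lambda v for variance and simplicity are all assumptions made for illustration.

```r
# Hypothetical toy data: n = 40 observations, d = 6 variables
set.seed(1)
d <- 6
simpledim <- 2
y  <- matrix(rnorm(40 * d), ncol = d)
yc <- scale(y, center = TRUE, scale = FALSE)
G  <- crossprod(yc) / nrow(y)                 # covariance with divisor n

# Model space: leading d - simpledim eigenvectors of G
e <- eigen(G, symmetric = TRUE)
k <- d - simpledim
model <- e$vectors[, seq_len(k), drop = FALSE]
nullb <- e$vectors[, k + seq_len(simpledim), drop = FALSE]

# Assumed simplicity matrix for first differences on equally spaced x
D <- diff(diag(d))                            # (d-1) x d difference matrix
Lambda <- 4 * diag(d) - crossprod(D)

# Rotate the nearly null space so its basis runs from most to least simple
es <- eigen(t(nullb) %*% Lambda %*% nullb, symmetric = TRUE)
simple <- nullb %*% es$vectors

# Per-vector summaries, model basis first, then simplicity basis
basis      <- cbind(model, simple)
variance   <- colSums(basis * (G %*% basis))       # v' G v
varperc    <- 100 * variance / sum(diag(G))
simplicity <- colSums(basis * (Lambda %*% basis))  # v' Lambda v
scores     <- yc %*% basis                         # n x d matrix of scores
```

Under these assumptions the combined basis is orthonormal, the model variances equal the leading eigenvalues of G, and the percentages sum to 100.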

Note

The simplicity values of the simplicity basis when simpledim=d are equal to the eigenvalues of the non-negative definite matrix, Lambda, that defines the simplicity measure.
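This can be checked numerically without the package. The sketch below assumes that, for the first divided difference measure with the default equally spaced x, Lambda has the form 4I - D'D, where D is the first-difference matrix. That form is inferred from the example output on this page rather than taken from the prinsimp source.

```r
d <- 6

# (d-1) x d first-difference matrix: row i is e[i+1] - e[i]
D <- diff(diag(d))

# Assumed Lambda for the first divided difference measure, x = 1:d
Lambda <- 4 * diag(d) - crossprod(D)

# Eigenvalues in decreasing order, i.e. most simple to least simple
full <- eigen(Lambda, symmetric = TRUE)$values
round(full, 7)
```

With d = 6 these eigenvalues agree with the "Full space simplicity" values printed in the first example output, where x is left at its default.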

References

T.L. Gaydos, N.E. Heckman, M. Kirkpatrick, J.R. Stinchcombe, J. Schmitt, J. Kingsolver, J.S. Marron. (2013). Visualizing genetic constraints. Annals of Applied Statistics 7: 860-882.

See Also

summary.simpart, plot.simpart

Examples

library(prinsimp)
require(graphics)

## Caterpillar data: estimated covariance from Kingsolver et al (2004)
## Measurements are at temperatures 11, 17, 23, 29, 35, 40
data(caterpillar)

## Analyze 5 dimensional model space, 1 dimensional nearly null space
## First divided difference simplicity measure
simpart(caterpillar, simpledim=1, cov=TRUE)  # x omitted, so 1:6 is used; supply x for the actual temperatures

simpart(caterpillar, simpledim=1,
        x=c(11, 17, 23, 29, 35, 40), cov=TRUE)

## Second divided difference simplicity measure and 3-dimensional model space
simpart(caterpillar, simpledim=3, measure="second",
        x=c(11, 17, 23, 29, 35, 40), cov=TRUE)

Example output

Call:
simpart(y = caterpillar, simpledim = 1, cov = TRUE)

Simplicity measure: first divided differences

Partition simplicity (1 simple basis):
  model 1   model 2   model 3   model 4   model 5  simple 1 
1.7722543 3.4761451 0.7018399 2.1744038 3.4579307 2.4174262 

Full space simplicity:
   full 1    full 2    full 3    full 4    full 5    full 6 
4.0000000 3.7320508 3.0000000 2.0000000 1.0000000 0.2679492 
Warning message:
In subsplit(G, d - simpledim) :
  G has negative eigenvalues, setting them to zero

Call:
simpart(y = caterpillar, simpledim = 1, x = c(11, 17, 23, 29, 
    35, 40), cov = TRUE)

Simplicity measure: first divided differences

Partition simplicity (1 simple basis):
 model 1  model 2  model 3  model 4  model 5 simple 1 
1.848069 3.539756 1.245756 2.471847 3.548272 2.679633 

Full space simplicity:
  full 1   full 2   full 3   full 4   full 5   full 6 
4.000000 3.773655 3.132870 2.237603 1.370816 0.818390 
Warning message:
In subsplit(G, d - simpledim) :
  G has negative eigenvalues, setting them to zero

Call:
simpart(y = caterpillar, simpledim = 3, measure = "second", x = c(11, 
    17, 23, 29, 35, 40), cov = TRUE)

Simplicity measure: second divided differences

Partition simplicity (3 simple basis):
    model 1     model 2     model 3    simple 1    simple 2    simple 3 
0.019258362 0.029297107 0.005174543 0.031984537 0.028851117 0.020060522 

Full space simplicity:
      full 1       full 2       full 3       full 4       full 5       full 6 
3.244612e-02 3.244612e-02 3.142300e-02 2.552581e-02 1.278515e-02 6.245005e-17 
Warning message:
In subsplit(G, d - simpledim) :
  G has negative eigenvalues, setting them to zero

prinsimp documentation built on May 2, 2019, 2:41 a.m.