RESET: Reconstruction Set Test (RESET)

reset R Documentation

Reconstruction Set Test (RESET)

Description

Implementation of the Reconstruction Set Test (RESET) method, which transforms an n-by-p input matrix X into an n-by-m matrix of sample-level variable set scores and a length m vector of overall variable set scores. Execution of RESET involves the following sequence of steps:

  • If center.X=TRUE, mean center the columns of X. If X.test is specified, the centering is instead performed on just the columns of X corresponding to each variable set. See documentation for the X and center.X parameters for more details.

  • If scale.X=TRUE, scale the columns of X to have variance 1. If X.test is specified, the scaling is instead performed on just the columns of X corresponding to each variable set. See documentation for the X and scale.X parameters for more details.

  • If center.X.test=TRUE, mean center the columns of X.test. See documentation for the X.test and center.X.test parameters for more details.

  • If scale.X.test=TRUE, scale the columns of X.test. See documentation for the X.test and scale.X.test parameters for more details.

  • Set the reconstruction target matrix T to X or, if X.test is specified, to X.test.

  • Compute the norm of T and norm of each row of T. By default, these are the Frobenius and Euclidean norms respectively.

  • For each set in var.sets, sample-level and matrix-level (overall) scores are generated as follows:

    • Create a subset of X called X.var.set that only includes the columns of X corresponding to the variables in the set.

    • Compute a rank k orthonormal basis Q for the column space of X.var.set. If the size of the set is less than or equal to random.threshold, Q is computed as the top k columns of the Q matrix from a column-pivoted QR decomposition of X.var.set; otherwise, it is approximated using a randomized algorithm implemented by randomColumnSpace.

    • The reduced rank reconstruction of T is then created as Q Q^T T.

    • The reconstruction error matrix is computed by subtracting the original T from the reconstruction. The appropriate norm is then computed for each row of the error matrix and for the entire error matrix.

    • The overall score is the log2 ratio of the norm of the original T to the norm of the reconstruction error matrix.

    • The score for each sample is the log2 ratio of the norm of the corresponding row of the original T to the norm of the same row of the reconstruction error matrix.

    • If per.var=TRUE, then the overall and sample-level scores are divided by the scaled variable set size (see documentation for the per.var parameter).
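
The non-randomized path of the steps above can be sketched in base R for a single variable set. This is an illustrative sketch only, not the package's implementation: the helper name sketch_reset_one_set is hypothetical, it assumes center.X=TRUE, no X.test, the default Frobenius/Euclidean norms, and per.var=FALSE, and it assumes that qr() with LAPACK=TRUE is an acceptable stand-in for the column-pivoted QR decomposition.

```r
# Hypothetical helper (not part of the RESET package): scores one variable
# set using the non-randomized steps described above.
sketch_reset_one_set <- function(X, set, k = 2) {
  X <- scale(X, center = TRUE, scale = FALSE)      # mean center columns
  T.mat <- X                                       # target T (no X.test)
  X.var.set <- X[, set, drop = FALSE]              # columns in the set
  # Top k columns of Q from a column-pivoted QR (assumption: LAPACK pivoting)
  Q <- qr.Q(qr(X.var.set, LAPACK = TRUE))[, seq_len(k), drop = FALSE]
  recon <- Q %*% crossprod(Q, T.mat)               # Q Q^T T
  err <- recon - T.mat                             # reconstruction error
  # Overall score: log2 ratio of Frobenius norms
  overall <- log2(norm(T.mat, type = "F") / norm(err, type = "F"))
  # Sample scores: log2 ratio of row-wise Euclidean norms
  sample <- log2(sqrt(rowSums(T.mat^2)) / sqrt(rowSums(err^2)))
  list(sample = sample, overall = overall)
}

set.seed(1)
X <- matrix(rpois(10000, lambda = 1), nrow = 100)
X[1:10, 1:10] <- rpois(100, lambda = 5)
res <- sketch_reset_one_set(X, set = 1:10, k = 2)
```

The target is named T.mat rather than T because T is an alias for TRUE in R. Since Q Q^T is an orthogonal projection, the overall score in this sketch is never negative; individual sample scores can be.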

Usage

reset(X, X.test, center.X=TRUE, scale.X=FALSE, center.X.test=TRUE, scale.X.test=FALSE, 
      var.sets, k=2, random.threshold, k.buff=0, q=0, test.dist="normal", norm.type="2",
      per.var=FALSE)

Arguments

X

The n-by-p target matrix; columns represent variables and rows represent samples.

X.test

Matrix that will be combined with the var.set variables to compute the reduced rank reconstruction. This is typically a subset or transformation of X, e.g., projection on top PCs. Reconstruction error will be measured on the variables in X.test. If not specified, the entire X matrix will be used for calculating reconstruction error.

center.X

Flag which controls whether the values in X are mean centered during execution of the algorithm. If only X is specified and center.X=TRUE, then all columns in X will be centered. If both X and X.test are specified, then centering is performed on just the columns of X contained in the specified variable sets. Mean centering is especially important for accurate performance when X.test is specified as a reduced rank representation of X, e.g., as the projection of X onto the top principal components. However, mean centering the entire matrix X can have a dramatic impact on memory requirements if X is a large sparse matrix. In this case, a non-centered X and an appropriate X.test (e.g., the projection onto the top PCs of X) can be provided, and mean centering is then performed on just the needed variables during execution of RESET. This "just-in-time" centering is enabled by setting center.X=TRUE and providing both X and X.test. If X has already been mean centered (and X.test is a subset of this mean-centered matrix or was computed using this mean-centered matrix), then center.X should be specified as FALSE.
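
To illustrate why "just-in-time" centering is safe, the following base R sketch checks that centering only the columns of one variable set yields the same values as centering the whole matrix and then subsetting. It does not call reset(); the matrix and the set indices are made up for the illustration.

```r
set.seed(42)
X <- matrix(rpois(2000, lambda = 1), nrow = 100)   # 100-by-20 toy matrix
set <- c(3, 7, 11)                                 # illustrative variable set

# Center the full matrix, then keep the set's columns
full <- scale(X, center = TRUE, scale = FALSE)[, set]

# Center only the set's columns ("just-in-time")
jit <- scale(X[, set, drop = FALSE], center = TRUE, scale = FALSE)

same <- isTRUE(all.equal(full, jit, check.attributes = FALSE))
```

Only the second form avoids creating a dense centered copy of every column of X.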

scale.X

Flag which controls whether the values in X are scaled to have variance 1 during execution of the algorithm. Defaults to FALSE. If only X is specified and scale.X=TRUE, then all columns in X will be scaled. If both X and X.test are specified, then scaling is performed on just the columns of X contained in the specified variable sets.

center.X.test

Flag which controls whether the values in X.test, if specified, are mean centered during execution of the algorithm. Centering should be performed consistently for X and X.test, i.e., if center.X is true or X was previously centered, then center.X.test should be true unless X.test was previously centered or was generated from a centered X.

scale.X.test

Flag which controls whether the values in X.test, if specified, are scaled to have variance 1 during execution of the algorithm. Similar to centering, scaling should be performed consistently for X and X.test, i.e., if scale.X is true or X was previously scaled, then scale.X.test should be true unless X.test was previously scaled or was generated from a scaled X.

var.sets

List of m variable sets, each element is a vector of indices of variables in the set that correspond to columns in X. If variable set information is instead available in terms of variable names, the appropriate format can be generated using createVarSetCollection.
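
As an illustration of the expected format, the following base R sketch converts name-based sets into the index-based list described above. It is a stand-in written with match() only and makes no claim about the arguments or behavior of createVarSetCollection; the variable names and sets are made up.

```r
# Column names of a hypothetical X with 50 variables
var.names <- paste0("gene", 1:50)

# Name-based sets; "geneX" is deliberately absent from X
named.sets <- list(setA = c("gene1", "gene5", "gene9"),
                   setB = c("gene20", "gene21", "geneX"))

# Map names to column indices and drop names not found in X
var.sets <- lapply(named.sets, function(s) {
  idx <- match(s, var.names)
  idx[!is.na(idx)]
})
```

Each element of var.sets is then an integer vector of column positions in X.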

k

Rank of reconstruction. Defaults to 2. Cannot be larger than the minimum variable set size.

random.threshold

If specified, indicates the variable set size above which a randomized reduced-rank reconstruction is used. If the variable set size is less than or equal to random.threshold, then a non-random reconstruction is computed. Defaults to k and cannot be less than k.

k.buff

Additional dimensions used in the randomized reduced-rank reconstruction algorithm. Defaults to 0. Values above 0 can improve the accuracy of the randomized reconstruction at the expense of additional computational complexity. If k.buff=0, then the reduced rank reconstruction can be generated directly from the output of randomColumnSpace; otherwise, a reduced rank SVD must also be computed, with the reconstruction based on the top k components.

q

Number of power iterations for randomized SVD (see randomSVD). Defaults to 0. Although power iterations can improve randomized SVD performance in general, they can decrease the sensitivity of the RESET method to detect mean or covariance differences.

test.dist

Distribution for non-zero elements of random test matrix used in randomized SVD algorithm. See description for test.dist parameter of randomSVD method.
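
To show how k, k.buff, and q interact, here is a minimal base R sketch of a generic randomized range finder. It is not the code of randomColumnSpace or randomSVD: the helper name sketch_random_range is hypothetical, it assumes a dense Gaussian test matrix (corresponding to test.dist="normal"), and it omits the re-orthonormalization between power iterations that a production implementation would typically include.

```r
# Hypothetical helper (not part of the RESET package): approximate an
# orthonormal basis for the column space of A with k + k.buff columns.
sketch_random_range <- function(A, k, k.buff = 0, q = 0) {
  ell <- k + k.buff                                  # sketch dimension
  Omega <- matrix(rnorm(ncol(A) * ell), nrow = ncol(A))  # Gaussian test matrix
  Y <- A %*% Omega                                   # sample the column space
  for (i in seq_len(q)) {
    Y <- A %*% crossprod(A, Y)                       # power iteration: (A A^T) Y
  }
  qr.Q(qr(Y))                                        # orthonormalize the sample
}

set.seed(7)
A <- matrix(rnorm(100 * 30), nrow = 100)
Q0 <- sketch_random_range(A, k = 2)                  # 100-by-2 basis
Q2 <- sketch_random_range(A, k = 2, k.buff = 2, q = 1)  # 100-by-4 basis
```

With k.buff=0 the basis already has k columns. With k.buff > 0 it has k + k.buff columns, which is why an additional reduction to the top k components is needed in that case.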

norm.type

The type of norm to use for computing reconstruction error. Defaults to "2" for the Euclidean/Frobenius norm. The other supported option is "1" for the L1 norm.

per.var

If true, the computed scores for each variable set are divided by the scaled variable set size to generate per-variable scores. Variable set size scaling is performed by dividing all sizes by the mean size (this will generate per-variable scores of approximately the same magnitude as the non-per-variable scores).
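
The size scaling described above can be worked through in base R. The scores below are placeholder values chosen only to make the arithmetic visible; they are not output from reset().

```r
# Sizes of three illustrative variable sets
sizes <- c(set1 = 10, set2 = 20, set3 = 30)

# Scale sizes by the mean size so the scaled sizes average to 1
scaled.sizes <- sizes / mean(sizes)        # 0.5, 1.0, 1.5

# Placeholder overall scores (length m) and sample-level scores (n-by-m)
v <- c(set1 = 0.8, set2 = 0.8, set3 = 0.8)
S <- matrix(0.6, nrow = 2, ncol = 3, dimnames = list(NULL, names(sizes)))

# Per-variable scores: divide each set's scores by its scaled size
v.per.var <- v / scaled.sizes
S.per.var <- sweep(S, 2, scaled.sizes, "/")
```

Because the scaled sizes have mean 1, a set of average size keeps its score unchanged, while smaller sets are scaled up and larger sets are scaled down.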

Value

A list with the following elements:

  • S an n-by-m matrix of sample-level variable set scores.

  • v a length m vector of overall variable set scores.

See Also

createVarSetCollection, randomColumnSpace

Examples

  # Create a collection of 5 variable sets each of size 10
  var.sets = list(set1=1:10, 
                  set2=11:20,
                  set3=21:30,
                  set4=31:40,
                  set5=41:50)                  

  # Simulate a 100-by-100 matrix of random Poisson data
  X = matrix(rpois(10000, lambda=1), nrow=100)

  # Inflate first 10 rows for first 10 variables, i.e., the first
  # 10 samples should have elevated scores for the first variable set
  X[1:10,1:10] = rpois(100, lambda=5)

  # Execute RESET using non-randomized basis computation
  reset(X, var.sets=var.sets, k=2, random.threshold=10)

  # Execute RESET with randomized basis computation
  # (random.threshold will default to k value which is less
  # than the size of all variable sets)
  reset(X, var.sets=var.sets, k=2)

  # Execute RESET with non-zero k.buff
  reset(X, var.sets=var.sets, k=2, k.buff=2)
  
  # Execute RESET with non-zero q
  reset(X, var.sets=var.sets, k=2, q=1)

  # Execute RESET with the L1 norm instead of the default L2 norm
  reset(X, var.sets=var.sets, k=2, norm.type="1")

  # Project the X matrix onto the first 5 PCs and use that as X.test
  # Scale X before calling prcomp() so that no centering or scaling
  # is needed within reset()
  X = scale(X)
  X.test = prcomp(X,center=FALSE,scale=FALSE,retx=TRUE)$x[,1:5]
  reset(X, X.test=X.test, center.X=FALSE, scale.X=FALSE, 
    center.X.test=FALSE, scale.X.test=FALSE, var.sets=var.sets, k=2)

RESET documentation built on May 29, 2024, 12:32 p.m.