reset | R Documentation |
Implementation of the Reconstruction Set Test (RESET) method, which transforms an n-by-p input matrix X
into an n-by-m matrix of sample-level variable set scores and a length m vector of overall variable set scores. Execution of RESET involves the following sequence of steps:
If center.X=TRUE, mean center the columns of X. If X.test is specified, the centering is instead performed on just the columns of X corresponding to each variable set. See documentation for the X and center.X parameters for more details.
If scale.X=TRUE, scale the columns of X to have variance 1. If X.test is specified, the scaling is instead performed on just the columns of X corresponding to each variable set. See documentation for the X and scale.X parameters for more details.
If center.X.test=TRUE, mean center the columns of X.test. See documentation for the X.test and center.X.test parameters for more details.
If scale.X.test=TRUE, scale the columns of X.test. See documentation for the X.test and scale.X.test parameters for more details.
Set the reconstruction target matrix T to X or, if X.test is specified, to X.test.
Compute the norm of T and the norm of each row of T. By default, these are the Frobenius and Euclidean norms, respectively.
For each set in var.sets, sample-level and matrix-level scores are generated as follows:
Create a subset of X called X.var.set that only includes the columns of X corresponding to the variables in the set.
Compute a rank k orthonormal basis Q for the column space of X.var.set.
If the size of the set is less than or equal to random.threshold, this basis is computed as the first k columns of the Q matrix from a column-pivoted QR decomposition of X.var.set; otherwise, it is approximated using a randomized algorithm implemented by randomColumnSpace.
The reduced rank reconstruction of T is then created as Q Q^T T.
The original T is subtracted from the reconstruction to represent the reconstruction error, and the appropriate norm is computed on each row and on the entire error matrix.
The overall score is the log2 ratio of the norm of the original T
to the norm of the reconstruction error matrix.
The score for each sample is the log2 ratio of the norm of the corresponding row of the original T
to the norm of the same row of the reconstruction error matrix.
If per.var=TRUE, then the overall and sample-level scores are divided by the scaled variable set size. See documentation for the per.var parameter for more details.
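The steps above can be sketched in base R for a single variable set. This is a minimal illustration, not the package's implementation: reset_one_set() is a hypothetical helper, and it assumes the default settings (center.X=TRUE, scale.X=FALSE, no X.test, norm.type="2", per.var=FALSE) and the non-randomized QR path.

```r
# Base-R sketch of RESET scoring for a single variable set.
# reset_one_set() is a hypothetical helper, not a function in the RESET package.
reset_one_set <- function(X, set, k = 2) {
  # Mean center the columns of X; with no X.test, the target T is X itself
  T.mat <- scale(X, center = TRUE, scale = FALSE)
  X.var.set <- T.mat[, set, drop = FALSE]

  # Rank-k orthonormal basis: first k columns of Q from a column-pivoted QR
  Q <- qr.Q(qr(X.var.set, LAPACK = TRUE))[, seq_len(k), drop = FALSE]

  # Reduced-rank reconstruction Q Q^T T minus the original T
  E <- Q %*% crossprod(Q, T.mat) - T.mat

  list(
    # Sample-level scores: log2 ratio of Euclidean row norms
    S = log2(sqrt(rowSums(T.mat^2)) / sqrt(rowSums(E^2))),
    # Overall score: log2 ratio of Frobenius norms
    v = log2(sqrt(sum(T.mat^2)) / sqrt(sum(E^2)))
  )
}

# Simulated data: the first 10 samples are inflated for the first 10 variables
set.seed(1)
X <- matrix(rpois(10000, lambda = 1), nrow = 100)
X[1:10, 1:10] <- rpois(100, lambda = 5)
res <- reset_one_set(X, set = 1:10, k = 2)

# Compare mean sample-level scores of the inflated samples with the rest
c(inflated = mean(res$S[1:10]), other = mean(res$S[11:100]))
```

Because Q Q^T is an orthogonal projection, the overall score in this sketch cannot be negative; individual sample-level scores can be.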
reset(X, X.test, center.X=TRUE, scale.X=FALSE, center.X.test=TRUE, scale.X.test=FALSE,
var.sets, k=2, random.threshold, k.buff=0, q=0, test.dist="normal", norm.type="2",
per.var=FALSE)
X |
The n-by-p target matrix; columns represent variables and rows represent samples. |
X.test |
Matrix that will be combined with the |
center.X |
Flag which controls whether the values in |
scale.X |
Flag which controls whether the values in |
center.X.test |
Flag which controls whether the values in |
scale.X.test |
Flag which controls whether the values in |
var.sets |
List of m variable sets, each element is a vector of indices of variables in the set that correspond to columns in |
k |
Rank of reconstruction. Defaults to 2. Cannot be larger than the minimum variable set size. |
random.threshold |
If specified, indicates the variable set size above which a randomized reduced-rank reconstruction is used. If the variable set size is less than or equal to random.threshold, then a non-random reconstruction is computed. Defaults to k and cannot be less than k. |
k.buff |
Additional dimensions used in randomized reduced-rank construction algorithm. Defaults to 0.
Values above 0 can improve the accuracy of the
randomized reconstruction at the expense of additional computational complexity. If |
q |
Number of power iterations for randomized SVD (see |
test.dist |
Distribution for non-zero elements of random test matrix used in randomized SVD algorithm. See description for |
norm.type |
The type of norm to use for computing reconstruction error. Defaults to "2" for Euclidean/Frobenius norm. Other supported option is "1" for L1 norm. |
per.var |
If true, the computed scores for each variable set are divided by the scaled variable set size to generate per-variable scores. Variable set size scaling is performed by dividing all sizes by the mean size (this will generate per-variable scores of approximately the same magnitude as the non-per-variable scores). |
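To show how k.buff, q and test.dist typically enter a randomized basis computation, here is a generic randomized range finder in base R. It is an assumption-laden sketch, not the code behind randomColumnSpace: random_basis_sketch() is a hypothetical name, the test matrix is dense Gaussian (in the spirit of test.dist="normal"), and the basis keeps all k + k.buff columns, which may differ from what the package does.

```r
# Generic randomized range finder; an illustration only,
# not the RESET package's randomColumnSpace() implementation.
random_basis_sketch <- function(X, k = 2, k.buff = 0, q = 0) {
  p <- ncol(X)
  # Gaussian random test matrix with k + k.buff columns
  Omega <- matrix(rnorm(p * (k + k.buff)), nrow = p)
  # Sample the column space of X
  Y <- X %*% Omega
  # Optional power iterations: Y <- (X X^T)^q X Omega
  for (i in seq_len(q)) {
    Y <- X %*% crossprod(X, Y)
  }
  # Orthonormal basis for the sampled column space
  qr.Q(qr(Y))
}

set.seed(42)
X.demo <- matrix(rnorm(200 * 30), nrow = 200)
Q.demo <- random_basis_sketch(X.demo, k = 2, k.buff = 2, q = 1)
dim(Q.demo)
```

Larger k.buff and q values add matrix products, which is the accuracy versus cost trade-off described for those parameters.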
A list with the following elements:
S
an n-by-m matrix of sample-level variable set scores.
v
a length m vector of overall variable set scores.
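A short sketch of how the returned list might be inspected. The object below is a hand-built placeholder with the documented shape (S is n-by-m, v has length m); its values are made up, and the row and column names are added only for readability, since this page does not say whether reset() sets them.

```r
# Placeholder standing in for the list returned by reset(); values are made up
res.demo <- list(
  S = matrix(c(0.9, 0.1, 0.2,
               0.1, 0.8, 0.3),
             nrow = 3,
             dimnames = list(paste0("sample", 1:3), c("set1", "set2"))),
  v = c(set1 = 0.4, set2 = 0.5)
)

# Variable sets ordered by overall score, highest first
ranked.sets <- names(sort(res.demo$v, decreasing = TRUE))

# Highest-scoring variable set for each sample
top.set <- colnames(res.demo$S)[max.col(res.demo$S, ties.method = "first")]
```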
createVarSetCollection, randomColumnSpace
# Create a collection of 5 variable sets each of size 10
var.sets = list(set1=1:10,
set2=11:20,
set3=21:30,
set4=31:40,
set5=41:50)
# Simulate a 100-by-100 matrix of random Poisson data
X = matrix(rpois(10000, lambda=1), nrow=100)
# Inflate first 10 rows for first 10 variables, i.e., the first
# 10 samples should have elevated scores for the first variable set
X[1:10,1:10] = rpois(100, lambda=5)
# Execute RESET using non-randomized basis computation
reset(X, var.sets=var.sets, k=2, random.threshold=10)
# Execute RESET with randomized basis computation
# (random.threshold will default to k value which is less
# than the size of all variable sets)
reset(X, var.sets=var.sets, k=2)
# Execute RESET with non-zero k.buff
reset(X, var.sets=var.sets, k=2, k.buff=2)
# Execute RESET with non-zero q
reset(X, var.sets=var.sets, k=2, q=1)
# Execute RESET with the L1 norm instead of the default L2 norm
reset(X, var.sets=var.sets, k=2, norm.type="1")
# Project the X matrix onto the first 5 PCs and use that as X.test
# Scale X before calling prcomp() so that no centering or scaling
# is needed within reset()
X = scale(X)
X.test = prcomp(X,center=FALSE,scale=FALSE,retx=TRUE)$x[,1:5]
reset(X, X.test=X.test, center.X=FALSE, scale.X=FALSE,
center.X.test=FALSE, scale.X.test=FALSE, var.sets=var.sets, k=2)