This vignette provides a brief walkthrough for using SPEAR on multi-omic data:

- Installing SPEAR
- Preparing SPEAR
- Running SPEAR
- Analyzing SPEAR results
Follow the installation instructions in the README (located here), then load the package:
library(SPEAR)
Loading your own multi-omic data:

Parameters:

- `X` - a list of matrices of omics data [subjects = rows, features = columns] (i.e. OmicsData1 would be `X[[1]]`)
- `Y` - a matrix of response data [subjects = rows, response.variables = columns] (i.e. Response1 would be `Y[,1]`)

Data should be scaled/standardized (mean = 0, variance = 1) before using SPEAR:
X <- ... # your list of matrices here
Y <- ... # your response matrix.
It is good practice to ensure that the rownames (samples) for `X[[1]]`…`X[[D]]` and `Y` are declared and matching (to ensure rows correspond to the same ordering of samples). Also, for each dataset in `X`, name the column names (features) so that they are unique.
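The scaling and naming checks described above can be sketched in base R; the dataset names, dimensions, and feature names below are hypothetical placeholders for your own data:

```r
# Simulated stand-ins for two omics datasets and a response
# (names and dimensions are hypothetical):
set.seed(123)
X <- list(
  OmicsData1 = matrix(rnorm(20 * 10), nrow = 20,
                      dimnames = list(paste0("Subject", 1:20), paste0("gene_", 1:10))),
  OmicsData2 = matrix(rnorm(20 * 5), nrow = 20,
                      dimnames = list(paste0("Subject", 1:20), paste0("protein_", 1:5)))
)
Y <- matrix(rnorm(20), ncol = 1,
            dimnames = list(paste0("Subject", 1:20), "Response1"))

# Scale each omics dataset to mean = 0, variance = 1 per feature:
X <- lapply(X, scale)

# Check that rownames match across all datasets and the response:
stopifnot(all(sapply(X, function(x) identical(rownames(x), rownames(Y)))))

# Check that feature names are unique across datasets:
all.features <- unlist(lapply(X, colnames))
stopifnot(!any(duplicated(all.features)))
```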
This step is not required, but SPEAR is more likely to return a lower-weight model (i.e. `w.x = 0`, see below for more information) if a minimum number of factors is not provided. Run the `estimate_num_factors` function to get an idea of how many factors are needed for SPEAR:
num.factors <- estimate_num_factors(X, # the list of omics data matrices (or a concatenated list, N x P, samples X features)
num.iterations = 10, # how many iterations to run? Defaults to 10
min.num.factors = NULL, # manually set the minimum number of factors? Useful if min > max.
max.num.factors = NULL, # manually set the maximum number of factors?
seed = 123 # seed for reproducible runs
)
SPEARobj <- make.SPEARobject(
# data:
X = X, # list of omics matrices, see above
Y = Y, # matrix with response data, see above
# options:
family = "gaussian", # family, can be 'gaussian', 'binomial', 'multinomial' or 'ordinal'.
num.factors = num.factors, # from above, estimate_num_factors, defaults to 5
seed = ..., # set the seed (for consistent runs)
print_out = 50, # Print ELBO change after how many iterations? (default is 100)
remove.formatting = FALSE, # if TRUE, removes coloring from output (doesn't work in HTML)
quiet = FALSE # if TRUE, will silence all unnecessary print messages
)
When you initialize a SPEARobject, the `$fit` list will be `NULL`. This is where SPEAR will store results from training (see below):
names(SPEARobj$fit)
## NULL
Some parameters are automatically initialized if you don't specify them in the `make.SPEARobject` function:
SPEARobj$params$num_factors
## [1] 5
This includes the `$params$weights` argument, which is a matrix. Each row corresponds to a `w.idx` (weight index), which affects how SPEAR reconstructs the latent factors:
SPEARobj$params$weights
##    w.x w.y
## 1 2.00   1
## 2 1.00   1
## 3 0.50   1
## 4 0.10   1
## 5 0.01   1
## 6 0.00   1
In the current implementation of SPEAR, `w.y` is always 1. `w.x` represents the `w` parameter described in the manuscript, referring to the influence that the structure of `X` has on the construction of the latent factors (`U`).
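As a sketch, the mapping from `w.x` values to weight indices can be reproduced with a plain matrix in base R (the values below mirror the default weights table shown above; this is an illustration, not SPEAR's internal code):

```r
# Reconstruct the default weights matrix shown above:
weights <- cbind(w.x = c(2, 1, 0.5, 0.1, 0.01, 0),
                 w.y = 1)

# Find the weight index (row) for a given w.x value, e.g. w.x = 0.5:
w.idx <- which(weights[, "w.x"] == 0.5)
w.idx  # 3
```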
Cross-validated SPEAR uses the `parallel` package to efficiently produce results from K-fold cross validation (defaults to 5 folds). Users can specify the following parameters:
- `num.folds` - how many folds should the K-fold cross validation use? Defaults to 5, but smaller sample sizes can benefit from higher numbers (i.e. 10).
- `fold.ids` - how should each sample (row) be assigned to a fold? Defaults to random (balanced, see `?generate.fold.ids`), but can also be specified via a vector of integers (i.e. `fold.ids = c(1, 1, 1, 1, 2, 2, 2, 2, ... 10)`; note that the length of `fold.ids` must be `num.samples`!).
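A balanced fold assignment can also be generated by hand in base R; this is a minimal sketch (the sample count `N = 23` and `num.folds = 5` are hypothetical), whose result could be passed to `run.cv.spear` via `fold.ids`:

```r
# Hypothetical sample and fold counts:
num.folds <- 5
N <- 23

# Repeat 1..num.folds to length N, then shuffle,
# so fold sizes differ by at most 1:
set.seed(123)
fold.ids <- sample(rep(seq_len(num.folds), length.out = N))

table(fold.ids)   # three folds of 5 samples, two folds of 4
```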
If working properly, you should see output similar to that below…
# First, run cv spear:
SPEARobj$run.cv.spear(num.folds = ..., # how many folds? Defaults to 5
fold.ids = ... # How should each sample (row) be assigned? Defaults to randomly
)
NOTE: By default, CV SPEAR only prints out the results from one fold of the cross validation process. Due to the parallel processing, the folds are all running simultaneously.
When finished, the CV runs will automatically be evaluated to adaptively determine the most predictive factors across varying influences of `X`. (You can opt to run this manually by setting `do.cv.eval = FALSE` in the `run.cv.spear` function and then using the `$cv.evaluate()` function.) This process performs many important calculations, including computing the mean cross-validation error per weight.
Finally, for reproducibility purposes, save the SPEARobject via:
SPEARobj$save.model("_name_to_save_model_.rds")
…which can be easily reloaded with:
SPEARobj <- load.SPEARobject(file = "_name_to_save_model_.rds")
# or simply readRDS(...)
NOTE: All of the above functions can be chained due to the nature of R6 objects:
SPEARobj$run.cv.spear(...)$save.model(...)
When running SPEAR, we return a model for each `w.idx` (weight index). Typically, the user will opt for the `"sd"` (default) model, but all of the options below are viable:
| Method | Code | Explanation |
|:------:|:----:|:-----------:|
| `"min"` | `$set.weights(method = "min")` | Choose the `w.x` with the lowest mean CV error. |
| `"sd"` | `$set.weights(method = "sd")` | Choose the model with the highest `w.x` within 1 standard deviation of the lowest mean CV error. |
| `"fixed"` | `$set.weights(w.x = _value_)` | Manually choose the model with `w.x` = *value* (disregarding CV error entirely). |
You can plot the cross validation results with the `$plot.cv.loss()` function:
SPEARobj$plot.cv.loss()
For this particular SPEARobject:

- the `"min"` model is w.idx 3 (`w.x = 0.5`)
- the `"sd"` model is w.idx 1 (`w.x = 2.0`)
We can check the similarity in factor scores via the `$plot.factor.correlations()` function:
SPEARobj$plot.factor.correlations()
By default, the `$set.weights()` function will choose the `"sd"` model (to balance interpretability and prediction).
# Choose the default ("sd") model
SPEARobj$set.weights()
## Setting current weight index to 1
## w.x: 2
## w.y: 1
If you have test data, you can add it to the SPEARobject using the `add.data(...)` function. While providing a list of matrices for `X` is required, `Y` is optional. NOTE: the column names (features) in `X` must match the column names of the training data.
X.test <- ...
Y.test <- ...
SPEARobj$add.data(X = X.test, Y = Y.test, name = "test")
## Saved dataset to $data$test
The dataset has been stored under `$data`:
names(SPEARobj$data)
## [1] "train" "test"
Many of the functions below support choosing which dataset to use. `$plot.factor.scores()` will plot the factor scores for each subject against the response:
SPEARobj$plot.factor.scores()
Notice that Factor 1 is negatively correlated with the response, and Factor 2 is positively correlated with the response. We will look into these factors more deeply.
If you have a test dataset stored, you can also choose to plot the test dataset factor scores:
SPEARobj$plot.factor.scores(data = "test")
`$plot.contributions()` will give more insight into which datasets / response(s) are contributing to each factor:
SPEARobj$plot.contributions()
As seen above in the `$plot.factor.scores()` function, Factor 1 and Factor 2 are very influential in predicting the response; Factor 1 is composed of features mainly from OmicsData3 and OmicsData4, whereas Factor 2 is composed of OmicsData1 and OmicsData4.
Now that we have factors we are interested in (Factors 1 and 2 in this case), we can use the `$get.features()` function to get the most influential features per factor:
# defaults to coefficient.cutoff = 0 and probability.cutoff = .95
features <- SPEARobj$get.features()
# To get ALL features, just set both cutoffs to 0:
all.features <- SPEARobj$get.features(coefficient.cutoff = 0,
probability.cutoff = 0)
# To add columns including correlation with factor scores, use the correlation argument:
features.with.correlation <- SPEARobj$get.features(correlation = "pearson")
# Specify a factor/dataset:
features.Factor1 <- SPEARobj$get.features(factors = 1)
features.Dataset1 <- SPEARobj$get.features(datasets = "OmicsData1")
This function returns a data.frame with each row representing a different feature (per factor). The features returned will depend on the `coefficient.cutoff` and `probability.cutoff` used in the function. Typically, the default parameters are fine (`joint.probability >= 0.95` and `projection.coefficient >= 0`), but feel free to adjust these to get a longer/shorter list of features.
`joint.probability` refers to the probability of a feature being influential in the model, so anything above 0.95 is usually very good. If too many features are returned, consider setting `joint.probability = 1.0` to get only the absolute top features.
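As a sketch of post-hoc filtering, the returned data.frame can also be subset with base R. The toy data.frame below is a hypothetical stand-in for the output of `$get.features()`; the column names `joint.probability` and `projection.coefficient` are assumptions based on the defaults described above:

```r
# Hypothetical stand-in for the data.frame returned by $get.features():
features <- data.frame(
  feature                = c("gene_1", "gene_2", "protein_3", "protein_4"),
  factor                 = c(1, 1, 2, 2),
  joint.probability      = c(1.00, 0.96, 0.90, 1.00),
  projection.coefficient = c(0.80, -0.40, 0.10, 0.55)
)

# Keep only the absolute top features (joint.probability of 1):
top.features <- subset(features, joint.probability >= 1.0)
top.features$feature  # "gene_1" "protein_4"
```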
Use the `$get.predictions()` function to get predictions from a trained SPEARobject. Remember, the predictions will differ depending on the `w.idx` set through the `$set.weights(...)` function.
You can obtain the following types of predictions:

- cross-validated predictions for the training data (`"train"`) (default parameters)
- in-sample predictions for the training data (`"train"`) (set `cv = FALSE`)
- test predictions for novel test data (added via the `$add.data(...)` function, see above) (set `data = "_name_of_test_data_"`)
# defaults to cross-validated predictions for the training data
cv.predictions <- SPEARobj$get.predictions()
# Retrieve the in.sample predictions for the training data instead
in.sample.predictions <- SPEARobj$get.predictions(cv = FALSE)
# Retrieve predictions for a test dataset (added via $add.data())
test.predictions <- SPEARobj$get.predictions(data = "test")
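For a gaussian response, the returned predictions can then be compared against the held-out responses with base R; a minimal sketch (the vectors below are hypothetical stand-ins for the values in `test.predictions` and `Y.test`):

```r
# Hypothetical stand-ins for predicted and observed responses:
predicted <- c(1.2, 0.8, -0.5, 2.1)
observed  <- c(1.0, 1.0, -0.4, 2.0)

# Root mean squared error between predictions and observations:
rmse <- sqrt(mean((predicted - observed)^2))
round(rmse, 3)  # 0.158
```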
You can plot predictions using the same parameters as the `$get.predictions(...)` function (i.e. `data = ...` and `cv = ...`):
# defaults to cross-validated predictions for the training data
SPEARobj$plot.predictions()
# Plot the in-sample predictions for the training data instead
SPEARobj$plot.predictions(cv = FALSE)
# Plot predictions for a test dataset (added via $add.data())
SPEARobj$plot.predictions(data = "test")
See the SPEAR plots vignette for many other plots available in the SPEAR package.
For a deeper dive into all of the above sections, return to the main SPEAR vignette here