docs/vignettes/01_SPEAR_Quick_Walkthrough.md

'Quick SPEAR Walkthrough'

This vignette will provide a brief walkthrough for using SPEAR on multi-omic data.

Sections:

Installing SPEAR:

Follow installation instructions in README (located here)

Required Libraries:
library(SPEAR)

Preparing SPEAR:

Loading the multi-omic data:

Loading your own multi-omic data:

Parameters:

X - a list of matrices of omics data [subjects = rows, features = columns] (i.e. OmicsData 1 would be X[[1]])

Y - a matrix of response data [subjects = rows, response.variables = columns] (i.e. Response1 would be Y[,1])

Data should be scaled/standardized (mean = 0, variance = 1) before using SPEAR:

X <- ... # your list of matrices here
Y <- ... # your response matrix.

It is a good practice to ensure that all the rownames (samples) for X[[1]]X[[D]] and Y are declared and are matching (to ensure rows correspond to the same ordering of samples).

Also, for each dataset in X, try to name the column names (features) to ensure they are unique.

Estimating the minimum number of factors:

This step is not required, but SPEAR will be more likely to return a lower weight model (i.e. w.x = 0, see below for more information) if a minimum number of factors is not provided.

Run the estimate_num_factors function to get an idea of how many factors are needed for SPEAR:

num.factors <- estimate_num_factors(X, # the list of omics data matrices (or a concatenated list, N x P, samples X features)
                                    num.iterations = 10, # how many iterations to run? Defaults to 10
                                    min.num.factors = NULL, # manually set the minimum number of factors? Useful if min > max.
                                    max.num.factors = NULL, # manually set the maximum number of factors?
                                    seed = 123 # seed for reproducible runs
                                   )

Make a SPEARobject-class object:

SPEARobj <- make.SPEARobject(
                               # data:
                             X = X, # list of omics matrices, see above
                             Y = Y, # matrix with response data, see above
                               # options:
                             family = "gaussian", # family, can be 'gaussian', 'binomial', 'multinomial' or 'ordinal'.
                             num.factors = num.factors, # from above, estimate_num_factors, defaults to 5
                             seed = ..., # set the seed (for consistent runs)
                             print_out = 50, # Print ELBO change after how many iterations? (default is 100)
                             remove.formatting = FALSE, # if TRUE, removes coloring from output (doesn't work in HTML)
                             quiet = FALSE # if TRUE, will silence all unnecessary print messages
                            )

When you initialize a SPEARobject, the $fit list will be NULL… This is where SPEAR will store results from training (see below)

names(SPEARobj$fit)

NULL

Some parameters are automatically initialized if you don’t specify them in the make.SPEARobject function…

SPEARobj$params$num_factors

5

This includes the $params$weights argument, which is a matrix. Each row corresponds to a w.idx (weight index), which will affect how SPEAR reconstructs latent factors.

SPEARobj$params$weights

# Yields...
    w.x w.y
 1 2.00   1
 2 1.00   1
 3 0.50   1
 4 0.10   1
 5 0.01   1
 6 0.00   1

In the current implementation of SPEAR, w.y will always be 1. w.x represents the w parameter described in the manuscript, referring to the influence the structure of X will play on the construction of the latent factors (U).

Running SPEAR:

Running SPEAR (with cross validation):

Cross validated SPEAR uses the parallel package to efficiently produce results from K-fold cross validation (defaults to 5…)

User can specify the following parameters…

If working properly, you should see output similar to that below…

# First, run cv spear:
SPEARobj$run.cv.spear(num.folds = ..., # how many folds? Defaults to 5
                      fold.ids = ...   # How should each sample (row) be assigned? Defaults to randomly
                      )

NOTE: By default, CV SPEAR only prints out the results from one fold of the cross validation process. Due to the parallel processing, the folds are all running simultaneously.

When finished, the CV runs will automatically be evaluated to adaptively determine the most predictive factors from varying influences of X. (You can opt to manually run this via setting do.cv.eval = FALSE in the run.cv.spear function and then using the $cv.evaluate() function…). This process performs many important calculations, including getting the mean cross validation error per weight.

Finally, for reproducibility purposes, save the SPEARobject via…

SPEARobj$save.model("_name_to_save_model_.rds")

… which can be easily reloaded with …

SPEARobj <- load.SPEARobject(file = "_name_to_save_model_.rds")
# or simply readRDS(...)

NOTE: All of the above functions can be chained due to the nature of R6 objects:

SPEARobj$run.cv.spear(...)$save.model(...)

Analyzing SPEAR results:

Choose a model via the w.idx (weight index)

When running SPEAR, we return a model for each w.idx (weight index).

Typically, the user will opt for the "sd" (default) model, but all options below are viable:

| method | Code | Explanation | |:----------:|:------------------------------:|:--------------------------------------------------------------------------------------------------------------------:| | "min" | $set.weights(method = "min") | Choose the wx with the lowest mean CV error. | | "sd" | $set.weights(method = "sd") | Choose the model with the highest wx within 1 standard deviation of the lowest mean CV error. | | "fixed" | $set.weights(w.x = _value_) | Manually choose the model with wx = *value* (disregarding CV error entirely) |

You can plot the cross validation results with the $plot.cv.loss() function:

SPEARobj$plot.cv.loss()

For this particular SPEARobject:

We can check the similarity in factor scores via the plot.factor.correlations() function:

SPEARobj$plot.factor.correlations()

By default, the $set.weights() function will choose the "sd" model (to mediate interpretability and prediction).

# Choose the model with the lowest error
SPEARobj$set.weights()

Setting current weight index to 1
w.x: 2
w.y: 1

Adding a testing dataset

If you have test data, you can add it to the SPEARobject using the add.data(...) function…

While providing a list of matrices for X is required, Y is optional.

NOTE: The column names (features) in X must match the column names of the training data…

X.test <- ...
Y.test <- ...

SPEARobj$add.data(X = X.test, Y = Y.test, name = "test")
## Saved dataset to $data$test

The dataset has been stored under $data

names(SPEARobj$data)
## [1] "train" "test"

Many of the functions below support choosing which dataset to use.

Factor Scores

$plot.factor.scores() will plot the factor scores for each subject against the response:

SPEARobj$plot.factor.scores()

Notice that Factor 1 is negatively correlated with the response, and Factor 1 is positively correlated with the response. We will look into these factors more deeply.

If you have a test dataset stored, you can also choose to plot the test dataset factor scores:

SPEARobj$plot.factor.scores(data = "test")

Factor Contributions

$plot.contributions() will give more insight into which datasets / response(s) are contributing to each factor.

SPEARobj$plot.contributions()

As seen above in the $plot.factor.scores() function, Factor 1 and Factor 2 are very influential in predicting the response, and Factor 1 is composed of features mainly from OmicsData3 and OmicsData4, whereas Factor 2 is composed of OmicsData1 and OmicsData4.

Getting Top Features:

Now that we have factors we are interested in (Factor 1 and Factor 2 in this case), we can use the $get.features() function to get the most influential features per factor:

# defaults to coefficient.cutoff = 0 and probability.cutoff = .95
features <- SPEARobj$get.features() 

# To get ALL features, just set both cutoffs to 0:
all.features <- SPEARobj$get.features(coefficient.cutoff = 0,
                                      probability.cutoff = 0)

# To add columns including correlation with factor scores, use the correlation argument:
features.with.correlation <- SPEARobj$get.features(correlation = "pearson")

# Specify a factor/dataset:
features.Factor1 = SPEARobj$get.features(factors = 1)
features.Dataset1= SPEARobj$get.features(datasets = "OmicsData1")

This function returns a data.frame with each row representing a different feature (per factor). The features returned will depend on the coefficient.cutoff and probability.cutoff used in the function. Typically, using default parameters is fine (joint.probability >= 0.95 and projection.coefficient >= 0), but feel free to adjust these to get a larger/shorter list of features.

joint.probability refers to the probability of a feature being influential in the model, so anything above 0.95 is usually very good. If too many features are returned, consider setting joint.probability = 1.0 to only get the absolute top features.

Obtaining Predictions

Use the $get.predictions() function to get predictions from a trained SPEARobject. Remember, the predictions will differ depending on the w.idx set through the $set.weights(...) function.

You can obtain the following types of predictions:

# defaults to cross-validated predictions for the training data
cv.predictions <- SPEARobj$get.predictions()

# Retrieve the in.sample predictions for the training data instead
in.sample.predictions <- SPEARobj$get.predictions(cv = FALSE)

# Retrieve predictions for a test dataset (added via $add.data())
test.predictions <- SPEARobj$get.predictions(data = "test")

Plotting Predictions

You can plot predictions using the same parameters as the $get.predictions(...) function (i.e. data = ... and cv = ...)

# defaults to cross-validated predictions for the training data
SPEARobj$plot.predictions()

# Retrieve the in.sample predictions for the training data instead
SPEARobj$plot.predictions(cv = FALSE)

# Retrieve predictions for a test dataset (added via $add.data())
SPEARobj$plot.predictions(data = "test")

Other Plotting Functions:

See the SPEAR plots vignette for many other plots available in the SPEAR package.

Other Vignettes

For a deeper dive into all of the above sections, return to the main SPEAR vignette here



jgygi/SPEAR documentation built on July 5, 2023, 5:35 p.m.