This vignette provides a brief walkthrough for using SPEAR on multi-omic data:

- Installing SPEAR
- Preparing SPEAR
- Running SPEAR
- Analyzing SPEAR results
Follow the installation instructions in the README (located here), then load the package:
library(SPEAR)
Loading your own multi-omic data:

Parameters:

- `X` - a list of matrices of omics data [subjects = rows, features = columns] (i.e. OmicsData1 would be `X[[1]]`)
- `Y` - a matrix of response data [subjects = rows, response.variables = columns] (i.e. Response1 would be `Y[,1]`)

Data should be scaled/standardized (mean = 0, variance = 1) before using SPEAR:
X <- ... # your list of matrices here
Y <- ... # your response matrix.
It is good practice to ensure that the rownames (samples) for `X[[1]]`…`X[[D]]` and `Y` are declared and matching (to ensure rows correspond to the same ordering of samples). Also, for each dataset in `X`, name the column names (features) so that they are unique.
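The scaling and naming checks described above can be sketched in base R; the dataset names, dimensions, and feature names below are hypothetical placeholders for your own data:

```r
# Simulated stand-ins for two omics datasets and a response
# (names and dimensions are hypothetical):
set.seed(123)
X <- list(
  OmicsData1 = matrix(rnorm(20 * 10), nrow = 20,
                      dimnames = list(paste0("Subject", 1:20), paste0("gene_", 1:10))),
  OmicsData2 = matrix(rnorm(20 * 5), nrow = 20,
                      dimnames = list(paste0("Subject", 1:20), paste0("protein_", 1:5)))
)
Y <- matrix(rnorm(20), ncol = 1,
            dimnames = list(paste0("Subject", 1:20), "Response1"))

# Scale each omics dataset to mean = 0, variance = 1 per feature:
X <- lapply(X, scale)

# Check that rownames match across all datasets and the response:
stopifnot(all(sapply(X, function(x) identical(rownames(x), rownames(Y)))))

# Check that feature names are unique across datasets:
all.features <- unlist(lapply(X, colnames))
stopifnot(!any(duplicated(all.features)))
```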
This step is not required, but SPEAR is more likely to return a lower-weight model (i.e. `w.x = 0`, see below for more information) if a minimum number of factors is not provided. Run the `estimate_num_factors` function to get an idea of how many factors are needed for SPEAR:
num.factors <- estimate_num_factors(X, # the list of omics data matrices (or a concatenated list, N x P, samples X features)
num.iterations = 10, # how many iterations to run? Defaults to 10
min.num.factors = NULL, # manually set the minimum number of factors? Useful if min > max.
max.num.factors = NULL, # manually set the maximum number of factors?
seed = 123 # seed for reproducible runs
)
SPEARobj <- make.SPEARobject(
# data:
X = X, # list of omics matrices, see above
Y = Y, # matrix with response data, see above
# options:
family = "gaussian", # family, can be 'gaussian', 'binomial', 'multinomial' or 'ordinal'.
num.factors = num.factors, # from above, estimate_num_factors, defaults to 5
seed = ..., # set the seed (for consistent runs)
print_out = 50, # Print ELBO change after how many iterations? (default is 100)
remove.formatting = FALSE, # if TRUE, removes coloring from output (doesn't work in HTML)
quiet = FALSE # if TRUE, will silence all unnecessary print messages
)
When you initialize a SPEARobject, the `$fit` list will be `NULL`. This is where SPEAR will store results from training (see below):
names(SPEARobj$fit)
## NULL
Some parameters are automatically initialized if you don't specify them in the `make.SPEARobject` function:
SPEARobj$params$num_factors
## [1] 5
This includes the `$params$weights` argument, which is a matrix. Each row corresponds to a `w.idx` (weight index), which affects how SPEAR reconstructs the latent factors:
SPEARobj$params$weights
##    w.x w.y
## 1 2.00   1
## 2 1.00   1
## 3 0.50   1
## 4 0.10   1
## 5 0.01   1
## 6 0.00   1
In the current implementation of SPEAR, `w.y` is always 1. `w.x` represents the `w` parameter described in the manuscript, referring to the influence that the structure of `X` has on the construction of the latent factors (`U`).
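As a sketch, the mapping from `w.x` values to weight indices can be reproduced with a plain matrix in base R (the values below mirror the default weights table shown above; this is an illustration, not SPEAR's internal code):

```r
# Reconstruct the default weights matrix shown above:
weights <- cbind(w.x = c(2, 1, 0.5, 0.1, 0.01, 0),
                 w.y = 1)

# Find the weight index (row) for a given w.x value, e.g. w.x = 0.5:
w.idx <- which(weights[, "w.x"] == 0.5)
w.idx  # 3
```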
Cross-validated SPEAR uses the `parallel` package to efficiently produce results from K-fold cross validation (defaults to 5 folds). Users can specify the following parameters:
- `num.folds` - how many folds should the K-fold cross validation use? Defaults to 5, but smaller sample sizes can benefit from higher numbers (i.e. 10).
- `fold.ids` - how should each sample (row) be assigned to a fold? Defaults to random (balanced, see `?generate.fold.ids`), but can also be specified via a vector of integers (i.e. `fold.ids = c(1, 1, 1, 1, 2, 2, 2, 2, ... 10)`; note that the length of `fold.ids` must be `num.samples`!).
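A balanced fold assignment can also be generated by hand in base R; this is a minimal sketch (the sample count `N = 23` and `num.folds = 5` are hypothetical), whose result could be passed to `run.cv.spear` via `fold.ids`:

```r
# Hypothetical sample and fold counts:
num.folds <- 5
N <- 23

# Repeat 1..num.folds to length N, then shuffle,
# so fold sizes differ by at most 1:
set.seed(123)
fold.ids <- sample(rep(seq_len(num.folds), length.out = N))

table(fold.ids)   # three folds of 5 samples, two folds of 4
```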
If working properly, you should see output similar to that below…
# First, run cv spear:
SPEARobj$run.cv.spear(num.folds = ..., # how many folds? Defaults to 5
fold.ids = ... # How should each sample (row) be assigned? Defaults to randomly
)
NOTE: By default, CV SPEAR only prints out the results from one fold of the cross validation process. Due to the parallel processing, the folds are all running simultaneously.
When finished, the CV runs will automatically be evaluated to adaptively determine the most predictive factors across varying influences of `X`. (You can opt to run this manually by setting `do.cv.eval = FALSE` in the `run.cv.spear` function and then using the `$cv.evaluate()` function.) This process performs many important calculations, including computing the mean cross-validation error per weight.
Finally, for reproducibility purposes, save the SPEARobject via:
SPEARobj$save.model("_name_to_save_model_.rds")
…which can be easily reloaded with:
SPEARobj <- load.SPEARobject(file = "_name_to_save_model_.rds")
# or simply readRDS(...)
NOTE: All of the above functions can be chained due to the nature of R6 objects:
SPEARobj$run.cv.spear(...)$save.model(...)
When running SPEAR, we return a model for each `w.idx` (weight index). Typically, the user will opt for the `"sd"` (default) model, but all of the options below are viable:
| Method | Code | Explanation |
|:------:|:----:|:-----------:|
| `"min"` | `$set.weights(method = "min")` | Choose the `w.x` with the lowest mean CV error. |
| `"sd"` | `$set.weights(method = "sd")` | Choose the model with the highest `w.x` within 1 standard deviation of the lowest mean CV error. |
| `"fixed"` | `$set.weights(w.x = _value_)` | Manually choose the model with `w.x` = *value* (disregarding CV error entirely). |
You can plot the cross validation results with the `$plot.cv.loss()` function:
SPEARobj$plot.cv.loss()
For this particular SPEARobject:

- the `"min"` model is w.idx 3 (`w.x = 0.5`)
- the `"sd"` model is w.idx 1 (`w.x = 2.0`)
We can check the similarity in factor scores via the `$plot.factor.correlations()` function:
SPEARobj$plot.factor.correlations()
By default, the `$set.weights()` function will choose the `"sd"` model (to balance interpretability and prediction).
# Choose the default ("sd") model
SPEARobj$set.weights()
## Setting current weight index to 1
## w.x: 2
## w.y: 1
If you have test data, you can add it to the SPEARobject using the `add.data(...)` function. While providing a list of matrices for `X` is required, `Y` is optional. NOTE: the column names (features) in `X` must match the column names of the training data.
X.test <- ...
Y.test <- ...
SPEARobj$add.data(X = X.test, Y = Y.test, name = "test")
## Saved dataset to $data$test
The dataset has been stored under `$data`:
names(SPEARobj$data)
## [1] "train" "test"
Many of the functions below support choosing which dataset to use. `$plot.factor.scores()` will plot the factor scores for each subject against the response:
SPEARobj$plot.factor.scores()
Notice that Factor 1 is negatively correlated with the response, and Factor 2 is positively correlated with the response. We will look into these factors more deeply.
If you have a test dataset stored, you can also choose to plot the test dataset factor scores:
SPEARobj$plot.factor.scores(data = "test")
`$plot.contributions()` will give more insight into which datasets / response(s) are contributing to each factor:
SPEARobj$plot.contributions()
As seen above in the `$plot.factor.scores()` function, Factor 1 and Factor 2 are very influential in predicting the response; Factor 1 is composed of features mainly from OmicsData3 and OmicsData4, whereas Factor 2 is composed of OmicsData1 and OmicsData4.
Now that we have factors we are interested in (Factors 1 and 2 in this case), we can use the `$get.features()` function to get the most influential features per factor:
# defaults to coefficient.cutoff = 0 and probability.cutoff = .95
features <- SPEARobj$get.features()
# To get ALL features, just set both cutoffs to 0:
all.features <- SPEARobj$get.features(coefficient.cutoff = 0,
probability.cutoff = 0)
# To add columns including correlation with factor scores, use the correlation argument:
features.with.correlation <- SPEARobj$get.features(correlation = "pearson")
# Specify a factor/dataset:
features.Factor1 <- SPEARobj$get.features(factors = 1)
features.Dataset1 <- SPEARobj$get.features(datasets = "OmicsData1")
This function returns a data.frame with each row representing a different feature (per factor). The features returned will depend on the `coefficient.cutoff` and `probability.cutoff` used in the function. Typically, the default parameters are fine (`joint.probability >= 0.95` and `projection.coefficient >= 0`), but feel free to adjust these to get a longer/shorter list of features.
`joint.probability` refers to the probability of a feature being influential in the model, so anything above 0.95 is usually very good. If too many features are returned, consider setting `joint.probability = 1.0` to get only the absolute top features.
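As a sketch of post-hoc filtering, the returned data.frame can also be subset with base R. The toy data.frame below is a hypothetical stand-in for the output of `$get.features()`; the column names `joint.probability` and `projection.coefficient` are assumptions based on the defaults described above:

```r
# Hypothetical stand-in for the data.frame returned by $get.features():
features <- data.frame(
  feature                = c("gene_1", "gene_2", "protein_3", "protein_4"),
  factor                 = c(1, 1, 2, 2),
  joint.probability      = c(1.00, 0.96, 0.90, 1.00),
  projection.coefficient = c(0.80, -0.40, 0.10, 0.55)
)

# Keep only the absolute top features (joint.probability of 1):
top.features <- subset(features, joint.probability >= 1.0)
top.features$feature  # "gene_1" "protein_4"
```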
Use the `$get.predictions()` function to get predictions from a trained SPEARobject. Remember, the predictions will differ depending on the `w.idx` set through the `$set.weights(...)` function.
You can obtain the following types of predictions:

- cross-validated predictions for the training data (`"train"`) (default parameters)
- in-sample predictions for the training data (`"train"`) (set `cv = FALSE`)
- test predictions for novel test data (added via the `$add.data(...)` function, see above) (set `data = "_name_of_test_data_"`)
# defaults to cross-validated predictions for the training data
cv.predictions <- SPEARobj$get.predictions()
# Retrieve the in.sample predictions for the training data instead
in.sample.predictions <- SPEARobj$get.predictions(cv = FALSE)
# Retrieve predictions for a test dataset (added via $add.data())
test.predictions <- SPEARobj$get.predictions(data = "test")
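For a gaussian response, the returned predictions can then be compared against the held-out responses with base R; a minimal sketch (the vectors below are hypothetical stand-ins for the values in `test.predictions` and `Y.test`):

```r
# Hypothetical stand-ins for predicted and observed responses:
predicted <- c(1.2, 0.8, -0.5, 2.1)
observed  <- c(1.0, 1.0, -0.4, 2.0)

# Root mean squared error between predictions and observations:
rmse <- sqrt(mean((predicted - observed)^2))
round(rmse, 3)  # 0.158
```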
You can plot predictions using the same parameters as the `$get.predictions(...)` function (i.e. `data = ...` and `cv = ...`):
# defaults to cross-validated predictions for the training data
SPEARobj$plot.predictions()
# Plot the in-sample predictions for the training data instead
SPEARobj$plot.predictions(cv = FALSE)
# Plot predictions for a test dataset (added via $add.data())
SPEARobj$plot.predictions(data = "test")
See the SPEAR plots vignette for many other plots available in the SPEAR package.
For a deeper dive into all of the above sections, return to the main SPEAR vignette here