The simulations here are meant to be portable, toy-examples of Tilted-CCA, with the intent to demonstrate that an installation of Tilted-CCA was successful as well as to demonstrate how it differs from the methods that seek to find the "union of information".
A successful installation of tiltedCCA
is required for this. See the
last section of this README of all the system/package dependencies used when creating these simulations. Both simulations should complete in less than 2 minutes.
Broadly speaking, Tilted-CCA is a pipeline of 6 different function calls. The procedure starts by assuming there are two data matrices, mat_1
and mat_2
, (either matrix
or dgCMatrix
) with the same number of rows (i.e., the same cells), but possibly different features. We advise making sure each row/column of either matrix has non-zero variance prior to using this pipeline.
create_multiSVD
): We compute the leading PCs of each modality here. There can be a different number of latent dimensions for each modality. Here, the additional parameters dictate how to compute the leading PCs (for example, should the features be scaled/center before or after the computing the PCs? Which PCs should be used downstream? This would be useful since for ATAC data, it is common to not use the first leading PC.)This typically looks like the following.
multiSVD_obj <- tiltedCCA::create_multiSVD(mat_1 = mat_1, mat_2 = mat_2, dims_1 = 1:2, dims_2 = 1:2, center_1 = F, center_2 = F, normalize_row = T, normalize_singular_value = F, recenter_1 = F, recenter_2 = F, rescale_1 = F, rescale_2 = F, scale_1 = F, scale_2 = F)
The output of this step is a multiSVD
object, which will continually be updated as we feed it as input and get an updated object after each of the following function calls. It currently has elements svd_1
, svd_2
, default_assay
, and param
.
form_metacells
): We pass (optional) large clustering structure or meta-cells here. The former is designed to help aid Tilted-CCA assess what the "intersection of information" is, and the latter is designed for handling large datasets with more than 10,000 cells. If no large structuring structure is needed, the practitioner can simply pass in large_clustering_1=NULL
and large_clustering_2=NULL
. If no meta-cells are needed, the practitioner can simply pass in num_metacells=NULL
.This typically looks like the following.
multiSVD_obj <- tiltedCCA::form_metacells(input_obj = multiSVD_obj, large_clustering_1 = clustering_1, large_clustering_2 = clustering_2, num_metacells = NULL)
The updated multiSVD
object will now have an additional element: metacell_obj
.
compute_snns
): We compute the SNN graphs for both modalities and the target common manifold (computed based on both modality's SNN graph). These graphs dictate how Tilted-CCA represents the information in each modality, and each node of the graph represents a cell/meta-cell. Here, the additional parameters dictate the specific details on how these graphs are constructed. The two important parameters latent_k
(i.e., the dimensionality of the graph Laplacian eigenbases, which is used when optimizing the tilt) and num_neigh
(i.e., the number of neighbors for each cell in either modality).This typically looks like the following.
multiSVD_obj <- tiltedCCA::compute_snns(input_obj = multiSVD_obj, latent_k = 2, num_neigh = 10, bool_cosine = T, bool_intersect = T, min_deg = 1)
The updated multiSVD
object will now have additional elements: snn_list
and laplacian_list
.
tiltedCCA
): We compute the CCA between both modalities and initialize the tilt across all latent dimensions to a particular value. Recall that CCA's solution can be directly computed using the PC's from each modality. If there are different number of latent dimensions for each modality, Tilted-CCA will have a latent dimensionality (i.e., via CCA) equal to the smaller of the two.multiSVD_obj <- tiltedCCA::tiltedCCA(input_obj = multiSVD_obj)
The updated multiSVD
object will now have additional elements: cca_obj
and tcca_obj
.
fine_tuning
): We tune the tilt across each latent dimension. This is the main computational bottleneck of Tilted-CCA, as an optimization is performed cyclically (i.e., the tilt for each latent dimension is updated in sequence, with a few epochs cycling through all latent dimensions). multiSVD_obj <- tiltedCCA::fine_tuning(input_obj = multiSVD_obj)
There is no new obvious element added to multiSVD
, but the representation of the common and distinct embeddings is updated upon this function's completion. For example, multiSVD_obj$tcca_obj$tilt_perc
now is a vector, denoting the tilt of the common vector for each latent dimension.
tiltedCCA_decomposition
): Given the tilts of common vectors across all latent dimensions, we now recover the full cell-by-feature decomposition of each modality. Here, the additional parameters dictate whether you want the decompositions to be cell-by-feature or cell-by-PC. The latter is desirable especially if you are analyzing ATAC data, where the cell-by-feature would be a dense matrix with many hundred-of-thousand features (which would be too memory intensive).multiSVD_obj <- tiltedCCA::tiltedCCA_decomposition(multiSVD_obj)
The updated multiSVD
object will now have additional elements: common_mat_1
and distint_mat_1
(or common_dimred_1
and distinct_dimred_1
if bool_modality_1_full=F
)
and common_mat_2
and distint_mat_2
(or common_dimred_2
and distinct_dimred_2
if bool_modality_2_full=F
)
See https://github.com/linnykos/tiltedCCA_analysis/blob/master/simulation/simulation_1.R for this simulation. In this simulation, there are 300 cells across two modalities of 10 features each. Modality 1 has a "high" amount of distinct information -- it separates the 300 cells into 3 obvious clusters. Modality 2 has a "low" amount of distinct information -- all 300 cells are in one amorphous ball.
The following shows each modality based on their leading 2 PCs respectively, where the cells are colored by the true cell-types.
knitr::include_graphics("simulation1_data.png")
The following shows Tilted-CCA's common (i.e., shared axes of variation between both modalities) and distinct (i.e., axes of variation unique to a modality, after the common axes have been accounted for) axes of variation, where the cells are colored by the true cell-types. Here, observe that the common embedding does not really contain information to separate the cell-types -- this is desirable, as all the cell-type separation information are unique to Modality 1 (hence, appearing in the Modality 1's distinct embedding). The common embedding demonstrates the "intersection of information."
knitr::include_graphics("simulation1_tcca.png")
In contrast with Tilted-CCA's "intersection of information," we demonstrate Consensus PCA, which is a prototypical method illustrating the "union of information" where a low-dimensional embedding is constructed for multimodal data by combining the leading axes of variation from each modality. These embeddings are useful to visualize the "best of both worlds" (i.e., having cell-type separation that best combines both modalities), but is not useful for to understand the shared and distinct signals between the two modalities. Here, observe that the three cell-types are clearly separated by Consensus PCA.
knitr::include_graphics("simulation1_consensuspca.png")
See https://github.com/linnykos/tiltedCCA_analysis/blob/master/simulation/simulation_2.R for this simulation. In this simulation, there are 500 cells across two modalities of 10 features each. There are 5 true cell-types, but each modality can only differentiate these cell-types into 3 clusters. Modality 1 separates the cyan cells from the red+black cells from the green+blue cells, while Modality 2 separates the cyan cells from the red+blue cells from the green+black cells.
The following shows each modality based on their leading 2 PCs respectively, where the cells are colored by the true cell-types.
knitr::include_graphics("simulation2_data.png")
The following shows Tilted-CCA's common (demonstrating the "intersection of information") and distinct axes of variation, where the cells are colored by the true cell-types. Here, observe that the common embedding shows the information that both modalities agree upon -- the cyan cells are separable from all other cell-types. The distinct information then illustrates what is distinct for each modality: Modality 1 separates the cyan cells from the red+black cells from the green+blue cells, while Modality 2 separates the cyan cells from the red+blue cells from the green+black cells.
knitr::include_graphics("simulation2_tcca.png")
Consensus PCA for this data cleanly separates all five cell-types. Again, quantifying "union of information" is a useful and complementary perspective to what Tilted-CCA does. The "union" gives perspective on what axes of variation are contained across both modalities, while Tilted-CCA's common embeddings gives perspective on what axes of variation are shared between both modalities.
knitr::include_graphics("simulation2_consensuspca.png")
The following shows the suggested package versions that the developer (GitHub username: linnykos) used when developing the Tilted-CCA package.
> devtools::session_info() ─ Session info ───────────────────────────────────────────────────── setting value version R version 4.1.2 (2021-11-01) os Red Hat Enterprise Linux system x86_64, linux-gnu ui X11 language (EN) collate en_US.UTF-8 ctype en_US.UTF-8 tz America/New_York date 2022-10-04 pandoc 1.12.3.1 @ /usr/bin/pandoc ─ Packages ──────────────────────── package * version date (UTC) lib source cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.2) callr 3.7.1 2022-07-13 [1] CRAN (R 4.1.2) cli 3.3.0 2022-04-25 [1] CRAN (R 4.1.2) crayon 1.5.1 2022-03-26 [1] CRAN (R 4.1.2) devtools 2.4.4 2022-07-20 [1] CRAN (R 4.1.2) digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2) ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.2) fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.2) fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2) glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.2) htmltools 0.5.3 2022-07-18 [1] CRAN (R 4.1.2) htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.2) httpuv 1.6.5 2022-01-05 [1] CRAN (R 4.1.2) later 1.3.0 2021-08-18 [1] CRAN (R 4.1.2) lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.2) magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.1.2) memoise 2.0.1 2021-11-26 [1] CRAN (R 4.1.2) mime 0.12 2021-09-28 [1] CRAN (R 4.1.2) miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.1.2) pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.1.2) pkgload 1.3.0 2022-06-27 [1] CRAN (R 4.1.2) prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.2) processx 3.7.0 2022-07-07 [1] CRAN (R 4.1.2) profvis 0.3.7 2020-11-02 [1] CRAN (R 4.1.2) promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.2) ps 1.7.0 2022-04-23 [1] CRAN (R 4.1.2) purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.2) R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.2) Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.1.2) remotes 2.4.2 2021-11-30 [1] CRAN (R 4.1.2) rlang 1.0.4 2022-07-12 [1] CRAN (R 4.1.2) sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2) shiny 1.7.2 2022-07-19 [1] CRAN (R 4.1.2) stringi 1.7.8 2022-07-11 [1] CRAN (R 4.1.2) stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.2) tiltedCCA * 1.0.0.001 2022-09-06 [1] local urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.1.2) usethis 2.1.6 2022-05-25 [1] CRAN (R 4.1.2) xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.