map_assays | R Documentation |
Align and integrate spatial assays from the same modality using super pixels
map_assays(
seed_assay,
query_assay,
signal = "variable_features",
use_cost = c("feature", "niche"),
method = "pearson",
neighborhood = "knn",
k = 20,
radius = 0.05,
depth = 1,
dimensions = seq(1, 30),
batch_size = 10000,
epochs = 1,
allow_duplicates = TRUE,
threshold = 0.3,
filter_cells = FALSE,
use_norm = "raw",
scale = FALSE,
custom_cost = NULL,
seed_territory_labels = "Territory",
query_territory_labels = "Territory",
seed_meta_labels = NULL,
query_meta_labels = NULL,
jitter = 0,
digits = 5,
verbose = TRUE
)
seed_assay |
vesalius_assay object - data to be mapped to |
query_assay |
vesalius_assay object - data to map |
signal |
character (variable_features, all_features, embeddings, custom) - What should be used as cell signal to generate the cost matrix. See details |
use_cost |
character string defining how the total cost should be computed. Available: feature, niche, territory, composition (See details for combinations and custom matrices) |
neighborhood |
character - how should the neighborhood be selected? "knn", "radius", "graph" (See details) |
k |
int ]2, n_points] number of nearest neighbors to be considered for neighborhood computation. |
radius |
numeric ]0,1[ proportion of max distance between points to consider for the neighborhood |
depth |
int [1, NA] graph depth from cell to consider from neighborhood (See details) |
dimensions |
Int vector containing latent space dimensions to use |
batch_size |
number of points per batch in query during assignment problem solving |
threshold |
score threshold below which indices should be removed. Scores will always be between 0 and 1 |
use_norm |
character - which count data to use |
scale |
logical - should signal be scaled |
custom_cost |
matrix - matrix of size n (query cells) by p (seed cells) containing custom cost matrix. Used instead of vesalius cost matrix |
verbose |
logical - should I be a noisy boy? |
The goal is to assign the best matching point between a seed set and a query set. To do so, map_assays will first extract a biological signal. This can be the latent space embeddings of each cell or gene counts (or counts from any other modality).
If using gene counts, there are a few more options available to you. First, you can select "variable_features" and vesalius will find the intersection between the variable features in your seed_assay and your query_assay. "all_features" will find the intersection of all genes across assays (even if they are not highly variable). Finally, you can also select a custom gene vector, containing only the gene set you are interested in.
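The feature selection described above can be sketched in base R. This is only an illustration of the intersection idea, not the code map_assays runs internally; the gene names and the `custom_genes` vector are made up for the example.

```r
# Hypothetical feature sets for a seed and a query assay (made-up gene names).
seed_variable  <- c("Gfap", "Snap25", "Mbp", "Plp1", "Cck")
query_variable <- c("Mbp", "Cck", "Pvalb", "Gfap")
seed_all  <- c(seed_variable, "Actb", "Pvalb")
query_all <- c(query_variable, "Actb", "Snap25")

# "variable_features": genes that are variable in both assays.
shared_variable <- intersect(seed_variable, query_variable)

# "all_features": every gene present in both assays, variable or not.
shared_all <- intersect(seed_all, query_all)

# Custom gene vector: a user-defined gene set of interest (hypothetical).
# Restricting it to genes found in both assays is an assumption of this sketch.
custom_genes <- c("Mbp", "Actb", "NotAGene")
usable_custom <- custom_genes[custom_genes %in% shared_all]
```

Only the genes kept by the chosen option would enter the cost computation in the next step.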
The second step is to create a cost matrix. The total cost matrix is the pairwise sum of several individual cost matrices. By default, map_assays uses the "feature" and "niche" cost matrices.
The feature matrix contains the Pearson correlation between seed and query cells, computed on whichever signal was defined by the signal argument (for example, "variable_features" will compute the correlation between the variable features shared by seed and query).
The niche matrix contains the Pearson correlation between niche expression profiles (based on signal). Niches are defined using the neighborhood argument: "knn" uses the k nearest neighbors algorithm (with k defining the number of nearest neighbors), "graph" uses a local neighborhood graph up to the graph depth given by depth, and "radius" uses a spatial radius surrounding a center cell. The signal (expression or embedding) is averaged across all cells in the niche.
The territory matrix compares the average signal of vesalius territories between seed and query.
The composition matrix computes a frequency-aware Jaccard index between the cell types present in a niche. Cell types must be assigned to the seed and query vesalius objects (see the add_cells function).
The total cost matrix is computed as the pairwise sum of the complement (1 - p) of each cost matrix.
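As a rough illustration of the default "feature" plus "niche" cost, the sketch below builds both correlation matrices on toy data and sums their complements. It is a minimal base R sketch under stated assumptions, not the vesalius implementation: the data are random, `niche_signal` is a hypothetical helper, and including the center cell in its own niche is an assumption.

```r
set.seed(1)
n_genes <- 6; n_seed <- 5; n_query <- 4

# Toy signal matrices (genes in rows, cells in columns) and 2D coordinates.
seed_signal  <- matrix(rnorm(n_genes * n_seed),  nrow = n_genes)
query_signal <- matrix(rnorm(n_genes * n_query), nrow = n_genes)
seed_coord   <- matrix(runif(n_seed * 2),  ncol = 2)
query_coord  <- matrix(runif(n_query * 2), ncol = 2)

# Hypothetical helper: average the signal over the k nearest spatial
# neighbors of each cell (the cell itself is included, distance 0).
niche_signal <- function(signal, coord, k) {
  d <- as.matrix(dist(coord))
  sapply(seq_len(ncol(signal)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    rowMeans(signal[, nn, drop = FALSE])
  })
}

# cor() on two matrices correlates their columns:
# rows of the result are query cells, columns are seed cells.
feature_cor <- cor(query_signal, seed_signal, method = "pearson")
niche_cor   <- cor(niche_signal(query_signal, query_coord, k = 3),
                   niche_signal(seed_signal,  seed_coord,  k = 3),
                   method = "pearson")

# Total cost: pairwise sum of the complements (1 - p) of each matrix.
total_cost <- (1 - feature_cor) + (1 - niche_cor)
```

With two correlation-based terms, each entry of `total_cost` lies between 0 (perfectly correlated on both terms) and 4 (perfectly anti-correlated on both).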
This cost matrix is then passed to a Kuhn–Munkres algorithm that generates point pairs minimizing the overall cost.
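To make the objective concrete, the sketch below finds the minimum-cost pairing on a tiny cost matrix by exhaustive search. This is deliberately not the Kuhn–Munkres algorithm: it only shows the optimum that such a solver reaches far more efficiently. The cost values and the `permutations` helper are invented for the example; on real data a dedicated solver (for instance `clue::solve_LSAP`, if the clue package is available) would be needed, and which solver map_assays uses internally is not stated here.

```r
# Toy cost matrix: rows = query points, columns = seed points (made-up values).
cost <- matrix(c(4, 1, 3,
                 2, 0, 5,
                 3, 2, 2), nrow = 3, byrow = TRUE)

# Hypothetical helper: list every permutation of a vector.
# Only feasible for very small inputs (n! permutations).
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# p[i] is the seed column assigned to query row i.
perms  <- permutations(seq_len(ncol(cost)))
totals <- vapply(perms,
                 function(p) sum(cost[cbind(seq_along(p), p)]),
                 numeric(1))

best_assignment <- perms[[which.min(totals)]]
best_cost <- min(totals)
```

Here `best_assignment` pairs query 1 with seed 2, query 2 with seed 1, and query 3 with seed 3, for a total cost of 5.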
Since the algorithm complexity is O(n^3), it can be time consuming to run on larger data sets. As such, mapping will be approximated by dividing seed and query into batches whose size is defined by batch_size. For an exact mapping, ensure that batch_size is larger than the number of cells in both query and seed.
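The effect of batch_size can be sketched as follows. The `make_batches` helper is hypothetical and simply cuts an index range into consecutive chunks; how map_assays actually draws and pairs seed and query batches is not described here, so this should be read as an assumption.

```r
# Hypothetical helper: split 1..n into consecutive batches of at most batch_size.
make_batches <- function(n, batch_size) {
  idx <- seq_len(n)
  unname(split(idx, ceiling(idx / batch_size)))
}

n_seed <- 25
n_query <- 18

# Small batch_size: several batches, each solved separately (approximate mapping).
seed_batches  <- make_batches(n_seed,  batch_size = 10)
query_batches <- make_batches(n_query, batch_size = 10)

# batch_size at least as large as both assays: one batch each (exact mapping).
exact_seed  <- make_batches(n_seed,  batch_size = 10000)
exact_query <- make_batches(n_query, batch_size = 10000)

# Rough work estimate for a cubic-time solver, in arbitrary units.
work_batched <- sum(lengths(seed_batches)^3)
work_exact   <- n_seed^3
```

The batched estimate is much smaller than the exact one, which is the trade-off the batch_size argument controls: less computation at the price of an approximate mapping.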
Finally, once the matches are found, the coordinates are mapped to their corresponding points and a new object is returned.
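The final step can be pictured with plain data frames. In this sketch the query points take the coordinates of their matched seed points, and matches scoring below threshold are dropped; both the direction of the transfer and the filtering order are assumptions made for illustration, and the column names, `matched` and `score` are hypothetical.

```r
# Toy coordinate tables (hypothetical barcodes and positions).
seed_coord  <- data.frame(barcodes = c("s1", "s2", "s3"),
                          x = c(0, 1, 2), y = c(0, 1, 0))
query_coord <- data.frame(barcodes = c("q1", "q2", "q3"),
                          x = c(10, 11, 12), y = c(5, 5, 5))

# matched[i] is the seed row paired with query row i; score[i] is the
# quality of that match, between 0 and 1 (made-up values).
matched <- c(2, 1, 3)
score   <- c(0.9, 0.2, 0.6)
threshold <- 0.3

# Assumption: each query point takes the coordinates of its matched seed point.
mapped <- query_coord
mapped$x <- seed_coord$x[matched]
mapped$y <- seed_coord$y[matched]
mapped$score <- score

# Remove matches whose score falls below the threshold.
mapped <- mapped[mapped$score >= threshold, ]
```

In this toy case the match for "q2" is discarded because its score of 0.2 is below the threshold of 0.3.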
vesalius_assay
## Not run:
data(vesalius)
# Create Vesalius object for processing
vesalius <- build_vesalius_assay(coordinates, counts)
jitter_ves <- build_vesalius_assay(jitter_coord, jitter_counts)
mapped <- map_assays(vesalius, jitter_ves)
## End(Not run)