transfer_cell_labels: Transfer cell column data from a reference to a query...

View source: R/label_transfer.R

transfer_cell_labels    R Documentation

Transfer cell column data from a reference to a query cell_data_set.

Description

For each cell in a query cell_data_set, transfer_cell_labels finds sufficiently similar cell data in a reference cell_data_set and copies the value in the specified column to the query cell_data_set.

Usage

transfer_cell_labels(
  cds_query,
  reduction_method = c("UMAP", "PCA", "LSI"),
  ref_coldata,
  ref_column_name,
  query_column_name = ref_column_name,
  transform_models_dir = NULL,
  k = 10,
  nn_control = list(),
  top_frac_threshold = 0.5,
  top_next_ratio_threshold = 1.5,
  verbose = FALSE
)

Arguments

cds_query

the cell_data_set upon which to perform this operation

reduction_method

a string specifying the reduced dimension matrix to use for the label transfer. The options are "PCA", "LSI", and "UMAP". The default is "UMAP".

ref_coldata

the colData data frame of the reference cell_data_set, obtained by calling colData(cds_ref).

ref_column_name

a string giving the name of the reference cell_data_set column with the values to copy to the query cell_data_set.

query_column_name

a string giving the name of the query cell_data_set column to which you want the values copied. The default is ref_column_name.

transform_models_dir

a string giving the name of the transform model directory to load into the query cell_data_set. If it is NULL, transfer_cell_labels uses the transform models already in the query cell_data_set, which requires that the reference transform models were loaded into the query cell_data_set before transfer_cell_labels is called. The default is NULL. transfer_cell_labels uses the nearest neighbor index, which must be stored in the transform model.

k

an integer giving the number of reference nearest neighbors to find. This value must be large enough to find meaningful column value fractions. See the top_frac_threshold parameter below for additional information. The default is 10.

nn_control

an optional list of parameters used to make and search the nearest neighbor indices. See the set_nn_control help for additional details. Note that if nn_control[['search_k']] is not defined, transfer_cell_labels will try to use search_k <- 2 * n_trees * k, where n_trees is the value used to build the index. The default metric is cosine for reduction_method "PCA" and "LSI" and euclidean for reduction_method "UMAP".

top_frac_threshold

a numeric value. The top fraction of reference values must be greater than top_frac_threshold for the value to be transferred to the query. The top fraction is the fraction of the k neighbors that carry the most frequent value. The default is 0.5.

top_next_ratio_threshold

a numeric value giving the minimum value of the ratio of the counts of the most frequent to the second most frequent reference values required for transferring the reference value to the query. The default is 1.5.

verbose

a boolean controlling verbose output. The default is FALSE.
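
The search_k fallback described under nn_control can be illustrated with a few lines of base R. This is only a sketch of the stated formula, not monocle3's code: resolve_search_k is a hypothetical helper, and the n_trees and k values are arbitrary illustrations.

```r
# Hypothetical helper (not part of monocle3): choose search_k the way the
# nn_control description states, using 2 * n_trees * k when none is supplied.
resolve_search_k <- function(nn_control, n_trees, k) {
  if (is.null(nn_control[["search_k"]])) {
    return(2 * n_trees * k)
  }
  nn_control[["search_k"]]
}

# No search_k supplied: fall back to the formula.
fallback_search_k <- resolve_search_k(list(), n_trees = 50, k = 10)

# An explicit search_k is used as given.
explicit_search_k <- resolve_search_k(list(search_k = 200), n_trees = 50, k = 10)
```

With these illustrative inputs, fallback_search_k is 1000 and explicit_search_k is 200.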

Details

transfer_cell_labels requires a nearest neighbor index made from a reference reduced dimension matrix, the reference cell data to transfer, and a query cell_data_set. For example, the index can be built from UMAP coordinates by calling reduce_dimension(..., build_nn_index=TRUE). The query cell_data_set must have been processed with the preprocess_transform and reduce_dimension_transform functions using the models created when the reference cell_data_set was processed, rather than with preprocess_cds and reduce_dimension.

The models are made when the reference cell_data_set is processed and must be saved to disk at that time using save_transform_models. The load_transform_models function loads the models into the query cell_data_set where they can be used by preprocess_transform and reduce_dimension_transform. The cells in the reference and query cell_data_sets must be similar in the sense that they map to similar reduced dimension coordinates.

When the ref_column_name values are discrete, the most frequent value among the k nearest neighbors is transferred, provided it is sufficiently frequent. When the values are continuous, the mean of the values of the k nearest neighbors is transferred.
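
For the continuous case, the averaging step can be sketched in base R. This is an illustration under stated assumptions, not monocle3's implementation: ref_values and nn_idx are toy, hypothetical inputs, with nn_idx holding one row per query cell and one column per neighbor.

```r
# Toy reference values, one per reference cell (hypothetical data).
ref_values <- c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)

# Hypothetical neighbor index matrix: row i lists the positions of the
# k = 3 nearest reference cells for query cell i.
nn_idx <- rbind(c(1, 2, 3),
                c(4, 5, 6))

# Look up each neighbor's value, keep the query-by-neighbor layout,
# and average across each row.
neighbor_values <- matrix(ref_values[nn_idx], nrow = nrow(nn_idx))
transferred <- rowMeans(neighbor_values)
```

Here transferred holds one value per query cell: 2 for the first and 5 for the second.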

In the case of discrete values, transfer_cell_labels processes each query cell as follows. It finds the k nearest neighbor cells in the reference set, and if more than a fraction top_frac_threshold of them share the same value, it copies that value to the query_column_name column in the query cell_data_set. If the fraction is at or below top_frac_threshold, it checks whether the ratio of the count of the most frequent value to the count of the second most frequent value is at least top_next_ratio_threshold, in which case it copies the most frequent value; otherwise, it sets the query value to NA.
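
The two-step rule for discrete values can be sketched in base R for a single query cell. This is a minimal illustration of the rule as described here, not monocle3's code: decide_label is a hypothetical helper, and it assumes the neighbor labels are non-missing.

```r
# Hypothetical helper (not part of monocle3): pick the label to transfer to
# one query cell from the labels of its k nearest reference neighbors.
decide_label <- function(neighbor_labels,
                         top_frac_threshold = 0.5,
                         top_next_ratio_threshold = 1.5) {
  # Count each distinct label and sort from most to least frequent.
  counts <- sort(table(as.character(neighbor_labels)), decreasing = TRUE)
  top_label <- names(counts)[1]
  top_count <- counts[[1]]
  top_frac <- top_count / length(neighbor_labels)

  # Step 1: the top label covers more than top_frac_threshold of the neighbors.
  if (top_frac > top_frac_threshold) {
    return(top_label)
  }
  # Step 2: the top label outnumbers the runner-up by the required ratio.
  if (length(counts) > 1 &&
      top_count / counts[[2]] >= top_next_ratio_threshold) {
    return(top_label)
  }
  # Neither condition holds: no label is transferred.
  NA_character_
}

majority  <- decide_label(c("a", "a", "a", "a", "a", "a", "b", "b", "c", "c"))
by_ratio  <- decide_label(c("a", "a", "a", "a", "b", "b", "c", "c", "d", "d"))
ambiguous <- decide_label(c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d"))
```

With k = 10 and the default thresholds, majority is "a" (top fraction 0.6), by_ratio is "a" (top fraction 0.4, ratio 4/2 = 2), and ambiguous is NA (top fraction 0.4, ratio 4/3).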

Notes:

  • Monocle3 does not have an align_transform function to apply align_cds-related transforms at this time. If your data sets require batch correction, you need to co-embed them.

  • transfer_cell_labels does not check that the reference nearest neighbor index is consistent with the query matrix.

Value

an updated cell_data_set object

Examples

  ## Not run: 
     # Load the example data bundled with monocle3 (extdata/worm_l2).
     expression_matrix <- readRDS(system.file('extdata',
                                               'worm_l2/worm_l2_expression_matrix.rds',
                                               package='monocle3'))
     cell_metadata <- readRDS(system.file('extdata',
                                           'worm_l2/worm_l2_coldata.rds',
                                           package='monocle3'))
     gene_metadata <- readRDS(system.file('extdata',
                                           'worm_l2/worm_l2_rowdata.rds',
                                           package='monocle3'))

     cds <- new_cell_data_set(expression_data=expression_matrix,
                              cell_metadata=cell_metadata,
                              gene_metadata=gene_metadata)

     # Randomly assign about two thirds of the cells to the reference set.
     ncell <- nrow(colData(cds))
     cell_sample <- sample(seq(ncell), 2 * ncell / 3)
     cell_set <- seq(ncell) %in% cell_sample

     # Reference: preprocess, build the nearest neighbor index, save the models.
     cds1 <- cds[,cell_set]
     cds1 <- preprocess_cds(cds1)
     cds1 <- reduce_dimension(cds1, build_nn_index=TRUE)
     save_transform_models(cds1, 'tm')

     # Query: load the reference models, project the cells, transfer the labels.
     cds2 <- cds[,!cell_set]
     cds2 <- load_transform_models(cds2, 'tm')
     cds2 <- preprocess_transform(cds2, 'PCA')
     cds2 <- reduce_dimension_transform(cds2)
     cds2 <- transfer_cell_labels(cds2, 'UMAP', colData(cds1), 'cao_cell_type', 'transfer_cell_type')
  
## End(Not run)


cole-trapnell-lab/monocle3 documentation built on April 7, 2024, 9:24 p.m.