compute_spec_table: Compute specification table for cell-type expression data


View source: R/build_spec_tools.R

Description

This function implements the Index of Cell Identity specification table computation algorithm developed by Idan Efroni and colleagues (Efroni et al., 2015). This package takes advantage of the furrr::future_map utility, which enables parallelization on multicore machines. To enable this functionality, you must run the future::plan(strategy = "multisession") command before executing ICITools functions.
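As a minimal sketch of how the parallel plan might be scoped around a single call, the helper below sets a plan, runs a function, and restores the previous plan afterwards. The name run_with_plan is hypothetical and is not part of ICITools; the sketch assumes the future package is installed and simply runs the function directly when it is not.

```r
# Hypothetical helper (not part of ICITools): run `fun` under a given
# future plan and restore the previous plan when done.
run_with_plan <- function(fun, strategy = "multisession") {
  # Fall back to a plain call if the 'future' package is unavailable.
  if (!requireNamespace("future", quietly = TRUE)) {
    return(fun())
  }
  # plan() returns the previously active plan when a new one is set.
  old_plan <- future::plan(strategy = strategy)
  on.exit(future::plan(old_plan), add = TRUE)
  fun()
}

# Possible use (test_spec is the example data used elsewhere on this page):
# spec_scores <- run_with_plan(function() compute_spec_table(test_spec))
```

Setting the plan once at the top of a script works equally well; the wrapper is only useful when the rest of the session should keep its original plan.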

Usage

compute_spec_table(expression_data, bin_method = "Efroni",
  mean_method = "Efroni", ...)

Arguments

expression_data

a data frame (or tibble) containing the following columns for each gene/dataset combination:

  • Locus (the gene or probeset)

  • Expression (the normalized expression value, not log-transformed)

  • Cell_Type (the cell type from which the expression data came)

  • Sample_Name (the sample in which the Expression value for that Locus/Cell_Type was measured)

It is assumed that the expression_data object contains no missing values. This is important, since the specificity score computation should be comparable between loci for the same cell types, which would not be the case if some loci/cell-type combinations are missing. As an initial pre-processing step, this function will remove any loci that have missing values.
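The expected layout can be sketched with a toy table. The values and names below (gene_a, type_1, s1, and so on) are made up for illustration; the check simply mirrors the pre-processing described above so that incomplete loci can be inspected before calling the function.

```r
# Toy long-format table with the four required columns (values are made up).
expression_data <- data.frame(
  Locus       = rep(c("gene_a", "gene_b"), each = 4),
  Cell_Type   = rep(c("type_1", "type_1", "type_2", "type_2"), times = 2),
  Sample_Name = rep(c("s1", "s2", "s3", "s4"), times = 2),
  Expression  = c(10, 12, 1, 2, 5, NA, 7, 8),
  stringsAsFactors = FALSE
)

# Loci with at least one missing Expression value.
incomplete_loci <- unique(
  expression_data$Locus[is.na(expression_data$Expression)]
)

# Keep only the loci that are complete.
complete_data <- expression_data[
  !expression_data$Locus %in% incomplete_loci, ]

# Confirm that every remaining locus has at least one value per cell type.
combo_counts <- table(complete_data$Locus, complete_data$Cell_Type)
all_combos_present <- all(combo_counts > 0)
```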

bin_method

character (the only implemented method is "Efroni") or a user-defined function for binning expression data. The user-defined function must take a data.frame with columns "Cell_Type" and "Expression" for a single locus, and return a data.frame containing columns "Cell_Type", "Expression", and "bin". If bin_method is set to something other than a function or "Efroni", this function will exit with an error.
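A user-defined binning function that follows this contract might look like the sketch below. The name rank_bin and its n_bins argument are hypothetical; the function assigns bins by rank so that each bin holds roughly the same number of samples, which is one possible choice and not the Efroni procedure.

```r
# Hypothetical rank-based binning function for a single locus.
# Takes a data.frame with "Cell_Type" and "Expression" and returns the
# same data.frame with an added integer-valued "bin" column (1..n_bins).
rank_bin <- function(df, n_bins = 3, ...) {
  # Tied expression values share the lowest rank, so they share a bin.
  ranks <- rank(df$Expression, ties.method = "min")
  df$bin <- ceiling(ranks * n_bins / nrow(df))
  df
}

# Example input for one locus (values are made up).
locus_df <- data.frame(
  Cell_Type  = rep(c("a", "b", "c"), each = 2),
  Expression = c(1, 2, 3, 10, 20, 30),
  stringsAsFactors = FALSE
)
binned <- rank_bin(locus_df, n_bins = 3)
```

Accepting "..." lets the function ignore arguments that are intended for the mean method.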

mean_method

character (the only implemented methods are "Efroni" and "median") or a user-defined function for computing the expression mean for each locus/cell type. The user-defined function must take a data.frame with columns "Cell_Type" and "Expression", and return a data.frame containing columns "Cell_Type", "Expression", and "mean_expr". There should be one mean value returned for each Cell_Type/Locus combination. If mean_method is set to something other than a function or "Efroni", mean expression for each Locus/Cell_Type pair is calculated as the simple mean over all expression values measured for that combination.
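As an illustration of a user-defined mean method, the sketch below computes a trimmed mean per cell type. The name trimmed_mean and its trim argument are hypothetical. The returned shape (one row per Cell_Type with a "mean_expr" column) follows the custom_mean function in the Examples section and is an assumption about what the package accepts.

```r
# Hypothetical mean method: trimmed mean of Expression for each Cell_Type.
# Returns one row per cell type with columns "Cell_Type" and "mean_expr".
trimmed_mean <- function(df, trim = 0.1, ...) {
  # tapply forwards `trim` to base::mean for each cell type.
  means_raw <- tapply(df$Expression, df$Cell_Type, mean, trim = trim)
  data.frame(
    Cell_Type = names(means_raw),
    mean_expr = as.numeric(means_raw),
    stringsAsFactors = FALSE
  )
}

# Example input for one locus (values are made up).
locus_expr <- data.frame(
  Cell_Type  = rep(c("a", "b"), each = 3),
  Expression = c(1, 2, 3, 10, 20, 30),
  stringsAsFactors = FALSE
)
locus_means <- trimmed_mean(locus_expr, trim = 0)
```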

...

additional options passed to bin_method and mean_method (if supplied), as well as options for the Efroni binning procedure (l and u).

Value

Returns a tibble containing spec scores and "mean" expression values by Locus. Note that this method does not set negative spec scores to 0. Loci that have an unexpected expression distribution under the Efroni method (the background bin is greater than the specified u parameter) will have spec_scores set to 0; however, the optimum bin size is still computed, and mean expression is computed using that value.
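Because negative scores are left in place, a caller who wants them floored at 0 has to do so afterwards. The sketch below shows one way on a mock result; the column name spec and the mock values are assumptions for illustration, so check the names of the returned tibble before adapting it.

```r
# Mock result table; the "spec" column name and values are assumptions.
spec_table <- data.frame(
  Locus     = c("gene_a", "gene_a", "gene_b", "gene_b"),
  Cell_Type = c("type_1", "type_2", "type_1", "type_2"),
  spec      = c(0.8, -0.2, 0, 0.5),
  stringsAsFactors = FALSE
)

# Replace negative spec scores with 0 and leave all other values unchanged.
spec_table$spec <- pmax(spec_table$spec, 0)
```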

References

Efroni, I., Ip, PL., Nawy, T., Mello, A., Birnbaum, KD. (2015). "Quantification of cell identity from single-cell gene expression profiles". Genome Biology 16(9)

Birnbaum, KD. and Kussell, E. (2011). "Measuring cell identity in noisy biological systems". Nucl. Acids. Res. 39(21)

Examples

# compute specificity score on test data using Efroni method
# l and u are Efroni-specific hyperparameters that control the expected shape
# of the binned expression data; l = the number of discrete bins for each
# expression profile, and u is the maximum bin that can be considered
# "background". The expression values are then classified as expressed or not
# expressed and spec is calculated on 2 bins.
spec_scores <-
  compute_spec_table(expression_data = test_spec,
                     bin_method = "Efroni", l = 10, u = 3)

# Compute specificity scores on test data adapted from Birnbaum and Kussell
# (2011), using custom binning and mean computation methods. Arguments given
# through "..." are passed to both custom functions, so each one should
# accept "..." unless the two functions take exactly the same arguments
# (in the example below, "..." would not be necessary in custom_mean if that
# function also used the n_bins argument).

# Custom binning procedure: split the expression range into n_bins
# equal-width intervals and store the interval index in "bin"
custom_bin <- function(df, n_bins, ...) {
  df$bin <- cut(df$Expression, n_bins, labels = FALSE)
  df
}

# Custom mean procedure: arithmetic mean of Expression per Cell_Type
custom_mean <- function(df, ...) {
  means_raw <- tapply(df$Expression, df$Cell_Type, mean)
  tibble::enframe(means_raw, "Cell_Type", "mean_expr")
}

spec_scores <-
  compute_spec_table(expression_data = test_birnbaum,
                     bin_method = custom_bin, mean_method = custom_mean,
                     n_bins = 3)

b-coli/ICITools documentation built on Dec. 27, 2021, 7:40 a.m.