compute_data_summaries: Compute Data Summaries


View source: R/02_estimate_parameters.R

Description

This function computes cell-wise dropout rates and library sizes on the unnormalized data before computing the gene-wise grand means, gene-wise dropout rates, inter-individual variance, and intra-individual variance on the normalized data.
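As a rough illustration of the quantities named above, the following sketch computes them by hand on a small made-up data.frame. It is not the package's implementation: the toy values, the counts-per-million scaling, and the simple moment-based estimators of inter- and intra-individual spread are all assumptions made for this example, and hierarchicell may estimate these quantities differently.

```r
# Toy input in the layout described for `expr`: cell ID, sample ID, then genes.
# All values are made up for illustration.
toy <- data.frame(
  CellID   = paste0("cell", 1:6),
  SampleID = rep(c("ind1", "ind2", "ind3"), each = 2),
  GeneA    = c(0, 4, 2, 0, 6, 8),
  GeneB    = c(10, 6, 0, 0, 4, 2),
  GeneC    = c(0, 0, 8, 10, 0, 0)
)
counts <- as.matrix(toy[, -(1:2)])

# Cell-wise summaries on the unnormalized values
cell_dropout  <- rowMeans(counts == 0)  # fraction of zero genes per cell
library_sizes <- rowSums(counts)        # total counts per cell

# Assumed normalization: scale each cell to counts per million
norm <- counts / library_sizes * 1e6

# Gene-wise summaries on the normalized values
grand_means  <- colMeans(norm)
gene_dropout <- colMeans(norm == 0)

# Inter-individual spread: standard deviation of per-sample gene means
sample_means <- rowsum(norm, toy$SampleID) / as.vector(table(toy$SampleID))
inter_sd     <- apply(sample_means, 2, sd)

# Intra-individual spread: square root of the average within-sample variance
within_var <- apply(norm, 2, function(g) mean(tapply(g, toy$SampleID, var)))
intra_sd   <- sqrt(within_var)
```

The cell-wise quantities use the raw values while the gene-wise quantities use the scaled values, mirroring the order described above.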

Usage

compute_data_summaries(expr, type = "Norm")

Arguments

expr

a data.frame output by filter_counts, with the unique cell identifier in column one, the sample identifier in column two, and all remaining columns being genes.

type

an identifier for the type of data being submitted. Use "Raw" for raw counts, and "PerMillion" or "Norm" for TPM or other counts-per-million normalizations. The program assumes the data are in one of these two formats (raw counts or per-million normalized counts). Other transformations that produce negative values (e.g., log transformations) will cause the program to malfunction.
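Because negative values are not supported, it may help to check the gene columns before calling the function. The helper below is a minimal sketch; check_expr_values is a hypothetical name, not part of hierarchicell, and it assumes the column layout described for expr (identifiers in columns one and two).

```r
# Hypothetical helper, not part of hierarchicell: checks the gene columns for
# values that violate the raw-count / per-million assumption.
check_expr_values <- function(expr) {
  values <- as.matrix(expr[, -(1:2)])  # drop cell and sample identifier columns
  if (!is.numeric(values)) stop("gene columns must be numeric")
  if (anyNA(values)) stop("gene columns contain missing values")
  if (any(values < 0)) {
    stop("negative values found; data may be log-transformed or scaled")
  }
  invisible(TRUE)
}

# Made-up example data: one valid table and one with a negative entry
ok  <- data.frame(CellID = c("c1", "c2"), SampleID = c("s1", "s2"),
                  GeneA = c(0, 3), GeneB = c(5, 1))
bad <- ok
bad$GeneA[1] <- -1.2

check_expr_values(ok)                                  # passes silently
result <- tryCatch(check_expr_values(bad),
                   error = function(e) conditionMessage(e))
```

Note that a check for negatives will not catch every unsupported transformation; for instance, log(x + 1) values are non-negative and would pass.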

Details

Before estimating the data summaries, run the filter_counts function to build a data.frame in the format that the subsequent estimation functions expect.

Value

A data.frame of the summary data as well as two vectors for the cell-wise dropout rates and library sizes. The data.frame includes the gene-wise grand means, inter-individual standard deviations, intra-individual standard deviations, and dropout rates.

Note

Data should include only cells of the specific cell type you are interested in simulating or computing power for. Data should also contain as many unique sample identifiers as possible. If you are inputting data that has fewer than 5 unique values for the sample identifier (i.e., independent experimental units), the empirical estimation of the inter-individual heterogeneity will be very unstable. Finding a dataset with enough independent samples may be difficult at this time, but this should improve dramatically as experiments grow in sample size and the number of publicly available single-cell RNA-seq datasets increases.
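A quick way to see whether a dataset falls below the threshold mentioned in this note is to count the unique sample identifiers in column two. The sketch below uses a hypothetical helper, count_samples, and made-up data; the cutoff of 5 is taken from the note, and the helper is not part of hierarchicell.

```r
# Hypothetical helper, not part of hierarchicell: counts unique sample
# identifiers (column two) and warns when there are fewer than `min_samples`.
count_samples <- function(expr, min_samples = 5) {
  n_samples <- length(unique(expr[[2]]))
  if (n_samples < min_samples) {
    warning("only ", n_samples, " unique sample identifiers; ",
            "inter-individual estimates may be unstable")
  }
  n_samples
}

# Made-up data with three independent samples
toy <- data.frame(
  CellID   = paste0("cell", 1:6),
  SampleID = rep(c("ind1", "ind2", "ind3"), each = 2),
  GeneA    = c(0, 4, 2, 0, 6, 8)
)
n <- suppressWarnings(count_samples(toy))
```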

Examples

clean_expr_data <- filter_counts()
data_summaries <- compute_data_summaries(clean_expr_data, type = "Norm")

kdzimm/hierarchicell documentation built on Dec. 21, 2021, 5:23 a.m.