filter_counts: Filter gene expression matrix


View source: R/01_filter_count_matrix.R

Description

This function loads and filters the input data for the subsequent steps. Although the loading and cleaning steps are very basic, correctly formatted input is critical for the rest of the package to work. If no input data are given, the function defaults to the normal mucosal cell data (see alpha_cells) for the simulation and power calculations.

Usage

filter_counts(expr = alpha_cells, gene_thresh = 0, cell_thresh = 0)

Arguments

expr

a data.frame with the unique cell identifier in column one, the sample identifier in column two, and genes in all remaining columns.

gene_thresh

the mean expression threshold for retaining genes. Defaults to 0.

cell_thresh

the mean expression threshold for retaining cells. Defaults to 0.
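To illustrate how the two thresholds interact, here is a minimal base-R sketch of mean-threshold filtering. It is not the package's source: the function name sketch_filter_counts is hypothetical, and it assumes that a cell or gene is kept only when its mean count is strictly greater than its threshold, and that cells are filtered before genes.

```r
# Hypothetical sketch of mean-threshold filtering; NOT hierarchicell's source.
# Assumes column 1 = cell ID, column 2 = sample ID, columns 3+ = gene counts,
# strict ">" comparisons, and cells filtered before genes.
sketch_filter_counts <- function(expr, gene_thresh = 0, cell_thresh = 0) {
  counts <- as.matrix(expr[, -c(1, 2), drop = FALSE])

  # Drop cells (rows) whose mean count across genes is at or below cell_thresh
  keep_cells <- rowMeans(counts) > cell_thresh
  expr <- expr[keep_cells, , drop = FALSE]
  counts <- counts[keep_cells, , drop = FALSE]

  # Drop genes (columns) whose mean across the remaining cells is at or below gene_thresh
  keep_genes <- colMeans(counts) > gene_thresh
  expr[, c(TRUE, TRUE, keep_genes), drop = FALSE]
}

toy <- data.frame(
  Cell_ID = c("Cell1_Ind1", "Cell2_Ind1", "Cell1_Ind2"),
  Individual_ID = c("Ind1", "Ind1", "Ind2"),
  Gene1 = c(12, 0, 9),
  Gene2 = c(0, 0, 0),
  Gene3 = c(5, 0, 18),
  stringsAsFactors = FALSE
)

# With the default thresholds of 0, the all-zero cell (Cell2_Ind1)
# and the all-zero gene (Gene2) are removed
filtered <- sketch_filter_counts(toy)
```

Raising gene_thresh (for example, to 11) would also drop Gene1, whose mean over the two remaining cells is 10.5.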

Details

Input data should be formatted as follows:

Cell_ID      Individual_ID  Gene1  Gene2  Gene3  ...
Cell1_Ind1   Ind1           12     24     0      ...
Cell2_Ind1   Ind1           11     2      0      ...
Cell3_Ind1   Ind1           10     0      0      ...
Cell4_Ind1   Ind1           0      124    10     ...
Cell1_Ind2   Ind2           9      37     18     ...
Cell2_Ind2   Ind2           0      29     0      ...

The unique cell identifier is in column one, the sample identifier is in column two, and all remaining columns are genes.
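As a sketch of this layout, the rows shown above can be built as a base-R data.frame. Only Gene1 to Gene3 are included; a real matrix would carry many more gene columns. The helper check_expr_layout is hypothetical and not part of hierarchicell; it only performs a few basic checks implied by the description above.

```r
# The example rows above as a data.frame (only three gene columns shown)
expr <- data.frame(
  Cell_ID = c("Cell1_Ind1", "Cell2_Ind1", "Cell3_Ind1",
              "Cell4_Ind1", "Cell1_Ind2", "Cell2_Ind2"),
  Individual_ID = c("Ind1", "Ind1", "Ind1", "Ind1", "Ind2", "Ind2"),
  Gene1 = c(12, 11, 10, 0, 9, 0),
  Gene2 = c(24, 2, 0, 124, 37, 29),
  Gene3 = c(0, 0, 0, 10, 18, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of hierarchicell): basic layout checks
check_expr_layout <- function(expr) {
  stopifnot(is.data.frame(expr), ncol(expr) >= 3)
  # Column 1 must hold unique cell identifiers
  stopifnot(!anyDuplicated(expr[[1]]))
  # Columns 3 onward must all be numeric gene counts
  counts <- expr[, -c(1, 2), drop = FALSE]
  stopifnot(all(vapply(counts, is.numeric, logical(1))))
  invisible(TRUE)
}

check_expr_layout(expr)
```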

Value

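a data.frame from which cells and genes with low mean counts have been removed. With the default thresholds of 0, these are the cells with a mean count of 0 and the genes with a mean count of 0.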

Note

Data should include only cells of the specific cell type you are interested in simulating or computing power for. Data should also contain as many unique sample identifiers as possible. If your input has fewer than 5 unique sample identifiers (i.e., independent experimental units), the empirical estimate of inter-individual heterogeneity will be very unstable. Datasets with many individuals are currently hard to find, but this should improve dramatically as experiments grow in sample size and the number of publicly available single-cell RNA-seq datasets increases.
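Before running the package, it can help to count the independent experimental units in your data. The sketch below uses a hypothetical helper, n_individuals (not part of hierarchicell), which assumes the sample identifier is in column two and warns when there are fewer than 5 unique values, the cutoff mentioned above.

```r
# Hypothetical helper (not part of hierarchicell): counts unique sample
# identifiers in column 2 and warns below min_individuals
n_individuals <- function(expr, min_individuals = 5) {
  n <- length(unique(expr[[2]]))
  if (n < min_individuals) {
    warning(sprintf(
      "Only %d unique sample identifier(s); inter-individual heterogeneity estimates may be very unstable.",
      n
    ))
  }
  n
}

expr <- data.frame(
  Cell_ID = paste0("Cell", 1:6),
  Individual_ID = rep(c("Ind1", "Ind2"), each = 3),
  Gene1 = c(12, 11, 10, 9, 0, 4),
  stringsAsFactors = FALSE
)

n <- n_individuals(expr)  # returns 2 and emits a warning
```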

Examples

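library(hierarchicell)

n_genes <- 10
n_cells <- 10

# Simulate counts for one gene: a negative binomial draw for each cell,
# with a randomly chosen mean and dispersion
make_data <- function() {
  mu_random <- round(rgamma(n = 1, shape = 1, rate = 0.001), 0)
  size_random <- runif(n = 1, min = 0, max = 3)
  rnbinom(n_cells, size = size_random, mu = mu_random)
}

# One row per cell, one column per gene (columns V1, ..., V10)
expr_dat <- as.data.frame(replicate(n_genes, make_data()))
expr_dat$CellID <- paste0("Cell", 1:n_cells)
expr_dat$IND <- "IND1"
# Move the cell and sample identifiers into columns one and two
expr_dat <- expr_dat[, c("CellID", "IND", paste0("V", 1:n_genes))]
clean_expr_data <- filter_counts(expr_dat)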

kdzimm/hierarchicell documentation built on Dec. 21, 2021, 5:23 a.m.