preprocess | R Documentation |
Preprocess the raw single-cell data
preprocess( data, clusternum = NULL, takelog = TRUE, logbase = 2, pseudocount = 1, minexpr_value = 1, minexpr_percent = 0.5, cvcutoff = 1 )
data |
The raw single-cell data, given as a numeric matrix or data.frame. Rows represent genes/features and columns represent single cells. |
clusternum |
The number of clusters to use for clustering, typically 5 percent of the total number of genes. Clustering is performed after all transformation and trimming steps. If NULL, no clustering is performed. |
takelog |
Logical value indicating whether to take the logarithm of the data. |
logbase |
Numeric value specifying the base of the logarithm. |
pseudocount |
Numeric value added to the raw data before taking the logarithm. |
minexpr_value |
Numeric value specifying the minimum expression cutoff, applied to the log-transformed value if takelog is TRUE. |
minexpr_percent |
Numeric value specifying the minimum fraction of highly expressed cells (expression value greater than minexpr_value) required for a gene/feature to be retained. |
cvcutoff |
Numeric value specifying the minimum coefficient of variation required for a gene/feature to be retained. |
This function first takes the logarithm of the raw data and then filters out genes/features in which too many cells have low expression. It also filters out genes/features with a low coefficient of variation, which indicates that the gene/feature does not carry much information. The default setting first takes log2 of the raw data after adding a pseudocount of 1. Genes/features are then retained if at least half of the cells have expression values greater than 1 and the coefficient of variation across all cells is at least 1.
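The transformation and filtering steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's source code: the function name sketch_preprocess and the toy matrix are hypothetical, the coefficient of variation is assumed to be computed on the log-transformed values as standard deviation divided by mean, both thresholds are assumed to be inclusive ("at least"), and the optional clustering step controlled by clusternum is omitted.

```r
# Hypothetical sketch of the log transform and filtering described above.
# Not the package implementation; clustering (clusternum) is left out.
sketch_preprocess <- function(data, takelog = TRUE, logbase = 2, pseudocount = 1,
                              minexpr_value = 1, minexpr_percent = 0.5,
                              cvcutoff = 1) {
  data <- as.matrix(data)
  if (takelog) {
    # Add the pseudocount so that zero counts map to log(pseudocount)
    data <- log(data + pseudocount, base = logbase)
  }
  # Fraction of cells in which each gene exceeds the expression cutoff
  expressed_frac <- rowMeans(data > minexpr_value)
  # Coefficient of variation per gene: standard deviation divided by mean
  # (assumed here to be computed on the transformed values)
  cv <- apply(data, 1, sd) / rowMeans(data)
  # Assumed inclusive comparisons; which() drops rows where the test is NA
  keep <- which(expressed_frac >= minexpr_percent & cv >= cvcutoff)
  data[keep, , drop = FALSE]
}

# Toy input: rows are genes, columns are cells (made-up values)
toy <- rbind(
  geneA = c(0, 0, 0, 0, 0, 0),          # never expressed
  geneB = c(7, 7, 7, 7, 7, 7),          # expressed everywhere, no variation
  geneC = c(0, 0, 0, 1023, 1023, 1023)  # expressed in half the cells, variable
)
colnames(toy) <- paste0("cell", 1:6)

result <- sketch_preprocess(toy)
```

With the default settings, geneA fails the expression filter and geneB fails the coefficient of variation filter, so only geneC remains in result.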
Matrix or data frame with the same format as the input dataset.
Zhicheng Ji, Hongkai Ji <zji4@zji4.edu>
data(lpsdata)
procdata <- preprocess(lpsdata)