Description
Performs preprocessing of a matrix object, including splitting the dataset into objects representing the individual samples, filtering out low-quality cells and lowly expressed genes, and log-transforming the data.
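A minimal sketch of the per-sample split, written in Python for illustration rather than taken from the package's R code. It assumes a mapping from each cell ID to its sample ID is already available (the hypothetical `cell_to_sample` dictionary); how the real function determines sample membership is not described on this page. Matrices are plain lists of rows, with genes as rows and cells as columns.

```python
def split_by_sample(counts, cells, cell_to_sample):
    """Split a genes x cells matrix into one sub-matrix per sample.

    counts: list of rows (one per gene), each holding one value per cell.
    cells: column (cell) IDs, in the same order as the values in each row.
    cell_to_sample: hypothetical mapping from cell ID to sample ID.
    Returns {sample_id: (sub_counts, sub_cells)}, preserving column order.
    """
    columns_by_sample = {}
    for j, cell in enumerate(cells):
        # Group column indices by the sample each cell belongs to.
        columns_by_sample.setdefault(cell_to_sample[cell], []).append(j)
    return {
        sample: ([[row[j] for j in cols] for row in counts], [cells[j] for j in cols])
        for sample, cols in columns_by_sample.items()
    }


counts = [[1, 2, 3],
          [4, 5, 6]]
cells = ["a1", "b1", "a2"]
parts = split_by_sample(counts, cells, {"a1": "A", "b1": "B", "a2": "A"})
print(parts["A"])  # ([[1, 3], [4, 6]], ['a1', 'a2'])
```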
Arguments
x: a numeric matrix.
complexity_cutoff: a numeric vector of length 2 giving the lower and upper bounds of complexity (i.e. the number of detected genes per cell).
expression_cutoff: a numeric value giving the minimal log2 mean expression per gene; genes below it are considered lowly expressed.
housekeeping_cutoff: a numeric value giving the minimal per-cell log2 mean expression of housekeeping genes (i.e. genes that are highly expressed in all cells); cells below it are considered low quality.
log_base: a numeric value giving the logarithm base used for the log transformation.
scaling_factor: a numeric value by which each data point is divided before the log transformation.
pseudo_count: a numeric value added during the log transformation to avoid taking the log of zero.
sample_id: an ID of the sample being processed. It is used only for printing messages and is therefore not mandatory. Default is NULL.
cell_ids: a character vector of IDs of cells that have already passed QC. Supplying it bypasses the low-quality cell filtering step.
gene_ids: a character vector of IDs of genes that have already passed QC. Supplying it bypasses the lowly expressed gene filtering step.
forced_genes_set: a vector of genes to keep in the final processed object even if their expression is low. Forced genes whose absolute count equals zero are still filtered out. Default is NULL.
use_housekeeping_filter: a logical indicating whether cells with low expression of housekeeping genes should be filtered out. Default is FALSE.
verbose: a logical indicating whether this function prints messages; with the default FALSE, all messages from this function are suppressed.
Details
This function performs the matrix preprocessing steps and is used to preprocess ScandalDataSet objects for downstream analysis.
The main steps of preprocessing are as follows:
Filtering out low-quality cells (cells with low complexity): for each cell (column), the number of genes with a count greater than zero is summed, and cells outside the complexity range configured for the specific sample in complexity_cutoff are removed.
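The complexity filter can be sketched as follows. This is a Python illustration under stated assumptions, not the package's R implementation: the matrix is a plain list of gene rows, the two bounds in `complexity_cutoff` are treated as inclusive (the page does not say), and the optional `cell_ids` argument mirrors the bypass described for cell_ids in the argument list.

```python
def filter_low_complexity_cells(counts, cells, complexity_cutoff, cell_ids=None):
    """Keep cells (columns) whose complexity lies within complexity_cutoff.

    counts: list of rows (one per gene), each holding one value per cell.
    cells: column (cell) IDs, in the same order as the values in each row.
    complexity_cutoff: (lower, upper); treated as inclusive bounds (an assumption).
    cell_ids: optional IDs of cells that already passed QC; when given, the
        complexity test is skipped and exactly those cells are kept.
    """
    if cell_ids is not None:
        wanted = set(cell_ids)
        keep = [j for j, cell in enumerate(cells) if cell in wanted]
    else:
        lower, upper = complexity_cutoff
        keep = []
        for j in range(len(cells)):
            # Complexity = number of genes with a count greater than zero in this cell.
            complexity = sum(1 for row in counts if row[j] > 0)
            if lower <= complexity <= upper:
                keep.append(j)
    kept_counts = [[row[j] for j in keep] for row in counts]
    return kept_counts, [cells[j] for j in keep]


counts = [[0, 3, 1, 0],   # gene A
          [2, 0, 4, 0],   # gene B
          [1, 1, 5, 0]]   # gene C
cells = ["c1", "c2", "c3", "c4"]
# Complexities are 2, 2, 3 and 0, so only c1 and c2 fall within [1, 2].
kept_counts, kept_cells = filter_low_complexity_cells(counts, cells, (1, 2))
print(kept_cells)  # ['c1', 'c2']
```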
Optionally (when use_housekeeping_filter is TRUE), filtering out cells with low expression of housekeeping (HK) genes, i.e. genes that are normally highly expressed in most cells (for example, ribosomal protein genes). Cells whose mean HK expression is less than the cutoff configured for the specific sample in housekeeping_cutoff are removed.
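A minimal Python sketch of the housekeeping filter, again not the package's own code. The list of HK genes is a hypothetical caller-supplied input (the page does not say where the real function gets it), the gene `GENE_X` is a made-up placeholder, and the per-cell score log2(mean + 1) is an assumption about how "log2 mean expression" is computed.

```python
import math


def filter_low_housekeeping_cells(counts, genes, cells, housekeeping_genes, housekeeping_cutoff):
    """Drop cells whose log2 mean housekeeping (HK) expression is below the cutoff.

    counts: list of rows (one per gene), each holding one value per cell.
    genes, cells: row and column IDs of counts.
    housekeeping_genes: hypothetical list of HK gene IDs supplied by the caller.
    The score log2(mean + 1) is an assumption, not the documented formula.
    """
    hk = set(housekeeping_genes)
    hk_rows = [counts[i] for i, gene in enumerate(genes) if gene in hk]
    if not hk_rows:
        raise ValueError("none of the housekeeping genes are present in the matrix")
    keep = []
    for j in range(len(cells)):
        # Mean HK expression of cell j, then its log2 score.
        mean_hk = sum(row[j] for row in hk_rows) / len(hk_rows)
        if math.log2(mean_hk + 1) >= housekeeping_cutoff:
            keep.append(j)
    return [[row[j] for j in keep] for row in counts], [cells[j] for j in keep]


genes = ["RPL3", "RPS6", "GENE_X"]
cells = ["c1", "c2", "c3"]
counts = [[15, 1, 7],
          [15, 0, 9],
          [2, 5, 0]]
# HK means are 15, 0.5 and 8, giving scores of 4.0, about 0.58 and about 3.17.
kept_counts, kept_cells = filter_low_housekeeping_cells(counts, genes, cells, ["RPL3", "RPS6"], 3)
print(kept_cells)  # ['c1', 'c3']
```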
Filtering out lowly expressed genes, i.e. genes whose log2 mean expression is less than the cutoff configured for the specific sample in expression_cutoff.
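The gene filter, including the forced-gene exception, can be sketched in Python as below. Assumptions to keep in mind: log2(mean + 1) stands in for "log2 mean expression", "absolute count equals zero" is read as a row total of zero, and forced genes are still added when `gene_ids` bypasses the expression test. None of these details are confirmed by the page.

```python
import math


def filter_lowly_expressed_genes(counts, genes, expression_cutoff,
                                 forced_genes=None, gene_ids=None):
    """Keep genes (rows) whose log2 mean expression reaches expression_cutoff.

    counts: list of rows (one per gene), each holding one value per cell.
    genes: row (gene) IDs, in the same order as counts.
    forced_genes: genes kept regardless of expression unless their total count is zero.
    gene_ids: optional IDs of genes that already passed QC; when given they replace
        the expression test (forced genes are still added, an assumption).
    """
    forced = set(forced_genes or [])
    wanted = set(gene_ids) if gene_ids is not None else None
    keep = []
    for i, (gene, row) in enumerate(zip(genes, counts)):
        total = sum(row)
        if wanted is not None:
            passes = gene in wanted
        else:
            passes = math.log2(total / len(row) + 1) >= expression_cutoff
        # Forced genes are kept even when the test fails, unless their total count is zero.
        if passes or (gene in forced and total > 0):
            keep.append(i)
    return [counts[i] for i in keep], [genes[i] for i in keep]


genes = ["G1", "G2", "G3", "G4"]
counts = [[6, 8],   # mean 7   -> log2(8) = 3.0, passes
          [1, 0],   # mean 0.5 -> about 0.58, fails
          [0, 1],   # fails the test, but forced
          [0, 0]]   # forced, yet all zeros
kept_counts, kept_genes = filter_lowly_expressed_genes(counts, genes, 2.5, forced_genes=["G3", "G4"])
print(kept_genes)  # ['G1', 'G3']
```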
Log-transforming the expression data.
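Reading the argument descriptions together suggests the transform y = log_b(x / s + p), with b = log_base, s = scaling_factor and p = pseudo_count. That order of operations is an inference from the wording, not a documented formula. A small Python sketch of it:

```python
import math


def log_transform(counts, log_base, scaling_factor, pseudo_count):
    """Return log_base(x / scaling_factor + pseudo_count) for every entry x.

    The order (scale, add the pseudo count, take the log) is inferred from the
    argument descriptions. A positive pseudo_count keeps zero entries finite.
    """
    if pseudo_count <= 0:
        raise ValueError("pseudo_count must be positive to avoid log(0)")
    return [[math.log(x / scaling_factor + pseudo_count, log_base) for x in row]
            for row in counts]


# With base 2, a scaling factor of 10 and a pseudo count of 1, the entries
# 0, 10 and 70 map to log2(1), log2(2) and log2(8), i.e. 0, 1 and 3.
values = log_transform([[0, 10, 70]], log_base=2, scaling_factor=10, pseudo_count=1)
print(values)
```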
Value
A processed matrix ready for downstream analysis.
Note
The function assumes that each column represents a cell and each row represents a gene.
Author(s)
Avishay Spitzer