RawDataCleaning | R Documentation |
These methods are used to clean the raw data, that is, to drop
any genes/cells that are too sparse or too widely present to allow
proper calibration of the COTAN
model.
Genes that are expressed in all cells are called Fully-Expressed, while cells that express all genes in the data are called Fully-Expressing. Once flagged, these genes/cells are easy to exclude from the user's calculations.
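The two definitions can be illustrated on a toy count matrix. This is a minimal base R sketch, not the COTAN implementation: the matrix, the gene/cell names and the variable names are made up for illustration, and the comparison follows the "more than threshold times the total" wording used for cellsThreshold and genesThreshold.

```r
# Toy raw-count matrix: genes in rows, cells in columns (made-up data)
raw <- matrix(
  c(5, 3, 2, 1,
    0, 4, 6, 2,
    1, 1, 1, 0,
    0, 0, 7, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)

# Same default values as in the usage of the find*() methods
cellsThreshold <- 0.99
genesThreshold <- 0.99

expressed <- raw > 0

# Fraction of cells in which each gene is detected
fracCellsPerGene <- rowSums(expressed) / ncol(raw)
# Fraction of genes detected in each cell
fracGenesPerCell <- colSums(expressed) / nrow(raw)

# Genes/cells above the thresholds (illustrative variable names)
fullyExpressedGenes  <- names(which(fracCellsPerGene > cellsThreshold))
fullyExpressingCells <- names(which(fracGenesPerCell > genesThreshold))

print(fullyExpressedGenes)
print(fullyExpressingCells)
```

In this toy data only one gene is detected in every cell and only one cell detects every gene, so each result holds a single name.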
## S4 method for signature 'COTAN'
flagNotFullyExpressedGenes(objCOTAN)
## S4 method for signature 'COTAN'
flagNotFullyExpressingCells(objCOTAN)
## S4 method for signature 'COTAN'
getFullyExpressedGenes(objCOTAN)
## S4 method for signature 'COTAN'
getFullyExpressingCells(objCOTAN)
## S4 method for signature 'COTAN'
findFullyExpressedGenes(objCOTAN, cellsThreshold = 0.99)
## S4 method for signature 'COTAN'
findFullyExpressingCells(objCOTAN, genesThreshold = 0.99)
## S4 method for signature 'COTAN'
dropGenesCells(
objCOTAN,
genes = vector(mode = "character"),
cells = vector(mode = "character")
)
ECDPlot(objCOTAN, yCut = NaN, condName = "", conditions = NULL)
## S4 method for signature 'COTAN'
clean(
objCOTAN,
cellsCutoff = 0.003,
genesCutoff = 0.002,
cellsThreshold = 0.99,
genesThreshold = 0.99
)
cleanPlots(objCOTAN, includePCA = TRUE)
cellSizePlot(objCOTAN, condName = "", conditions = NULL)
genesSizePlot(objCOTAN, condName = "", conditions = NULL)
mitochondrialPercentagePlot(
objCOTAN,
genePrefix = "^MT-",
condName = "",
conditions = NULL
)
scatterPlot(objCOTAN, condName = "", conditions = NULL, splitSamples = TRUE)
objCOTAN |
a COTAN object |
cellsThreshold |
any gene that is expressed in more cells than threshold
times the total number of cells will be marked as fully-expressed.
Default threshold is 0.99 |
genesThreshold |
any cell that is expressing more genes than threshold
times the total number of genes will be marked as fully-expressing.
Default threshold is 0.99 |
genes |
an array of gene names |
cells |
an array of cell names |
yCut |
y threshold of library size to drop. Default is NaN |
condName |
The name of a condition in the COTAN object |
conditions |
The conditions to use. If given, they take precedence
over the one indicated by condName |
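cellsCutoff |
clean() discards any gene expressed in fewer cells than this fraction
of the total number of cells. Default cutoff is 0.003 |
genesCutoff |
clean() discards any cell expressing fewer genes than this fraction
of the total number of genes. Default cutoff is 0.002 |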
includePCA |
a Boolean. Whether to include the PCA-based plots (default TRUE) |
genePrefix |
Prefix for the mitochondrial genes (default "^MT-" for human; use "^mt-" for mouse) |
splitSamples |
Boolean. Whether to plot each sample in a different panel
(default TRUE) |
flagNotFullyExpressedGenes()
returns a Boolean array with TRUE for
those genes that are not fully-expressed.
flagNotFullyExpressingCells()
returns a Boolean array with TRUE
for those cells that are not expressing all genes.
getFullyExpressedGenes()
returns the genes expressed in all cells
of the dataset.
getFullyExpressingCells()
returns the cells that express
all genes of the dataset.
findFullyExpressedGenes()
determines the fully-expressed genes
inside the raw data.
findFullyExpressingCells()
determines the cells that are
expressing all genes in the dataset.
dropGenesCells()
removes an array of genes and/or cells from the
current COTAN
object.
ECDPlot()
plots the empirical distribution function of library
sizes (UMI number). It helps to define where to drop "cells" that are
simple background signal.
clean()
is the main method that can be used to check and clean the
dataset. It will discard any gene that has fewer than 3 non-zero counts per
thousand cells and any cell expressing fewer than 2 per thousand genes. It
also produces and stores the estimators for nu and lambda.
cleanPlots()
creates the plots associated with the output of the
clean()
method.
cellSizePlot()
plots the raw library size for each cell and
sample.
genesSizePlot()
plots the raw gene number (reads > 0) for each
cell and sample.
mitochondrialPercentagePlot()
plots the percentage of mitochondrial reads
(genes matching genePrefix) for each
cell and sample.
scatterPlot()
creates a plot that checks the relation between the
library size and the number of genes detected.
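The cutoff rule described for clean() can be sketched in base R. This is only an illustration under stated assumptions, not the COTAN code: the synthetic matrix and all variable names other than cellsCutoff and genesCutoff are made up, both filters are applied in a single pass to the original matrix (the order used by clean() is not stated here), and the estimation of nu and lambda is left out.

```r
# Synthetic 1000 x 1000 count matrix (made-up data): each gene is
# detected in 10 cells and each cell detects 10 genes
nGenes <- 1000
nCells <- 1000
raw <- matrix(0L, nrow = nGenes, ncol = nCells,
              dimnames = list(paste0("g", seq_len(nGenes)),
                              paste0("c", seq_len(nCells))))
raw[(row(raw) - col(raw)) %% nCells < 10] <- 1L

# Make one gene too sparse: detected in only 2 cells out of 1000
raw["g1", ] <- 0L
raw["g1", c("c1", "c1000")] <- 1L

# Make one cell too sparse: it detects only 1 gene out of 1000
raw[, "c500"] <- 0L
raw["g500", "c500"] <- 1L

# Same default values as in the usage of clean()
cellsCutoff <- 0.003
genesCutoff <- 0.002

expressed <- raw > 0

# Keep genes detected in at least 3 per thousand cells
keepGenes <- rowSums(expressed) / ncol(raw) >= cellsCutoff
# Keep cells detecting at least 2 per thousand genes
keepCells <- colSums(expressed) / nrow(raw) >= genesCutoff

cleaned <- raw[keepGenes, keepCells]
print(dim(cleaned))
```

Here the sparse gene g1 and the sparse cell c500 are the only elements removed.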
flagNotFullyExpressedGenes()
returns a Boolean array with TRUE
for genes that are not fully-expressed.
flagNotFullyExpressingCells()
returns a Boolean array with
TRUE for cells that are not expressing all genes.
getFullyExpressedGenes()
returns an array containing all genes
that are expressed in all cells.
getFullyExpressingCells()
returns an array containing all cells
that express all genes.
findFullyExpressedGenes()
returns the given COTAN
object with
updated fully-expressed genes' information
findFullyExpressingCells()
returns the given COTAN
object with
updated fully-expressing cells' information
dropGenesCells()
returns a completely new COTAN
object with the
new raw data obtained after the indicated genes/cells were expunged. All
remaining data is dropped too, as it is no longer relevant to the restricted
matrix. Exceptions are:
the meta-data for the data-set, which is kept unchanged;
the meta-data of genes/cells, which is restricted to the remaining
elements. The columns calculated via estimate
and find
methods are
dropped too.
ECDPlot()
returns an ECD plot
clean()
returns the updated COTAN
object
cleanPlots()
returns a list
of ggplot2
plots:
"pcaCells"
is for PCA cells
"pcaCellsData"
is the data of the PCA cells (can be plotted)
"genes"
is for B
group cells' genes
"UDE"
is for cells' UDE against their PCA
"nu"
is for cell nu
"zoomedNu"
is the same but zoomed on the left and with an estimate
for the low nu threshold that defines problematic cells
cellSizePlot()
returns the violin-boxplot
plot
genesSizePlot()
returns the violin-boxplot
plot
mitochondrialPercentagePlot()
returns a list
with:
"plot"
a violin-boxplot
object
"sizes"
a sizes data.frame
scatterPlot()
returns the scatter plot
library(zeallot)
data("test.dataset")
objCOTAN <- COTAN(raw = test.dataset)
genes.to.rem <- getGenes(objCOTAN)[grep('^MT', getGenes(objCOTAN))]
cells.to.rem <- getCells(objCOTAN)[which(getCellsSize(objCOTAN) == 0)]
objCOTAN <- dropGenesCells(objCOTAN, genes.to.rem, cells.to.rem)
objCOTAN <- clean(objCOTAN)
objCOTAN <- findFullyExpressedGenes(objCOTAN)
goodPos <- flagNotFullyExpressedGenes(objCOTAN)
objCOTAN <- findFullyExpressingCells(objCOTAN)
goodPos <- flagNotFullyExpressingCells(objCOTAN)
feGenes <- getFullyExpressedGenes(objCOTAN)
feCells <- getFullyExpressingCells(objCOTAN)
## These plots might help to identify genes/cells that need to be dropped
ecdPlot <- ECDPlot(objCOTAN, yCut = 100.0)
plot(ecdPlot)
# This creates many informative plots useful to determine whether
# there is still something to drop...
# Here we use the tuple-like assignment feature of the `zeallot` package
c(pcaCellsPlot, ., genesPlot, UDEPlot, ., zNuPlot) %<-% cleanPlots(objCOTAN)
plot(pcaCellsPlot)
plot(UDEPlot)
plot(zNuPlot)
lsPlot <- cellSizePlot(objCOTAN)
plot(lsPlot)
gsPlot <- genesSizePlot(objCOTAN)
plot(gsPlot)
mitPercPlot <-
mitochondrialPercentagePlot(objCOTAN, genePrefix = "g-0000")[["plot"]]
plot(mitPercPlot)
scPlot <- scatterPlot(objCOTAN)
plot(scPlot)
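As a complement to the plots above, the per-cell quantities they are described as showing (library size, number of detected genes, mitochondrial percentage) can be computed directly in base R. This is a rough sketch on made-up counts, assuming that the mitochondrial percentage is the share of a cell's counts coming from genes whose names match genePrefix; the gene names, the matrix and the variable names are illustrative and not taken from COTAN.

```r
# Toy raw-count matrix: genes in rows, cells in columns (made-up data)
raw <- matrix(
  c(10, 0, 3,
     0, 2, 1,
     5, 5, 0,
     5, 3, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ACTB", "GAPDH", "MT-CO1", "MT-ND1"),
                  paste0("cell", 1:3))
)

# Library size: total counts per cell
librarySize <- colSums(raw)

# Number of genes detected (counts > 0) per cell
genesDetected <- colSums(raw > 0)

# Share of counts from genes matching the mitochondrial prefix
genePrefix <- "^MT-"
mitoGenes <- grepl(genePrefix, rownames(raw))
mitoPercentage <-
  100 * colSums(raw[mitoGenes, , drop = FALSE]) / librarySize

# Collect everything in one table, one row per cell
qc <- data.frame(librarySize, genesDetected, mitoPercentage)
print(qc)
```

A table like this makes it easy to compare candidate thresholds numerically before dropping cells with dropGenesCells().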