ct.prepareAnnotation: Check and optionally subset an annotation file for use in a...

View source: R/annotations.R

ct.prepareAnnotation    R Documentation

Check and optionally subset an annotation file for use in a CRISPR screen


This function processes a supplied annotation object for use in a pooled screening experiment. Originally the annotation was converted into a special-purpose format, but the function now essentially returns the original annotation object with the geneSymbol column converted to a factor. It is primarily used internally during a call to ct.generateResults(). It also performs some minor validity checks and ensures that the reagent identifiers are present as an 'ID' column (if absent, the row.names are used).
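The two adjustments described above (an 'ID' column taken from the row.names, and a factorized geneSymbol column) can be sketched in base R. This is only an illustration of the described behavior on an invented toy annotation, not the package's implementation; the object name ann_toy and all gRNA and gene labels are hypothetical.

```r
# Hypothetical toy annotation; guide and gene names are invented for illustration.
ann_toy <- data.frame(
  geneID     = c("1001", "1001", "1002"),
  geneSymbol = c("GeneA", "GeneA", "GeneB"),
  row.names  = c("gRNA_1", "gRNA_2", "gRNA_3"),
  stringsAsFactors = FALSE
)

# Use the row.names as reagent identifiers when no 'ID' column is present.
if (!("ID" %in% colnames(ann_toy))) {
  ann_toy$ID <- row.names(ann_toy)
}

# Convert the target labels to a factor.
ann_toy$geneSymbol <- factor(ann_toy$geneSymbol)

str(ann_toy)
```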

Valid annotations contain both 'geneID' and 'geneSymbol' columns, because there is often a distinction between the official gene being targeted and the coherent set of gRNAs that make up a testing cohort. For example, multiple sets of guides may target distinct promoters, exons, or other entities that are expected to produce distinct biological phenomena related to the gene and that should be interpreted separately. For this reason, the 'geneID' column encodes the official gene designation (typically an Ensembl or Entrez gene identifier), while the 'geneSymbol' column contains a human-readable descriptor of the gRNA target (such as a gene symbol or promoter name). This mapping can be further expanded to incorporate mapping ambiguity via the ct.expandAnnotation() function.
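As an illustration of the geneID/geneSymbol distinction, the sketch below builds a toy annotation in which two guide sets target different promoters of the same gene. All identifiers (the geneID value, the promoter labels, the gRNA names) are invented placeholders, not real database entries.

```r
# Hypothetical annotation: one gene, two promoter-specific guide sets.
ann_promoters <- data.frame(
  geneID     = rep("hypothetical_gene_0001", 4),
  geneSymbol = c("GeneA_P1", "GeneA_P1", "GeneA_P2", "GeneA_P2"),
  row.names  = c("gRNA_1", "gRNA_2", "gRNA_3", "gRNA_4"),
  stringsAsFactors = FALSE
)

# Guides grouped by 'geneSymbol' form two separately interpreted cohorts...
by_target <- split(row.names(ann_promoters), ann_promoters$geneSymbol)

# ...while 'geneID' records that both cohorts map to the same official gene.
n_genes <- length(unique(ann_promoters$geneID))

print(by_target)
print(n_genes)
```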


ct.prepareAnnotation(ann, object = NULL, controls = TRUE, throw.error = TRUE)



ann: A data.frame containing an annotation object with gRNA-level information encoded as rows. The row.names attribute should correspond to the individual gRNAs, and the data.frame should contain, at minimum, columns named 'geneID' and 'geneSymbol' indicating the corresponding gRNA target gene ID and symbol, respectively.


object: If supplied, an object with row.names to be used to subset the supplied annotation frame for downstream analysis.


controls: The name of a value in the geneSymbol column of ann that corresponds to nontargeting control gRNAs. May also be supplied as a logical value, in which case the function will try to identify and format nontargeting guides.


throw.error: Logical indicating whether to throw an error when controls is TRUE but no nontargeting gRNAs are detected.


A new annotation data frame, usually with nontargeting controls and NA values reformatted to 'NoTarget' (and geneID set to 'no_gid'), and with the 'geneSymbol' column of ann converted to a factor. If an object is supplied, gRNAs not present in that object are omitted.
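The shape of the returned value can be approximated in base R as follows. This is a rough sketch of the described output under simplifying assumptions (controls are recognized only as NA entries, and a small count matrix stands in for the object argument); it does not reproduce the package's actual logic, and all names in it are hypothetical.

```r
# Hypothetical toy annotation with two unannotated (NA) guides.
ann_toy <- data.frame(
  geneID     = c("1001", "1002", NA, NA),
  geneSymbol = c("GeneA", "GeneB", NA, NA),
  row.names  = paste0("gRNA_", 1:4),
  stringsAsFactors = FALSE
)

# Assumption: treat NA targets as nontargeting controls and relabel them.
is_control <- is.na(ann_toy$geneSymbol)
ann_toy$geneSymbol[is_control] <- "NoTarget"
ann_toy$geneID[is_control]     <- "no_gid"

# Stand-in for 'object': any object with row names (here a count matrix).
counts_toy <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("gRNA_1", "gRNA_3", "gRNA_4"), c("sample1", "sample2"))
)

# Keep only gRNAs present in the object, then factorize the target labels.
kept <- ann_toy[row.names(ann_toy) %in% rownames(counts_toy), , drop = FALSE]
kept$geneSymbol <- factor(kept$geneSymbol)

print(kept)
```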


Russell Bainer


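library(gCrisprTools)
data("es"); data("ann")  # example objects assumed to ship with the package
es <- ct.filterReads(es)                 # filter the example ExpressionSet
newann <- ct.prepareAnnotation(ann, es)  # keep only gRNAs present in es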

RussBainer/gCrisprTools documentation built on Nov. 5, 2022, 2:35 p.m.