aggregatetogenes: Aggregates pooled CRISPR screen sgRNA data to gene data


Description

Aggregate all sgRNA data from pooled CRISPR screens to their corresponding gene level.

Usage

aggregatetogenes(data.frame, namecolumn = 1, countcolumn = 2,
                 agg.function = sum, extractpattern = expression("^(.+?)_.+"), type = "aggregate")

Arguments

data.frame

data.frame with sgRNA read counts. It must have one column with sgRNA names and one column with read counts. The sgRNA names must contain the gene name in a form that the extractpattern expression can extract, e.g. GENE_sgRNA1 yields GENE as the gene name, with _ as the separator and sgRNA1 as the sgRNA identifier.

namecolumn

integer, the index of the column in which the sgRNA names are stored

countcolumn

integer, the index of the column in which the read counts are stored

agg.function

function, the function used to aggregate the data. When sgRNA read counts are aggregated to their corresponding gene, sum is the appropriate choice. Any other R function that reduces a numeric vector to a single value can be used instead, e.g. median or mean.
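To illustrate how the choice of agg.function changes the gene-level value, the base-R sketch below applies sum, median and mean to the read counts of the five AAK1 sgRNAs listed under Details. It does not call caRpools; it only assumes that agg.function receives the vector of sgRNA read counts belonging to one gene.

```r
# Read counts of the five AAK1 sgRNAs from the Details example.
aak1_reads <- c(0, 197, 271, 1, 0)

# Candidate values for agg.function (plain R functions).
candidates <- list(sum = sum, median = median, mean = mean)

# Apply each candidate to the same vector of sgRNA read counts.
gene_values <- vapply(candidates, function(f) f(aak1_reads), numeric(1))
gene_values
```

Only sum reports the total number of reads assigned to the gene; median and mean summarise the typical sgRNA instead, which is why sum is the default.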

extractpattern

regular expression, used to extract the gene name from the sgRNA name. The part of the pattern that matches the gene name must be enclosed in parentheses () so that it forms a capture group. The default value expression("^(.+?)_.+") captures the gene name (.+?) in front of the separator _, followed by one or more arbitrary characters .+, e.g. gene1_anything yields gene1.
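The base-R sketch below shows how a capture group extracts the gene name. It assumes the pattern is applied with sub() and the replacement "\\1", which illustrates the idea but is not taken from the caRpools code. The pattern is written as a plain string rather than wrapped in expression(), and perl = TRUE is an assumption made so that the lazy quantifier +? follows PCRE semantics.

```r
sgrna_names <- c("AAK1_104_0", "gene1_anything")

# Default pattern: lazy (.+?) stops at the FIRST underscore.
genes_default <- sub("^(.+?)_.+", "\\1", sgrna_names, perl = TRUE)

# Same idea with a greedy group: (.+) runs to the LAST underscore,
# so multi-part sgRNA suffixes leak into the "gene name".
genes_greedy <- sub("^(.+)_.+", "\\1", sgrna_names, perl = TRUE)

# A name without the separator does not match, so sub() returns it unchanged.
unchanged <- sub("^(.+?)_.+", "\\1", "NOSEPARATOR", perl = TRUE)

genes_default  # "AAK1" "gene1"
genes_greedy   # "AAK1_104" "gene1"
```

The greedy variant shows why the non-greedy group matters for names such as AAK1_104_0 that contain more than one underscore.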

type

CaRpools can either aggregate the data frame (type = "aggregate") or only annotate the gene identifiers as an additional column (type = "annotate"). Default: "aggregate". Values: "aggregate", "annotate".
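The difference between the two modes can be sketched with a small hypothetical helper, aggregate_or_annotate(), which is not part of caRpools. It mirrors the documented arguments, but the name of the added annotation column ("gene") and the use of sub() with perl = TRUE are assumptions made for illustration.

```r
# Hypothetical helper (NOT the caRpools implementation) contrasting the two modes.
aggregate_or_annotate <- function(df, namecolumn = 1, countcolumn = 2,
                                  agg.function = sum,
                                  extractpattern = "^(.+?)_.+",
                                  type = c("aggregate", "annotate")) {
  type <- match.arg(type)
  # Extract the gene name (capture group 1) from every sgRNA name.
  genes <- sub(extractpattern, "\\1", df[[namecolumn]], perl = TRUE)

  if (type == "annotate") {
    # Keep one row per sgRNA and add the gene name as an extra column
    # (the column name "gene" is an assumption).
    df$gene <- genes
    return(df)
  }

  # "aggregate": collapse all sgRNA rows of a gene into one row.
  counts <- tapply(df[[countcolumn]], genes, agg.function)
  out <- data.frame(names(counts), as.vector(counts))
  names(out) <- names(df)[c(namecolumn, countcolumn)]
  out
}
```

With type = "annotate" the number of rows is unchanged, which keeps sgRNA-level information available; with type = "aggregate" there is one row per gene.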

Details

aggregatetogenes can be used after load.file() to create quality control plots for aggregated gene data instead of single sgRNA data.

Before (excerpt):

DesignID fullmatch
AAK1_104_0 0
AAK1_105_0 197
AAK1_106_0 271
AAK1_107_0 1
AAK1_108_0 0

Afterwards:

DesignID fullmatch
AAK1 880
AATK 2105
ABI1 1610
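The Before/After reshaping can be reproduced on the excerpt above with base R. This is a sketch of the transformation, not the caRpools code; it assumes the default pattern is applied with sub() and that sum is used as agg.function.

```r
# The five sgRNA rows shown in the "Before" excerpt.
before <- data.frame(
  DesignID  = c("AAK1_104_0", "AAK1_105_0", "AAK1_106_0", "AAK1_107_0", "AAK1_108_0"),
  fullmatch = c(0, 197, 271, 1, 0)
)

# Replace each sgRNA name by its gene name (capture group 1).
before$DesignID <- sub("^(.+?)_.+", "\\1", before$DesignID, perl = TRUE)

# Sum the read counts per gene, keeping the DesignID/fullmatch layout.
after <- aggregate(fullmatch ~ DesignID, data = before, FUN = sum)
after
```

On this excerpt AAK1 sums to 469 rather than the 880 shown in the "Afterwards" table, because the excerpt lists only some of the AAK1 sgRNAs.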

Value

A data.frame is returned in which namecolumn now contains only gene names and all read count information is aggregated with agg.function.

Note

none

Author(s)

Jan Winter

Examples

data(caRpools)

CONTROL1.g=aggregatetogenes(data.frame = CONTROL1, agg.function=sum,
                            extractpattern = expression("^(.+?)(_.+)"))

caRpools documentation built on May 2, 2019, 11:26 a.m.