cnvEQTL | R Documentation |
Testing CNV regions for effects on the expression level of genes in defined genomic windows.
cnvEQTL(
cnvrs,
calls,
rcounts,
data,
window = "1Mbp",
multi.calls = .largest,
min.samples = 10,
min.state.freq = 3,
de.method = c("edgeR", "limma"),
padj.method = "BH",
filter.by.expr = TRUE,
verbose = FALSE
)
cnvrs |
A |
calls |
Either a |
rcounts |
A |
data |
(optional) A |
window |
Numeric or Character. Size of the genomic window in base pairs
by which each CNV region is extended up- and downstream. This determines which
genes are tested for each CNV region. Character notation is supported for
convenience such as "100kbp" (same as 100000) or "1Mbp" (same as 1000000).
Defaults to |
multi.calls |
A function. Determines how to summarize the
CN state in a CNV region when there are multiple (potentially conflicting)
calls for one sample in that region. Defaults to |
min.samples |
Integer. Minimum number of samples with at least one call overlapping the CNV region tested. Defaults to 10. See details. |
min.state.freq |
Integer. Minimun number of samples in each CNV state being tested. Defaults to 3. |
de.method |
Character. Differential expression method.
Defaults to |
padj.method |
Character. Method for adjusting p-values to multiple testing.
For available methods see the man page of the function |
filter.by.expr |
Logical. Include only genes with
sufficiently large counts in the DE analysis? If TRUE, excludes genes not
satisfying a minimum number of read counts across samples using the
|
verbose |
Logical. Display progress messages? Defaults to |
Association testing between CNV regions and RNA-seq read counts is carried out using edgeR, which applies generalized linear models (GLMs) based on the negative-binomial distribution while incorporating normalization factors for different library sizes.
In the case of only one CN state deviating from 2n for a CNV region under investigation, this reduces to the classical 2-group comparison. For more than two states (e.g. 0n, 1n, 2n), edgeR’s ANOVA-like test is applied to test all deviating groups for significant expression differences relative to 2n.
To avoid artificial effects due to low expression of a gene or insufficient
sample size in deviating groups, it is typically recommended to exclude from
the analysis (i) genes with fewer than r reads per million reads mapped
(cpm, counts per million) in the maximally expressed sample group,
and (ii) CNV regions with fewer than s samples in a group deviating from 2n.
Use the min.cpm
and min.samples
arguments, respectively.
When testing local effects (adjacent or coinciding genes of a CNV region), suitable thresholds for candidate discovery are r = 3, s = 4, and a nominal significance level of 0.05; as such effects have a clear biological indication and the number of genes tested is typically small.
For distal effects (i.e. when testing genes far away from a CNV region) more stringent thresholds such as r = 20 and s = 10 for distal effects in conjunction with multiple testing correction using a conservative adjusted significance level such as 0.01 is typically recommended (due to power considerations and to avoid detection of spurious effects).
A DataFrame
containing measures of association for
each CNV region and each gene tested in the genomic window around the CNV region.
Ludwig Geistlinger
Geistlinger et al. (2018) Widespread modulation of gene expression by copy number variation in skeletal muscle. Sci Rep, 8(1):1399.
findOverlaps
to find overlaps between sets of genomic
regions,
qreduceAssay
to summarize ragged genomic location data
in defined genomic regions,
glmQLFit
and glmQLFTest
to conduct negative
binomial generalized linear models for RNA-seq read count data.
# (1) CNV calls
states <- sample(c(0,1,3,4), 17, replace=TRUE)
calls <- GRangesList(
sample1 = GRanges( c("chr1:1-10", "chr2:15-18", "chr2:25-34"), state=states[1:3]),
sample2 = GRanges( c("chr1:1-10", "chr2:11-18" , "chr2:25-36"), state=states[4:6] ),
sample3 = GRanges( c("chr1:2-11", "chr2:14-18", "chr2:26-36"), state=states[7:9] ),
sample4 = GRanges( c("chr1:1-12", "chr2:18-35" ), state=states[10:11] ),
sample5 = GRanges( c("chr1:1-12", "chr2:11-17" , "chr2:26-34"), state=states[12:14] ) ,
sample6 = GRanges( c("chr1:1-12", "chr2:12-18" , "chr2:25-35"), state=states[15:17] )
)
# (2) summarized CNV regions
cnvrs <- populationRanges(calls, density=0.1)
# (3) RNA-seq read counts
genes <- GRanges(c("chr1:2-9", "chr1:100-150", "chr1:200-300",
"chr2:16-17", "chr2:100-150", "chr2:200-300", "chr2:26-33"))
y <- matrix(rnbinom(42,size=1,mu=10),7,6)
names(genes) <- rownames(y) <- paste0("gene", 1:7)
colnames(y) <- paste0("sample", 1:6)
library(SummarizedExperiment)
rse <- SummarizedExperiment(assays=list(counts=y), rowRanges=granges(genes))
# (4) perform the association analysis
res <- cnvEQTL(cnvrs, calls, rse,
min.samples = 1, min.state.freq = 1, filter.by.expr = FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.