cursus: Genome-wide analysis


View source: R/user_functions.R

Description

This function tests for associations between gene expression or exon abundance (Y) and genetic or epigenetic alterations (X). Using the locations of genes (Yloc), and the locations of genetic or epigenetic alterations (Xloc), the expression of each gene is tested for associations with alterations on the same chromosome that are closer to the gene than a given distance (window).

Usage

cursus(Y, Yloc, X, Xloc, window,
        Ychr = NULL, Xchr = NULL,
        offset = NULL, group = NULL,
        perm = 1000, nodes = 2,
        phi = NULL, kind = 0.01)

Arguments

Y

RNA-Seq data: numeric matrix with q rows (genes) and n columns (samples); or a SummarizedExperiment object

Yloc

location RNA-Seq: numeric vector of length q (point location); or numeric matrix with q rows and two columns (start and end locations)

X

genomic profile: numeric matrix with p rows (covariates) and n columns (samples)

Xloc

location covariates: numeric vector of length p

window

maximum distance: non-negative real number

Ychr

chromosome RNA-Seq: factor of length q

Xchr

chromosome covariates: factor of length p

offset

numeric vector of length n

group

confounding variable: factor of length n

perm

number of iterations: positive integer

nodes

number of cluster nodes for parallel computation

phi

dispersion parameters: vector of length q

kind

computation: number between 0 and 1

Details

Note that Yloc, Xloc and window must be given in the same unit, usually in base pairs. If Yloc indicates interval locations, and window is zero, then only covariates between the start and end location of the gene are of interest. Typically window is larger than one million base pairs.
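As a rough illustration of the rule above, the covariates that qualify for a single gene can be picked out in base R. This is a sketch and not the package's internal code: the helper name in_window and the toy locations are assumptions, and treating covariates exactly at the boundary as inside the window is also an assumption.

```r
# Hypothetical helper (not part of globalSeq): which covariates lie
# within `window` of a gene? `gene_loc` is either a point location
# or c(start, end). Boundary covariates are counted as inside.
in_window <- function(gene_loc, Xloc, window) {
  start <- min(gene_loc)
  end <- max(gene_loc)
  which(Xloc >= start - window & Xloc <= end + window)
}

Xloc <- c(100, 250, 400, 900, 1500)  # toy covariate locations (base pairs)

# interval location with window = 0: only covariates inside the gene
inside_gene <- in_window(c(200, 500), Xloc, window = 0)

# point location with window = 250: covariates at most 250 bp away
near_point <- in_window(300, Xloc, window = 250)
```

Here inside_gene selects the covariates at 250 and 400, and near_point additionally selects the one at 100.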

If Y and X include data from a single chromosome, Ychr and Xchr are redundant. If Y or X include data from multiple chromosomes, Ychr and Xchr should be specified in order to prevent confusion between chromosomes.
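A minimal sketch of a call with data from two chromosomes, assuming the simulated setup from the Examples section. The chromosome labels, the locations that restart on each chromosome, and the window of 0.2 are made up for illustration; the call is skipped when globalSeq is not installed.

```r
set.seed(1)
n <- 30; q <- 10; p <- 100
Y <- matrix(rnbinom(q * n, mu = 10, size = 1 / 0.25), nrow = q, ncol = n)
X <- matrix(rnorm(p * n), nrow = p, ncol = n)

# toy layout: half of the genes and covariates on each chromosome,
# with locations restarting at zero on the second chromosome
Ychr <- factor(rep(c("chr1", "chr2"), each = q / 2))
Xchr <- factor(rep(c("chr1", "chr2"), each = p / 2))
Yloc <- rep(seq(0, 1, length.out = q / 2), times = 2)
Xloc <- rep(seq(0, 1, length.out = p / 2), times = 2)

# Ychr and Xchr keep equal locations on different chromosomes apart
if (requireNamespace("globalSeq", quietly = TRUE)) {
  res <- globalSeq::cursus(Y, Yloc, X, Xloc, window = 0.2,
                           Ychr = Ychr, Xchr = Xchr)
}
```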

For the simultaneous analysis of multiple genomic profiles, X should be a list of numeric matrices with n columns (samples), Xloc a list of numeric vectors, and window a list of non-negative real numbers. If provided, Xchr should be a list of numeric vectors.
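The list inputs described above might be assembled as follows. This is a sketch under assumptions: the two simulated profiles, their sizes p1 and p2, and the window values are invented for illustration, and the call is skipped when globalSeq is not installed.

```r
set.seed(1)
n <- 30; q <- 10
Y <- matrix(rnbinom(q * n, mu = 10, size = 1 / 0.25), nrow = q, ncol = n)
Yloc <- seq(0, 1, length.out = q)

# two toy genomic profiles measured on the same n samples
p1 <- 100; p2 <- 60
X <- list(matrix(rnorm(p1 * n), nrow = p1, ncol = n),
          matrix(rnorm(p2 * n), nrow = p2, ncol = n))

# one location vector and one window per profile, in the same order as X
Xloc <- list(seq(0, 1, length.out = p1),
             seq(0, 1, length.out = p2))
window <- list(0.2, 0.1)

if (requireNamespace("globalSeq", quietly = TRUE)) {
  res <- globalSeq::cursus(Y, Yloc, X, Xloc, window)
}
```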

The offset is meant to account for different library sizes. By default the offset is calculated based on Y. Different library sizes can be ignored by setting the offset to rep(1,n).
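The two options can be sketched as follows. The constant offset rep(1, n) comes from the text above; the library-size offset shown next to it is only one plausible choice, introduced here as an assumption, and is not necessarily what the package computes by default.

```r
set.seed(1)
n <- 30; q <- 10; p <- 100
Y <- matrix(rnbinom(q * n, mu = 10, size = 1 / 0.25), nrow = q, ncol = n)
X <- matrix(rnorm(p * n), nrow = p, ncol = n)
Yloc <- seq(0, 1, length.out = q)
Xloc <- seq(0, 1, length.out = p)

# ignore differences in library size
offset_equal <- rep(1, n)

# assumption for illustration: library sizes relative to their mean
lib_size <- colSums(Y)
offset_lib <- lib_size / mean(lib_size)

if (requireNamespace("globalSeq", quietly = TRUE)) {
  res <- globalSeq::cursus(Y, Yloc, X, Xloc, window = 1,
                           offset = offset_equal)
}
```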

The user can provide the confounding variable group. Note that each level of group must appear at least twice in order to allow stratified permutations.
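The idea behind stratified permutation can be illustrated in base R: samples are shuffled only within their own level of group, which is why a level with a single sample cannot be permuted. The helper stratified_perm is hypothetical and is not taken from the package.

```r
set.seed(1)
group <- factor(rep(c("a", "b", "c"), times = c(4, 3, 5)))

# each level of group must appear at least twice
stopifnot(all(table(group) >= 2))

# Hypothetical helper: permute sample indices within each level of group
stratified_perm <- function(group) {
  idx <- seq_along(group)
  for (level in levels(group)) {
    members <- which(group == level)
    # sample.int avoids the single-value pitfall of sample()
    idx[members] <- members[sample.int(length(members))]
  }
  idx
}

perm_idx <- stratified_perm(group)

# every sample is swapped only with samples from the same level
same_group <- all(group[perm_idx] == group)
```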

Efficient alternatives to classical permutation (kind=1) are the method of control variates (kind=0) and permutation in chunks (0 < kind < 1).
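A sketch of how the three settings of kind might be compared on the same simulated data. The labels in kinds are names chosen for this example only, and the calls are skipped when globalSeq is not installed.

```r
set.seed(1)
n <- 30; q <- 10; p <- 100
Y <- matrix(rnbinom(q * n, mu = 10, size = 1 / 0.25), nrow = q, ncol = n)
X <- matrix(rnorm(p * n), nrow = p, ncol = n)
Yloc <- seq(0, 1, length.out = q)
Xloc <- seq(0, 1, length.out = p)

# labels are illustrative; the values follow the text above
kinds <- c(classical = 1, control_variates = 0, chunks = 0.01)

if (requireNamespace("globalSeq", quietly = TRUE)) {
  results <- lapply(kinds, function(k) {
    globalSeq::cursus(Y, Yloc, X, Xloc, window = 1, kind = k)
  })
}
```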

Value

The function returns a data frame, with the p-values in the first row and the test statistics in the second row.

References

A Rauschenberger, MA Jonker, MA van de Wiel, and RX Menezes (2016). "Testing for association between RNA-Seq and high-dimensional data", BMC Bioinformatics. 17:118.

RX Menezes, M Boetzer, M Sieswerda, GJB van Ommen, and JM Boer (2009). "Integrated analysis of DNA copy number and gene expression microarray data using gene sets", BMC Bioinformatics. 10:203.

See Also

The function omnibus tests for associations between an overdispersed response variable and a high-dimensional covariate set. The function proprius calculates the contributions of individual samples or covariates to the test statistic. All other functions of the R package globalSeq are internal.

Examples

# simulate high-dimensional data
n <- 30; q <- 10; p <- 100
Y <- matrix(rnbinom(q*n,mu=10,
    size=1/0.25),nrow=q,ncol=n)
X <- matrix(rnorm(p*n),nrow=p,ncol=n)
Yloc <- seq(0,1,length.out=q)
Xloc <- seq(0,1,length.out=p)
window <- 1

# hypothesis testing
cursus(Y,Yloc,X,Xloc,window)

rauschenberger/globalSeq documentation built on May 19, 2020, 4:09 a.m.