cursus: Genome-wide analysis
In rauschenberger/globalSeq: Global Test for Counts



This function tests for associations between gene expression or exon abundance (Y) and genetic or epigenetic alterations (X). Using the locations of genes (Yloc), and the locations of genetic or epigenetic alterations (Xloc), the expression of each gene is tested for associations with alterations on the same chromosome that are closer to the gene than a given distance (window).

cursus(Y, Yloc, X, Xloc, window,
        Ychr = NULL, Xchr = NULL,
        offset = NULL, group = NULL,
        perm = 1000, nodes = 2,
        phi = NULL, kind = 0.01)

`Y`	RNA-Seq data: numeric matrix with `q` rows (genes) and `n` columns (samples); or a SummarizedExperiment object
`Yloc`	location RNA-Seq: numeric vector of length `q` (point location); or numeric matrix with `q` rows and two columns (start and end locations)
`X`	genomic profile: numeric matrix with `p` rows (covariates) and `n` columns (samples)
`Xloc`	location covariates: numeric vector of length `p`
`window`	maximum distance: non-negative real number
`Ychr`	chromosome RNA-Seq: factor of length `q`
`Xchr`	chromosome covariates: factor of length `p`
`offset`	numeric vector of length `n`
`group`	confounding variable: factor of length `n`
`perm`	number of iterations: positive integer
`nodes`	number of cluster nodes for parallel computation
`phi`	dispersion parameters: vector of length `q`
`kind`	computation: number between 0 and 1

Note that Yloc, Xloc and window must be given in the same unit, usually in base pairs. If Yloc indicates interval locations, and window is zero, then only covariates between the start and end location of the gene are of interest. Typically window is larger than one million base pairs.

If Y and X include data from a single chromosome, Ychr and Xchr are redundant. If Y or X include data from multiple chromosomes, Ychr and Xchr should be specified in order to prevent confusion between chromosomes.
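The selection rule described above can be sketched in base R. This is an illustration, not the package's internal code: the helper `covariates_near` is hypothetical, and it assumes that distance is measured from the nearest end of the gene (for a point location, start and end coincide) and that the window boundary is inclusive.

```r
# Hypothetical helper (not part of globalSeq): indices of the covariates that
# lie on the same chromosome as a gene and within `window` of its location.
covariates_near <- function(start, end, chr, Xloc, Xchr, window) {
  same_chr <- Xchr == chr
  in_range <- Xloc >= start - window & Xloc <= end + window
  which(same_chr & in_range)
}

# Toy data: five covariates on two chromosomes, locations in base pairs
Xloc <- c(100, 500, 900, 200, 800)
Xchr <- factor(c("1", "1", "1", "2", "2"))

# Gene spanning positions 400 to 600 on chromosome 1
sel_zero <- covariates_near(400, 600, "1", Xloc, Xchr, window = 0)
sel_wide <- covariates_near(400, 600, "1", Xloc, Xchr, window = 300)
```

With a window of zero only the covariate inside the gene is selected; widening the window adds the remaining covariates on chromosome 1 but never those on chromosome 2, which is the confusion that Ychr and Xchr are meant to prevent.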

For the simultaneous analysis of multiple genomic profiles, X should be a list of numeric matrices with n columns (samples), Xloc a list of numeric vectors, and window a list of non-negative real numbers. If provided, Xchr should be a list of numeric vectors.
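As a minimal sketch of this list-based input, the code below builds two simulated profiles for the same samples. The dimensions `p1` and `p2`, the window values, and the consistency checks are illustrative assumptions and are not taken from the package.

```r
set.seed(1)
n <- 30; q <- 10
Y <- matrix(rnbinom(q * n, mu = 10, size = 1 / 0.25), nrow = q, ncol = n)
Yloc <- seq(0, 1, length.out = q)

# Two hypothetical genomic profiles measured on the same n samples
p1 <- 100; p2 <- 50
X <- list(matrix(rnorm(p1 * n), nrow = p1, ncol = n),
          matrix(rnorm(p2 * n), nrow = p2, ncol = n))
Xloc <- list(seq(0, 1, length.out = p1),
             seq(0, 1, length.out = p2))
window <- list(1, 0.5)

# Sanity checks before testing: one location per covariate,
# one window per profile, and the same samples in every profile
stopifnot(all(sapply(X, ncol) == n),
          all(sapply(X, nrow) == sapply(Xloc, length)),
          length(X) == length(window))

# The objects can then be passed on as documented:
# cursus(Y, Yloc, X, Xloc, window)
```

Each list element corresponds to one profile, so the i-th entries of X, Xloc and window must describe the same data set.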

The offset is meant to account for different library sizes. By default the offset is calculated based on Y. Different library sizes can be ignored by setting the offset to rep(1,n).
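To make the notion of library size concrete, the sketch below computes the total count per sample and constructs the constant offset mentioned above. The rescaling used for `size_factor` is an illustrative assumption only; this page does not state how the default offset is derived from Y.

```r
set.seed(1)
n <- 6; q <- 10
Y <- matrix(rnbinom(q * n, mu = 10, size = 4), nrow = q, ncol = n)

# Library size: total number of counts per sample (column)
lib_size <- colSums(Y)

# Illustrative assumption: library sizes rescaled to a geometric mean of one
size_factor <- lib_size / exp(mean(log(lib_size)))

# Constant offset, which ignores differences in library size
offset_const <- rep(1, n)
```

Either vector has length n and could be supplied through the offset argument.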

The user can provide the confounding variable group. Note that each level of group must appear at least twice in order to allow stratified permutations.
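The requirement on group can be checked before calling the function, and the idea of a stratified permutation can be sketched in a few lines. The helper `permute_within` is hypothetical and only illustrates the principle; it is not the package's implementation.

```r
set.seed(1)
group <- factor(c("a", "a", "a", "b", "b", "c", "c", "c"))

# Each level must appear at least twice; a level with a single sample
# could never be moved by a permutation within its stratum
stopifnot(all(table(group) >= 2))

# Hypothetical helper: permute sample indices within each level of group
permute_within <- function(group) {
  index <- seq_along(group)
  for (level in levels(group)) {
    members <- which(group == level)
    # index via sample.int() so that a length-one stratum is handled safely
    index[members] <- members[sample.int(length(members))]
  }
  index
}

perm <- permute_within(group)
```

The permuted index reorders samples only among those that share the same level, so the confounding variable is left unchanged by the permutation.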

Efficient alternatives to classical permutation (kind=1) are the method of control variates (kind=0) and permutation in chunks (0 < kind < 1).

The function returns a dataframe, with the p-values in the first row and the test statistics in the second row.

A Rauschenberger, MA Jonker, MA van de Wiel, and RX Menezes (2016). "Testing for association between RNA-Seq and high-dimensional data", BMC Bioinformatics. 17:118.

RX Menezes, M Boetzer, M Sieswerda, GJB van Ommen, and JM Boer (2009). "Integrated analysis of DNA copy number and gene expression microarray data using gene sets", BMC Bioinformatics. 10:203.

The function omnibus tests for associations between an overdispersed response variable and a high-dimensional covariate set. The function proprius calculates the contributions of individual samples or covariates to the test statistic. All other functions of the R package globalSeq are internal.

# simulate high-dimensional data
n <- 30; q <- 10; p <- 100
Y <- matrix(rnbinom(q*n,mu=10,
    size=1/0.25),nrow=q,ncol=n)
X <- matrix(rnorm(p*n),nrow=p,ncol=n)
Yloc <- seq(0,1,length.out=q)
Xloc <- seq(0,1,length.out=p)
window <- 1

# hypothesis testing
cursus(Y,Yloc,X,Xloc,window)