linc-methods: Compute A Correlation Matrix of Co-expressed Coding And...

Description

The function linc is the main function of this package. It converts a given input object into a LINCmatrix. This process includes (I) statistical analysis and (II) correction of the input, (III) separation of coding and non-coding genes and (IV) computation of a correlation matrix. The input could be, for instance, a gene expression matrix in which rows correspond to genes and columns represent samples. Besides a suitable object, a vector identifying the protein-coding genes is required.

Usage

linc(object,
     codingGenes,
     corMethod    = "spearman",
     batchGroups,
     nsv          = 1,
     rmPC,
     outlier,
     userFun,
     verbose      = TRUE)

Arguments

object

a matrix, data.frame or ExpressionSet with genes corresponding to rows, preferably the high-variance genes in a given set

codingGenes

a logical vector with one entry per gene supplied in object. TRUE indicates that the gene is protein-coding. Alternatively, codingGenes can be a vector naming the protein-coding genes.

corMethod

a method for the correlation function; has to be one of c("pearson", "kendall", "spearman", "user")

batchGroups

a vector naming the batch conditions. The length of this vector has to match the number of samples supplied in object. There have to be at least two different batch conditions for the method to work.

nsv

a single integer indicating the number of hidden surrogate variables. This argument is only relevant in case batchGroups is used.

rmPC

a vector of principal components (PCs) which should be removed. PCs are counted starting from 1 up to the number of samples.

outlier

a method for the genewise removal of single outliers; has to be one of c("esd", "zscore")

userFun

a function, or the name of a function, that should be used to calculate the correlation between coding and non-coding genes. This argument has to be used in combination with corMethod = "user".

verbose

whether to print messages about the progress of the function (TRUE) or not (FALSE)
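
The two accepted forms of codingGenes can be illustrated with a small base R sketch. The toy matrix, its gene names and its values are made up for illustration and are not part of the package; the sketch only shows that a logical vector and a vector of gene names can describe the same set of rows.

```r
# Toy expression matrix: 4 genes (rows) x 3 samples (columns).
# Gene names and values are hypothetical.
expr <- matrix(c(5.1, 4.8, 5.3,
                 2.0, 2.2, 1.9,
                 7.4, 7.1, 7.9,
                 0.5, 0.7, 0.4),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("GENE_A", "LNC_1", "GENE_B", "LNC_2"),
                               c("s1", "s2", "s3")))

# Form 1: logical vector with one entry per row of 'expr'
coding_logical <- c(TRUE, FALSE, TRUE, FALSE)

# Form 2: character vector naming the protein-coding genes
coding_names <- c("GENE_A", "GENE_B")

# Both forms select the same rows
same_genes <- identical(rownames(expr)[coding_logical], coding_names)
same_mask  <- identical(rownames(expr) %in% coding_names, coding_logical)
```

Either vector could then be passed as codingGenes together with expr as object.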

Details

object can be a matrix, a data.frame or an ExpressionSet with rows corresponding to genes and columns to samples, the assumed co-expression conditions. Genes with duplicated names, genes having 0 variance and genes with too many missing or infinite values will be removed from the input. For inputs showing a high inter-sample variance (ANOVA) in combination with many single outliers, a warning message will appear. By default, Spearman's rank correlation will be computed between protein-coding and non-coding genes. For this method a time-efficient C++ implementation will be called. Longer computation times occur for more than 5000 genes and more than 100 samples. Missing values are handled so that only pairwise complete observations are compared. A customized correlation function can be applied by supplying it in userFun; it requires the formal arguments x and y. This has priority over corMethod.
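
The default correlation step can be approximated in base R. The following is a minimal sketch, not the package's C++ implementation: it uses stats::cor on a made-up matrix, and the orientation of the result (non-coding genes in rows, protein-coding genes in columns) is an assumption. It also shows a custom function with two formal arguments, of the kind that could be supplied as userFun.

```r
set.seed(1)

# Hypothetical input: 3 protein-coding and 2 non-coding genes, 8 samples
expr <- matrix(rnorm(5 * 8), nrow = 5,
               dimnames = list(c("PC_1", "PC_2", "PC_3", "LNC_1", "LNC_2"),
                               paste0("s", 1:8)))
expr["LNC_1", 3] <- NA                      # one missing value
is_coding <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

# cor() correlates columns, so genes are moved into columns with t().
# 'pairwise.complete.obs' uses only samples observed for both genes.
cormatrix <- cor(t(expr[!is_coding, , drop = FALSE]),
                 t(expr[is_coding, , drop = FALSE]),
                 method = "spearman",
                 use = "pairwise.complete.obs")

# A custom correlation function taking two expression vectors
neg_cor <- function(x, y) -cor(x, y, use = "pairwise.complete.obs")
neg_value <- neg_cor(expr["LNC_2", ], expr["PC_1", ])
```

Here cormatrix has one row per non-coding gene and one column per protein-coding gene.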

A number of statistical methods are available to remove effects from a given input expression matrix which depend on the platform or technology used and may hide relevant biology. The argument batchGroups works as a wrapper of the SVA package calling sva::svaseq. The number of hidden surrogate variables is set to nsv = 1 by default; it can be estimated with the function sva::num.sv. For this model to work, the description of at least two different batches is required in batchGroups. Principal Component Analysis (PCA) can be performed by rmPC = c(...) where ... represents a vector of principal components. The command rmPC = c(2:ncol(object)) will remove the first PC from the input. This method can be used to determine whether observations are due to the main variance in the dataset, i.e. main groups or subtypes. Outliers are handled genewise. The extreme Studentized deviate (ESD) test by Rosner, Bernard (1983) will detect one to four outliers in a gene and replace them by NA values. The alternative zscore will perform a robust zscore test suggested by Boris Iglewicz and David Hoaglin (1993) and detect a single outlier in a gene if |Z| > 3.5.
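
The robust zscore test can be sketched in base R using the modified z-score of Iglewicz and Hoaglin, M = 0.6745 * (x - median) / MAD, with the cutoff of 3.5 mentioned above. This is an illustration only: the helper name robust_zscore_outlier is hypothetical, and the package's own implementation may differ in its details.

```r
# Replace the single most extreme value of a gene by NA if its
# modified z-score exceeds the cutoff (hypothetical helper).
robust_zscore_outlier <- function(x, cutoff = 3.5) {
  med <- median(x, na.rm = TRUE)
  # constant = 1 gives the raw median absolute deviation
  mad_raw <- mad(x, center = med, constant = 1, na.rm = TRUE)
  if (is.na(mad_raw) || mad_raw == 0) return(x)  # test not applicable
  z <- 0.6745 * (x - med) / mad_raw
  worst <- which.max(abs(z))                     # NA entries are ignored
  if (abs(z[worst]) > cutoff) x[worst] <- NA
  x
}

# One gene with a single extreme sample (made-up values)
gene <- c(5.1, 4.9, 5.0, 5.2, 4.8, 12.0)
cleaned <- robust_zscore_outlier(gene)

# Genewise application to a matrix with genes in rows
expr <- rbind(g1 = gene, g2 = c(1.0, 1.1, 0.9, 1.2, 1.0, 1.1))
expr_cleaned <- t(apply(expr, 1, robust_zscore_outlier))
```

In this example only the value 12.0 of the first gene is replaced by NA; the second gene is left unchanged.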

A LINCmatrix can be recalculated with the command linc(LINCmatrix, ...) in order to change further arguments. plotlinc(LINCmatrix, ...) will plot a figure depicting the statistical analysis and correlation values. As for most objects of the LINC class, manipulation of the last slot linCenvir will likely result in unexpected errors.

Value

an object of the class 'LINCmatrix' (S4) with six slots

results

a list containing the original input expression matrix, or a transformed matrix if rmPC, batchGroups or outlier was applied

assignment

a character vector of protein-coding genes

correlation

a list of $cormatrix, the correlation of non-coding to protein-coding genes and $lnctolnc, the correlation of non-coding to non-coding genes

expression

the original expression matrix

history

a storage environment of important methods, objects and parameters used to create the object

linCenvir

a storage environment ensuring the compatibility to other objects of the LINC class

Methods

signature(object = "data.frame", codingGenes = "ANY")

(see details)

signature(object = "ExpressionSet", codingGenes = "ANY")

(see details)

signature(object = "LINCmatrix", codingGenes = "missing")

(see details)

signature(object = "matrix", codingGenes = "ANY")

(see details)

Compatibility

plotlinc(LINCmatrix, ...), clusterlinc(LINCmatrix, ...), singlelinc(LINCmatrix, ...), ...

Author(s)

Manuel Goepferich

References

[1] https://www.bioconductor.org/packages/release/bioc/html/sva.html

[2] Rosner, Bernard (May 1983), Percentage Points for a Generalized ESD Many-Outlier Procedure, Technometrics, 25(2), pp. 165-172.

[3] Boris Iglewicz and David Hoaglin (1993), "Volume 16: How to Detect and Handle Outliers", The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.

See Also

justlinc ; clusterlinc ; singlelinc

Examples

data(BRAIN_EXPR)

# call 'linc' with no further arguments
crbl_matrix <- linc(cerebellum, codingGenes = pcgenes_crbl)

# remove the first seven principal components
crbl_matrix_pc <- linc(cerebellum, codingGenes = pcgenes_crbl, rmPC = c(1:7))

# negative correlation by using 'userFun'
crbl_matrix_ncor <- linc(cerebellum, codingGenes = pcgenes_crbl,
                         userFun = function(x,y){ -cor(x,y) })

# remove outliers using the ESD method
crbl_matrix_esd <- linc(cerebellum, codingGenes = pcgenes_crbl, outlier = "esd")

# plot this object
plotlinc(crbl_matrix_esd)

ManuelGoepferich/LINC_justlinc documentation built on May 7, 2019, 2:47 p.m.