clusterlinc-methods: Cluster Queried ncRNAs Based On Their Interaction Partners


Description

The function clusterlinc gives an overview of the ncRNAs in a dataset. An input LINCmatrix is converted to a LINCcluster in four steps: (I) computation of a correlation test, (II) setup of a distance matrix, (III) calculation of a dendrogram and (IV) selection of co-expressed genes for each query. The result is a cluster of ncRNAs and their associated protein-coding genes.

Usage

clusterlinc(linc,
            distMethod    = "dicedist",
            clustMethod   = "average",
            pvalCutOff    = 0.05,
            padjust       = "none",
            coExprCut     = NULL,
            cddCutOff     = 0.05,
            verbose       = TRUE)

Arguments

linc

an object of the class LINCmatrix

distMethod

a method to compute the distance between ncRNAs; has to be one of c("correlation", "pvalue", "dicedist")

clustMethod

an algorithm to compute the dendrogram; has to be one of c("ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid")

pvalCutOff

a threshold for the selection of co-expressed genes. Only protein-coding genes with a p-value in the correlation test lower than pvalCutOff will be assigned to queried ncRNAs as interaction partners.

padjust

a method for multiple testing correction; has to be one of c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none")

coExprCut

a single integer indicating the number of co-expressed genes to select. If this argument is used, then for each ncRNA the coExprCut = n protein-coding genes with the lowest p-values in the correlation test will be assigned to that query.

cddCutOff

a threshold that is only relevant for distMethod = "dicedist". In this method, cddCutOff defines whether an ncRNA and a protein-coding gene are considered interaction partners. This influences the distance matrix and the clustering process.

verbose

whether to print messages about the progress of the function (TRUE) or not (FALSE)

Details

As a first step, clusterlinc conducts the correlation test (stats::cor.test) using the correlation method and the handling of missing values inherited from the input LINCmatrix. The resulting p-values, rather than the absolute correlation values, indicate the statistical robustness of the correlations. Co-expression of an ncRNA and a protein-coding gene is assumed if the p-value from cor.test is lower than the given pvalCutOff. An alternative way to select co-expressed genes is provided by coExprCut. This argument has priority over pvalCutOff and picks the n genes with the lowest p-values for each ncRNA. In contrast to pvalCutOff, this results in an equal number of co-expressed genes assigned to every ncRNA. The argument padjust can be used for multiple testing correction. In most cases this is not compatible with distMethod = "dicedist".
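The two selection modes described above can be illustrated with base R alone. This is a minimal sketch on simulated data, not the package's internal code: the expression values, the gene names (pc1 to pc6), the default Pearson test and the choice of n = 3 are all assumptions made for illustration.

```r
set.seed(1)

# Simulated expression (made-up data): 6 protein-coding genes x 20 samples
pc <- matrix(rnorm(6 * 20), nrow = 6,
             dimnames = list(paste0("pc", 1:6), NULL))

# A hypothetical ncRNA whose expression follows pc1 plus some noise
lnc <- pc["pc1", ] + rnorm(20, sd = 0.3)

# Correlation test of the ncRNA against every protein-coding gene
pvals <- apply(pc, 1, function(g) stats::cor.test(lnc, g)$p.value)

# Mode 1 (like pvalCutOff): keep every gene below the threshold;
# the number of selected genes varies from query to query
by_cutoff <- names(pvals)[pvals < 0.05]

# Mode 2 (like coExprCut): keep the n genes with the lowest p-values;
# every query receives exactly n genes
n <- 3
by_rank <- names(sort(pvals))[seq_len(n)]

# Optional multiple testing correction (like padjust = "hochberg")
adj <- stats::p.adjust(pvals, method = "hochberg")
```

Here by_cutoff depends on how many genes pass the threshold, whereas by_rank always has length n, which mirrors the difference between pvalCutOff and coExprCut.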

For the computation of the distance matrix of ncRNA genes, three methods can be applied. The first method, "correlation", uses 1 - correlation as the distance measure. In contrast, "pvalue" considers not the absolute correlation values but the p-values from the correlation test. The third method, "dicedist", takes the Czekanowski-Dice distance [1] as the distance measure. Here, the number of shared interaction partners between ncRNAs determines their relation to each other. The argument cddCutOff decides which p-values in the correlation matrix are considered an interaction. A low threshold, for instance, will only count interactions between ncRNAs and protein-coding genes that are supported by a p-value lower than the supplied threshold and therefore by a robust correlation of the two genes. Based on the distance matrix, a cluster of the ncRNAs is computed by stats::hclust. The argument clustMethod defines which clustering method is applied.
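The idea behind "dicedist" can be sketched in base R. The example assumes the common set-based form of the Czekanowski-Dice distance, 1 - 2|A ∩ B| / (|A| + |B|), where A and B are the partner sets of two ncRNAs; whether clusterlinc uses exactly this form internally is not confirmed here. The p-value matrix, the gene names and the helper dice_dist are hypothetical.

```r
# Toy p-value matrix (made-up values): rows are ncRNAs, columns are
# protein-coding genes
pvals <- matrix(c(0.01, 0.20, 0.03, 0.04,
                  0.02, 0.01, 0.50, 0.03,
                  0.60, 0.70, 0.01, 0.90),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("lnc1", "lnc2", "lnc3"),
                                c("g1", "g2", "g3", "g4")))

# Threshold in the role of cddCutOff: p-values below it count as interaction
cutoff <- 0.05
partners <- pvals < cutoff

# Hypothetical helper: Czekanowski-Dice distance between two logical
# partner vectors; defined as 1 if neither ncRNA has any partner
dice_dist <- function(a, b) {
  total <- sum(a) + sum(b)
  if (total == 0) return(1)
  1 - 2 * sum(a & b) / total
}

# Pairwise distance matrix between ncRNAs
n <- nrow(partners)
d <- matrix(0, n, n, dimnames = list(rownames(partners), rownames(partners)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- dice_dist(partners[i, ], partners[j, ])
  }
}

# Dendrogram of the ncRNAs, using the same algorithm name as the
# default clustMethod = "average"
tree <- stats::hclust(stats::as.dist(d), method = "average")
```

In this toy case lnc1 and lnc2 share two of their three partners and end up closest to each other, while lnc2 and lnc3 share none and receive the maximum distance of 1. Raising or lowering cutoff changes the partner sets and therefore the distances, which is why cddCutOff influences the clustering.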

A LINCcluster can be recalculated with the command clusterlinc(LINCcluster, ...) in order to change further arguments. plotlinc(LINCcluster, ...) will plot a figure that shows the cluster of ncRNAs (dendrogram) and the number of co-expressed genes with respect to different thresholds. getbio(LINCcluster, ...) will derive the biological terms associated with the co-expressed genes. Because of the correlation test, longer calculation times can occur. A faster alternative to this function is singlelinc(). User-defined correlation test functions are supported for singlelinc() but not for clusterlinc().

Value

an object of the class 'LINCcluster' (S4) with 6 slots

results

a list containing an object of the class "phylo" with the additional entry neighbours, a list of queries and co-expressed genes

assignment

a character vector of protein-coding genes

correlation

a list with three entries: cormatrix, the correlation of non-coding to protein-coding genes; lnctolnc, the correlation of non-coding to non-coding genes; and cortest, the p-values of the correlation test of non-coding to protein-coding genes

expression

the original expression matrix

history

a storage environment of important methods, objects and parameters used to create the object

linCenvir

a storage environment ensuring compatibility with other objects of the LINC class

Methods

signature(linc = "LINCcluster")

(see details)

signature(linc = "LINCmatrix")

(see details)

Compatibility

plotlinc(LINCcluster, ...), getbio(LINCcluster, ...), ...

Author(s)

Manuel Goepferich

References

[1] Christine Brun, Francois Chevenet, David Martin, Jerome Wojcik, Alain Guenoche and Bernard Jacq. "Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network" (2003) Genome Biology, 5:R6.

See Also

linc ; singlelinc

Examples

data(BRAIN_EXPR)
class(crbl_matrix)

# call 'clusterlinc' with no further arguments
crbl_cluster <- clusterlinc(crbl_matrix)

# apply the distance method "correlation" instead of "dicedist"
crbl_cluster_cor <- clusterlinc(crbl_matrix, distMethod = "correlation")
# do the same as recursive call using the 'LINCcluster' object
# crbl_cluster_cor <- clusterlinc(crbl_cluster, distMethod = "correlation")

# select 25 genes with lowest p-values for each query
crbl_cluster_25 <- clusterlinc(crbl_matrix, coExprCut = 25)

# select only those with a p-value < 5e-5
crbl_cluster_5e5 <- clusterlinc(crbl_matrix, pvalCutOff = 5e-5)

# adjust for multiple testing
crbl_cluster_hochberg <- clusterlinc(crbl_matrix, distMethod = "correlation",
                                     padjust = "hochberg", pvalCutOff = 0.05)

# compare the two distance methods
plotlinc(crbl_cluster)
plotlinc(crbl_cluster_cor)

ManuelGoepferich/LINC documentation built on May 7, 2019, 2:46 p.m.