hca.test: Tests for annotation differences among sample clusters

Description Usage Arguments Details Value Note Author(s) Examples

View source: R/hca.test.R

Description

The main clusters of a dendrogram are tested for different patient annotations.

Usage

1
2
hca.test(g, o, dendcut = 2, method = "correlation", link = "ward", 
         test = "chisq", workspace = 2e+07)

Arguments

g

the input data in form of a matrix with features as rows and samples as columns.

o

the corresponding sample annotations in the form of a data.frame. Sample annotation as a single vector is allowed and will be transformed to a data.frame. rownames (o) must be identical to colnames (g). o can contain factors and numeric variables. No character variables are allowed. NAs are allowed.

dendcut

the number of clusters to cut the dendrogram tree (using cutree()). default=2.

method

the distance method for the clustering. default="correlation". hcluster from the package amap is used and method must be one of "euclidean", "maximum", "manhattan", "canberra" "binary" "pearson", "correlation", "spearman" or "kendall".

link

the agglomeration principle for the clustering. default="ward". hcluster from the package amap is used and link must be one of "ward", "single", "complete", "average", "mcquitty", "median" or "centroid".

test

the test to use for the annotations that are factors. this can be either "fisher" or "chisq" to use fisher.test() or chisq.test(), respectively. default = "chisq". However fisher.test is preferable as it is an exact test. Note that fisher.test() is computationally expensive and can cause R to crash.

workspace

workspace to use if test="fisher"

Details

The function clusters the samples using amap and then cuts the dendrogram into a specified number of clusters. The obtained sample clusters are tested for differences in sample annotations. fisher.test() or chisq.test() is used if the annotation is a factor, lm(annotation~clusters) is used for numeric annotations. The p-values for the dependence of sample annotation and sample clusters are returned.

Value

a list with components

p.values

a numeric vector containing the p.values for the annotation variable.

classes

the classes of the annotation variables in o.

Note

requires the package amap

Author(s)

Martin Lauss

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
## data as a matrix
set.seed(100)
g<-matrix(nrow=1000,ncol=50,rnorm(1000*50),dimnames=list(paste("Feature",1:1000),
          paste("Sample",1:50)))
g[1:100,26:50]<-g[1:100,26:50]+1 # the first 100 features show
# higher values in the samples 26:50
## patient annotations as a data.frame, annotations should be numbers and factor
# but not characters.
## rownames have to be the same as colnames of the data matrix 
set.seed(200)
o<-data.frame(Factor1=factor(c(rep("A",25),rep("B",25))),
              Factor2=factor(rep(c("A","B"),25)),
              Numeric1=rnorm(50),row.names=colnames(g))

# perform the test for the main 2 clusters
res3<-hca.test(g,o,dendcut=2,test="fisher") 
        # use test="chisq" for large ncol(g) to avoid crash of R
res3$p.values

Example output

Loading required package: impute
Loading required package: amap
Loading required package: gplots

Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

Loading required package: MASS
     Factor1      Factor2     Numeric1 
4.113579e-13 1.000000e+00 2.963197e-01 

swamp documentation built on May 2, 2019, 2:14 p.m.