snpgdsCutTree: Determine clusters of individuals
In zhengxwen/SNPRelate: Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

Description

Determines subgroups of individuals from a specified dendrogram produced by hierarchical cluster analysis.

Usage

snpgdsCutTree(hc, z.threshold=15, outlier.n=5, n.perm = 5000, samp.group=NULL,
    col.outlier="red", col.list=NULL, pch.outlier=4, pch.list=NULL,
    label.H=FALSE, label.Z=TRUE, verbose=TRUE)

Arguments

`hc`	an object returned by `snpgdsHCluster`
`z.threshold`	the Z score threshold used to decide whether or not to split a node
`outlier.n`	a cluster whose size is less than or equal to `outlier.n` is treated as a set of outliers
`n.perm`	the number of permutations
`samp.group`	if `NULL`, groups are determined by Z score; if a factor vector, each individual in the dendrogram is assigned to a group according to `samp.group`
`col.outlier`	the color used for outliers
`col.list`	the colors used for the different clusters
`pch.outlier`	the plotting character used for outliers
`pch.list`	the plotting characters used for the different clusters
`label.H`	if TRUE, plot heights in the dendrogram
`label.Z`	if TRUE, plot Z scores in the dendrogram
`verbose`	if TRUE, show information
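
The size rule behind `outlier.n` can be illustrated with base R alone. This is only a sketch of the rule as stated in the argument description, not the package's internal code; the group labels `A`, `B`, `C` and their sizes are made-up data.

```r
# Hypothetical cluster assignments for 17 individuals (illustrative only)
grp <- factor(rep(c("A", "B", "C"), times = c(8, 7, 2)))

# Same value as the documented default of snpgdsCutTree()
outlier.n <- 5

# Size of each cluster
clust.size <- table(grp)

# Clusters with size <= outlier.n are treated as outliers
small <- names(clust.size)[as.vector(clust.size) <= outlier.n]

# Flag the individuals that belong to those small clusters
is.outlier <- grp %in% small

print(clust.size)
print(small)
print(sum(is.outlier))
```

With these made-up sizes only cluster `C` falls at or below the threshold, so its two members are the ones flagged.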

Details

The details will be described in the future.

Value

Returns a list with the following components:

`sample.id`	the sample IDs used in the analysis
`z.threshold`	the Z score threshold used to decide whether or not to split a node
`outlier.n`	a cluster whose size is less than or equal to `outlier.n` is treated as a set of outliers
`samp.order`	the order of the samples in the dendrogram
`samp.group`	a factor vector indicating the group of each individual
`dmat`	a matrix of pairwise group dissimilarities
`dendrogram`	the dendrogram of individuals
`merge`	a data.frame of `(z, n1, n2)` describing each combination: `z`, the Z score; `n1`, the size of the first cluster; `n2`, the size of the second cluster
`clust.count`	the counts for the clusters
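
A short sketch of how the returned list might be inspected, using only the components listed above and the example dataset shipped with the package. The `requireNamespace` guard and the choice of `verbose=FALSE` are assumptions made so that the snippet stays quiet and does nothing when SNPRelate is not installed.

```r
# Run only if SNPRelate is available (guard added for illustration)
if (requireNamespace("SNPRelate", quietly = TRUE)) {
    library(SNPRelate)

    # Build the hierarchical clustering from the example HapMap data
    genofile <- snpgdsOpen(snpgdsExampleFileName())
    diss <- snpgdsDiss(genofile, verbose = FALSE)
    hc <- snpgdsHCluster(diss)
    snpgdsClose(genofile)

    # Determine groups by Z score (samp.group = NULL)
    set.seed(100)
    rv <- snpgdsCutTree(hc, verbose = FALSE)

    # Number of individuals assigned to each group
    print(table(rv$samp.group))

    # Z score and cluster sizes for the first few combinations
    print(head(rv$merge))

    # Cluster counts, as later passed to snpgdsDrawTree()
    print(rv$clust.count)
}
```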

Author(s)

Xiuwen Zheng

Examples

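# load the package (also attaches gdsfmt, which provides read.gdsn and index.gdsn)
library(SNPRelate)

# open an example dataset (HapMap)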
genofile <- snpgdsOpen(snpgdsExampleFileName())

pop.group <- as.factor(read.gdsn(index.gdsn(
    genofile, "sample.annot/pop.group")))
pop.level <- levels(pop.group)

diss <- snpgdsDiss(genofile)
hc <- snpgdsHCluster(diss)

# close the genotype file
snpgdsClose(genofile)



###################################################################
# cluster individuals
#

set.seed(100)
rv <- snpgdsCutTree(hc, label.H=TRUE, label.Z=TRUE)

# the distribution of Z scores
snpgdsDrawTree(rv, type="z-score", main="HapMap Phase II")

# draw dendrogram
snpgdsDrawTree(rv, main="HapMap Phase II",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"))


###################################################################
# or cluster individuals by ethnic information
#

rv2 <- snpgdsCutTree(hc, samp.group=pop.group)

# cluster individuals by Z score, specifying 'clust.count'
snpgdsDrawTree(rv2, rv$clust.count, main="HapMap Phase II",
    edgePar = list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    labels = c("YRI", "CHB/JPT", "CEU"), y.label=0.1)
legend("bottomleft", legend=levels(pop.group), col=1:nlevels(pop.group),
    pch=19, ncol=4, bg="white")



###################################################################
# zoom in ...
#

snpgdsDrawTree(rv2, rv$clust.count, dend.idx = c(1),
    main="HapMap Phase II -- YRI",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    y.label.kinship=TRUE)

snpgdsDrawTree(rv2, rv$clust.count, dend.idx = c(2,2),
    main="HapMap Phase II -- CEU",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    y.label.kinship=TRUE)

snpgdsDrawTree(rv2, rv$clust.count, dend.idx = c(2,1),
    main="HapMap Phase II -- CHB/JPT",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    y.label.kinship=TRUE)

zhengxwen/SNPRelate documentation built on Nov. 19, 2024, 1:02 p.m.