snpgdsCutTree: Determine clusters of individuals
In SNPRelate: Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data


Determines subgroups of individuals using a specified dendrogram from a hierarchical cluster analysis.

snpgdsCutTree(hc, z.threshold=15, outlier.n=5, n.perm = 5000, samp.group=NULL,
    col.outlier="red", col.list=NULL, pch.outlier=4, pch.list=NULL,
    label.H=FALSE, label.Z=TRUE, verbose=TRUE)

`hc`	an object returned by `snpgdsHCluster`
`z.threshold`	the Z score threshold used to decide whether to split a node
`outlier.n`	clusters with size less than or equal to `outlier.n` are treated as outliers
`n.perm`	the number of permutations
`samp.group`	if `NULL`, determine groups by Z score; if a factor vector, assign each individual in the dendrogram to a group according to `samp.group`
`col.outlier`	the color used for outliers
`col.list`	the list of colors for different clusters
`pch.outlier`	the plotting character for outliers
`pch.list`	the plotting characters for different clusters
`label.H`	if TRUE, plot heights in the dendrogram
`label.Z`	if TRUE, plot Z scores in the dendrogram
`verbose`	if TRUE, show information
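
To make the `outlier.n` rule concrete, here is a minimal base-R sketch that flags every group whose size is at most `outlier.n`. It is not SNPRelate's internal code; the helper `flag_outlier_groups` and the toy vector `grp` are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: mark members of small groups as outliers.
# This only illustrates the size rule; it is not SNPRelate's implementation.
flag_outlier_groups <- function(grp, outlier.n = 5) {
    grp <- as.factor(grp)
    sizes <- table(grp)                        # number of individuals per group
    small <- names(sizes)[sizes <= outlier.n]  # groups at or below the cutoff
    as.character(grp) %in% small               # TRUE for individuals in small groups
}

# toy assignment: group "A" has 8 members, "B" has 5, "C" has 2
grp <- rep(c("A", "B", "C"), times = c(8, 5, 2))
is_outlier <- flag_outlier_groups(grp, outlier.n = 5)
table(grp, is_outlier)
```

Note that the comparison is inclusive: with `outlier.n = 5`, a group of exactly five individuals still counts as an outlier group.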

The details will be described in the future.

Returns a list with the following components:

`sample.id`	the sample IDs used in the analysis
`z.threshold`	the Z score threshold used to decide whether to split a node
`outlier.n`	clusters with size less than or equal to `outlier.n` are treated as outliers
`samp.order`	the order of samples in the dendrogram
`samp.group`	a factor vector indicating the group of each individual
`dmat`	a matrix of pairwise group dissimilarity
`dendrogram`	the dendrogram of individuals
`merge`	a data.frame of `(z, n1, n2)` describing each merge of two clusters: `z`, the Z score; `n1`, the size of the first cluster; `n2`, the size of the second cluster
`clust.count`	the counts for clusters
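
The components listed above can be inspected directly once a result is available. The sketch below assumes the SNPRelate package and its bundled HapMap example file are installed; the helper name `group_sizes` is hypothetical, and the SNPRelate part is skipped when the package is missing.

```r
# Hypothetical helper: count individuals per group in a snpgdsCutTree() result.
# It relies only on the documented `samp.group` component.
group_sizes <- function(cut) {
    table(cut$samp.group, useNA = "ifany")
}

if (requireNamespace("SNPRelate", quietly = TRUE)) {
    library(SNPRelate)

    # build the dendrogram from the example dataset
    genofile <- snpgdsOpen(snpgdsExampleFileName())
    hc <- snpgdsHCluster(snpgdsDiss(genofile, verbose = FALSE))
    snpgdsClose(genofile)

    set.seed(100)   # the grouping uses permutations, so fix the seed
    rv <- snpgdsCutTree(hc, verbose = FALSE)

    print(group_sizes(rv))   # individuals per detected group
    print(head(rv$merge))    # Z score and cluster sizes (z, n1, n2)
    print(rv$clust.count)    # counts per cluster
}
```

Because the helper only reads `samp.group`, it works equally well on a result produced with a user-supplied `samp.group` factor.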

Xiuwen Zheng

snpgdsHCluster, snpgdsDrawTree, snpgdsIBS, snpgdsDiss

library(SNPRelate)

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

pop.group <- as.factor(read.gdsn(index.gdsn(
    genofile, "sample.annot/pop.group")))
pop.level <- levels(pop.group)

diss <- snpgdsDiss(genofile)
hc <- snpgdsHCluster(diss)

# close the genotype file
snpgdsClose(genofile)



###################################################################
# cluster individuals
#

set.seed(100)
rv <- snpgdsCutTree(hc, label.H=TRUE, label.Z=TRUE)

# the distribution of Z scores
snpgdsDrawTree(rv, type="z-score", main="HapMap Phase II")

# draw dendrogram
snpgdsDrawTree(rv, main="HapMap Phase II",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"))


###################################################################
# or cluster individuals by ethnic information
#

rv2 <- snpgdsCutTree(hc, samp.group=pop.group)

# draw the tree grouped by population, using 'clust.count' from the Z-score clustering
snpgdsDrawTree(rv2, rv$clust.count, main="HapMap Phase II",
    edgePar = list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    labels = c("YRI", "CHB/JPT", "CEU"), y.label=0.1)
legend("bottomleft", legend=levels(pop.group), col=1:nlevels(pop.group),
    pch=19, ncol=4, bg="white")



###################################################################
# zoom in ...
#

snpgdsDrawTree(rv2, rv$clust.count, dend.idx = c(1),
    main="HapMap Phase II -- YRI",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    y.label.kinship=TRUE)

snpgdsDrawTree(rv2, rv$clust.count, dend.idx = c(2,2),
    main="HapMap Phase II -- CEU",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    y.label.kinship=TRUE)

snpgdsDrawTree(rv2, rv$clust.count, dend.idx = c(2,1),
    main="HapMap Phase II -- CHB/JPT",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    y.label.kinship=TRUE)

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Individual dissimilarity analysis on genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 279
    # of SNPs: 8,722
    using 1 thread
Dissimilarity:    the sum of all selected genotypes (0,1,2) = 2446510
Dissimilarity:	Wed Dec 23 15:59:42 2020	0%
Dissimilarity:	Wed Dec 23 15:59:43 2020	100%
Determine groups by permutation (Z threshold: 15, outlier threshold: 5):
Create 3 groups.
Create 4 groups.