snpgdsCutTree: Determine clusters of individuals

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/AllUtilities.R

Description

To determine sub groups of individuals using a specified dendrogram from hierarchical cluster analysis

Usage

1
2
3
snpgdsCutTree(hc, z.threshold=15, outlier.n=5, n.perm = 5000, samp.group=NULL,
    col.outlier="red", col.list=NULL, pch.outlier=4, pch.list=NULL,
    label.H=FALSE, label.Z=TRUE, verbose=TRUE)

Arguments

hc

an object of snpgdsHCluster

z.threshold

the threshold of Z score to determine whether split the node or not

outlier.n

the cluster with size less than or equal to outlier.n is considered as outliers

n.perm

the times for permutation

samp.group

if NULL, determine groups by Z score; if a vector of factor, assign each individual in dendrogram with respect to samp.group

col.outlier

the color of outlier

col.list

the list of colors for different clusters

pch.outlier

plotting 'character' for outliers

pch.list

plotting 'character' for different clusters

label.H

if TRUE, plotting heights in a dendrogram

label.Z

if TRUE, plotting Z scores in a dendrogram

verbose

if TRUE, show information

Details

The details will be described in future.

Value

Return a list:

sample.id

the sample ids used in the analysis

z.threshold

the threshold of Z score to determine whether split the node or not

outlier.n

the cluster with size less than or equal to outlier.n is considered as outliers

samp.order

the order of samples in the dendrogram

samp.group

a vector of factor, indicating the group of each individual

dmat

a matrix of pairwise group dissimilarity

dendrogram

the dendrogram of individuals

merge

a data.frame of (z, n1, n2) describing each combination: z, the Z score; n1, the size of the first cluster; n2, the size of the second cluster

clust.count

the counts for clusters

Author(s)

Xiuwen Zheng

See Also

snpgdsHCluster, snpgdsDrawTree, snpgdsIBS, snpgdsDiss

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

pop.group <- as.factor(read.gdsn(index.gdsn(
    genofile, "sample.annot/pop.group")))
pop.level <- levels(pop.group)

diss <- snpgdsDiss(genofile)
hc <- snpgdsHCluster(diss)

# close the genotype file
snpgdsClose(genofile)



###################################################################
# cluster individuals
#

set.seed(100)
rv <- snpgdsCutTree(hc, label.H=TRUE, label.Z=TRUE)

# the distribution of Z scores
snpgdsDrawTree(rv, type="z-score", main="HapMap Phase II")

# draw dendrogram
snpgdsDrawTree(rv, main="HapMap Phase II",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"))


###################################################################
# or cluster individuals by ethnic information
#

rv2 <- snpgdsCutTree(hc, samp.group=pop.group)

# cluster individuals by Z score, specifying 'clust.count'
snpgdsDrawTree(rv2, rv$clust.count, main="HapMap Phase II",
    edgePar = list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    labels = c("YRI", "CHB/JPT", "CEU"), y.label=0.1)
legend("bottomleft", legend=levels(pop.group), col=1:nlevels(pop.group),
    pch=19, ncol=4, bg="white")



###################################################################
# zoom in ...
#

snpgdsDrawTree(rv2, rv$clust.count, dend.idx = c(1),
    main="HapMap Phase II -- YRI",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    y.label.kinship=TRUE)

snpgdsDrawTree(rv2, rv$clust.count, dend.idx = c(2,2),
    main="HapMap Phase II -- CEU",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    y.label.kinship=TRUE)

snpgdsDrawTree(rv2, rv$clust.count, dend.idx = c(2,1),
    main="HapMap Phase II -- CHB/JPT",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"),
    y.label.kinship=TRUE)

Example output

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
Individual dissimilarity analysis on genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 279
    # of SNPs: 8,722
    using 1 thread
Dissimilarity:    the sum of all selected genotypes (0,1,2) = 2446510
Dissimilarity:	Wed Dec 23 15:59:42 2020	0%
Dissimilarity:	Wed Dec 23 15:59:43 2020	100%
Determine groups by permutation (Z threshold: 15, outlier threshold: 5):
Create 3 groups.
Create 4 groups.

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.