Hierarchical Clustering of SNP Data

Description

Clusters SNPs hierarchically.

Usage

cluster.snp(x = NULL, d = NULL, method = "average", SNP_index = NULL)

Arguments

x

The SNP data matrix of size nobs x nvar, with observations in rows and SNPs in columns. The default value is NULL.

d

NULL or a dissimilarity matrix. See the 'Details' section.

method

The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). See hclust for details.

SNP_index

NULL or the index vector of SNPs to be clustered. See the 'Details' section.

Details

The SNPs are clustered using hclust, which performs a hierarchical cluster analysis using a set of dissimilarities for the nvar objects being clustered. There are three possible scenarios, depending on whether d and SNP_index are supplied.

If d = NULL, x is used to compute the dissimilarity matrix. The dissimilarity measure between two SNPs is 1 - LD (Linkage Disequilibrium), where LD is defined as the square of the Pearson correlation coefficient. If SNP_index = NULL, all nvar SNPs will be clustered; otherwise only the SNPs with indices specified by SNP_index will be considered.
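The default dissimilarity described above can be sketched in base R. This is only an illustration of the 1 - LD definition, not the package's source code; the toy matrix x and the subset snp_index are assumptions made for the example.

```r
# Sketch only: follows the description above, not the package source.
set.seed(1)
x <- matrix(rnorm(60 * 8), nrow = 60, ncol = 8)  # toy data, nobs x nvar
snp_index <- 1:5                                 # hypothetical subset of SNPs

# LD taken as the squared Pearson correlation between SNPs (columns of x)
ld <- cor(x[, snp_index])^2

# Dissimilarity is 1 - LD, converted to a dist object for hclust
d <- as.dist(1 - ld)
tree <- as.dendrogram(hclust(d, method = "average"))
```

Because LD lies between 0 and 1, so does the dissimilarity: strongly correlated SNPs end up close together in the tree.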

If the user wishes to use a different dissimilarity measure, d needs to be provided. d must be either a square matrix of size nvar x nvar, or an object of class dist. If d is provided, x and SNP_index will be ignored.
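As an illustration of the two accepted forms of d, the following sketch builds a custom dissimilarity, 1 minus the absolute correlation, first as a square matrix and then as a dist object. The choice of measure and the toy data are assumptions made for the example; any dissimilarity between SNPs could be substituted.

```r
# Sketch only: one possible custom dissimilarity, chosen for illustration.
set.seed(2)
x <- matrix(rnorm(60 * 8), nrow = 60, ncol = 8)  # toy data, nobs x nvar

# Square nvar x nvar matrix of dissimilarities between SNPs (columns of x)
d_mat <- 1 - abs(cor(x))

# The same dissimilarities as an object of class dist
d_dist <- as.dist(d_mat)

# Either form could then be supplied as d, for example:
# cluster.snp(d = d_dist, method = "single")
```

Note that the dissimilarities must be between SNPs, so the matrix has one row and one column per column of x.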

Value

An object of class dendrogram which describes the tree produced by the clustering algorithm hclust.
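A dendrogram can be plotted directly or converted back with as.hclust to cut the tree into groups. The sketch below builds a stand-in dendrogram with base R rather than calling cluster.snp; the toy data and the choice of k = 3 groups are assumptions made for the example.

```r
# Sketch only: a stand-in dendrogram built with base R for illustration.
set.seed(3)
x <- matrix(rnorm(60 * 8), nrow = 60, ncol = 8)  # toy data, nobs x nvar
tree <- as.dendrogram(hclust(as.dist(1 - cor(x)^2), method = "average"))

# plot(tree) would draw the tree

# Convert back to an hclust object and cut the tree into 3 groups of SNPs
groups <- cutree(as.hclust(tree), k = 3)
table(groups)
```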

Examples

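library(MASS)
# Simulate 60 observations of 60 uncorrelated variables as stand-in SNP data
x <- mvrnorm(60, mu = rep(0, 60), Sigma = diag(60))
# Cluster all SNPs using the default 1 - LD dissimilarity
clust.1 <- cluster.snp(x = x, method = "average")
# Cluster only the first 10 SNPs
SNP_index <- seq(1, 10)
clust.2 <- cluster.snp(x = x, method = "average", SNP_index = SNP_index)
# Supply a custom dissimilarity: Euclidean distance between SNPs (columns of x)
d <- dist(t(x))
clust.3 <- cluster.snp(d = d, method = "single")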