amCluster: Clustering of multilocus genotypes

amCluster R Documentation

Description

Performs clustering of multilocus genotypes to identify unique consensus and singleton genotypes, and generates analysis output as formatted text, HTML, or CSV. These functions are usually called by amUnique; the interface is retained to make it easier to understand how amUnique operates. See the examples for more information.

There are three steps to this analysis: (1) identify the dissimilarity between pairs of genotypes using a metric which takes missing data into account, (2) cluster this dissimilarity matrix using a standard hierarchical agglomerative clustering approach, and (3) use a dynamic tree cutting approach to identify clusters.
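
The three steps can be sketched in base R on a toy matrix. This is only an illustration under stated assumptions: the pair_dissim helper and the toy data are hypothetical, the dissimilarity shown is a simple stand-in rather than the metric implemented in amMatrix, and a fixed-height cutree is swapped in for the dynamic tree cutting that amCluster uses.

```r
# Toy genotypes: rows are samples, columns are alleles; NA marks missing data.
geno <- rbind(
  a = c(101, 103, 200, 204, 150, 152),
  b = c(101, 103, 200, 204, 150,  NA),  # same as a, one allele missing
  c = c(101, 103, 200, 206, 150, 152),  # one mismatch from a
  d = c(110, 112, 210, 214, 160, 162),
  e = c(110, 112, 210, 214,  NA, 162)   # same as d, one allele missing
)

# Step 1: pairwise dissimilarity. Assumed stand-in metric: the fraction of
# mismatching alleles among the columns observed in both samples.
pair_dissim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(1)
  mean(x[both] != y[both])
}
n <- nrow(geno)
d <- matrix(0, n, n, dimnames = list(rownames(geno), rownames(geno)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- pair_dissim(geno[i, ], geno[j, ])
  }
}

# Step 2: hierarchical agglomerative clustering with complete linkage.
tree <- hclust(as.dist(d), method = "complete")

# Step 3: cut the tree. A fixed-height cut is used here for simplicity;
# it is not the hybrid dynamic tree cut.
clusters <- cutree(tree, h = 0.3)
print(clusters)
```

Samples that share a cluster label would be treated as candidates for the same individual; samples alone in their cluster correspond to singletons.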

Usage

	amCluster(
		amDatasetFocal, 
		runUntilSingletons = TRUE, 
		cutHeight = 0.3, 
		missingMethod = 2, 
		consensusMethod = 1, 
		clusterMethod = "complete"
		)

	amHTML.amCluster(
		x, 
		htmlFile = NULL, 
		htmlCSS = amCSSForHTML()
		)

	amCSV.amCluster(
		x, 
		csvFile
		)

	## S3 method for class 'amCluster'
	summary(
		object, 
		html = NULL, 
		csv = NULL, 
		...
		)

Arguments

amDatasetFocal

An amDataset object containing genotypes to cluster.

runUntilSingletons

When runUntilSingletons = TRUE, the analysis runs recursively with the unique individuals determined in one analysis feeding into the next until no more clusters are formed; applicable when the goal is to thin a dataset to unique genotypes.
For more manual control over the process, use runUntilSingletons = FALSE.
See details and examples.

cutHeight

Sets the tree cutting height using the hybrid method in the dynamicTreeCut package.
See details and cutreeHybrid for more information.

missingMethod

The method used to determine the similarity of multilocus genotypes when data is missing.
The default (missingMethod = 2) is preferable in all cases.
See amMatrix.

consensusMethod

The method (an integer) used to determine the consensus multilocus genotype from a cluster of multilocus genotypes.
See details.

clusterMethod

The method used by hclust for clustering.
Only the default clusterMethod = "complete" performs acceptably in simulations.
This option remains for experimental reasons.

object, x

An amCluster object, as returned by amCluster.

htmlFile

HTML filepath to create.
If htmlFile = NULL, a file is created in the operating system temporary directory and is then opened in the default browser.

htmlCSS

String containing a valid cascading style sheet.
A default style sheet is provided in amCSSForHTML.
See amCSSForHTML for details of how to tweak this CSS.

html

If html = NULL or html = FALSE, formatted textual output is displayed on the console.
If html = TRUE, the summary method produces and loads an HTML file in the default browser.
html can also contain a path to a file where HTML output will be written.
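
The three accepted forms of html can be summarised with a small dispatch function. This is a hypothetical helper written for illustration (resolve_html_target is not part of allelematch), and it only decides where output would go; it does not write anything.

```r
# Hypothetical helper: map the html argument to an output target.
resolve_html_target <- function(html = NULL) {
  # NULL or FALSE: formatted text on the console.
  if (is.null(html) || identical(html, FALSE)) {
    return(list(mode = "console", path = NULL))
  }
  # TRUE: a temporary HTML file that would be opened in the browser.
  if (isTRUE(html)) {
    path <- tempfile(pattern = "am", fileext = ".html")
    return(list(mode = "browser", path = path))
  }
  # A single string: treated as the path of the HTML file to write.
  if (is.character(html) && length(html) == 1) {
    return(list(mode = "file", path = html))
  }
  stop("html must be NULL, TRUE, FALSE, or a single file path")
}

resolve_html_target("myCluster.htm")$mode
```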

csvFile, csv

CSV filepath to create containing only the unique genotypes determined in the clustering.

...

Additional arguments to summary.

Details

Selecting an appropriate cutHeight parameter (also known as the d-hat criterion) is essential. Typically this function is called from amUnique, and the conversion between alleleMismatch (m-hat) and cutHeight (d-hat) will be done automatically. Selecting an appropriate value for alleleMismatch (m-hat) can be done using amUniqueProfile. See the supplementary documentation for an explanation of how these parameters are related.

runUntilSingletons = TRUE provides an efficient and reliable way to determine the unique individuals in a dataset if the dataset meets certain criteria. To understand how the clustering thins the dataset, run the recursion manually using runUntilSingletons = FALSE. An example is provided below.
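
The idea behind the recursion can be sketched as a loop that clusters, keeps one representative per cluster, and repeats until every cluster is a singleton. Everything here is an assumption made for illustration: the helper names, the stand-in dissimilarity, the fixed-height cutree, and the choice of the first cluster member as representative (amCluster derives a consensus genotype instead).

```r
# Assumed stand-in metric: fraction of mismatching alleles among columns
# observed in both samples.
pair_dissim <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(1)
  mean(x[both] != y[both])
}

# One clustering pass: returns a cluster label for each row of geno.
cluster_once <- function(geno, cut_height) {
  n <- nrow(geno)
  if (n < 2) return(seq_len(n))  # hclust needs at least two objects
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- pair_dissim(geno[i, ], geno[j, ])
    }
  }
  cutree(hclust(as.dist(d), method = "complete"), h = cut_height)
}

# Repeat passes until no cluster has more than one member.
thin_until_singletons <- function(geno, cut_height = 0.3, max_passes = 20) {
  passes <- 0
  repeat {
    labels <- cluster_once(geno, cut_height)
    passes <- passes + 1
    if (!any(duplicated(labels)) || passes >= max_passes) break
    # Toy choice: keep the first member of each cluster as representative.
    geno <- geno[!duplicated(labels), , drop = FALSE]
  }
  list(unique = geno, passes = passes)
}

geno <- rbind(
  a = c(101, 103, 200, 204, 150, 152),
  b = c(101, 103, 200, 204, 150,  NA),
  c = c(101, 103, 200, 206, 150, 152),
  d = c(110, 112, 210, 214, 160, 162),
  e = c(110, 112, 210, 214,  NA, 162)
)
result <- thin_until_singletons(geno, cut_height = 0.3)
print(rownames(result$unique))
```

Running the passes one at a time, as runUntilSingletons = FALSE allows, makes it possible to inspect which genotypes were merged at each stage.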

In practice, cutHeight (d-hat) gives the amount of dissimilarity, measured with the metric described in amMatrix, that is required for two multilocus genotypes to be declared different. The default setting for consensusMethod performs well.

consensusMethod
	1	Genotype with maximum similarity to others in the cluster is the consensus (default).
	2	Genotype with maximum similarity to others in the cluster is the consensus; missing alleles are then interpolated using the modal non-missing allele in each column.
	3	Genotype with minimum missing data is the consensus.
	4	Genotype with minimum missing data is the consensus; missing alleles are then interpolated using the modal non-missing allele in each column.
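
The four options can be illustrated with a minimal sketch. The helper names, the similarity measure (fraction of matching alleles among columns observed in both genotypes), and the tie-breaking behaviour are assumptions made for this example and may differ from what allelematch does internally.

```r
# Assumed similarity: fraction of matching alleles among shared columns.
similarity <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(0)
  mean(x[both] == y[both])
}

# Most frequent non-missing value in a column (ties: smallest value).
column_mode <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) return(NA_real_)
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Hypothetical helper; expects a numeric matrix with at least two rows.
pick_consensus <- function(cluster, method = 1) {
  n <- nrow(cluster)
  if (method %in% c(1, 2)) {
    # Methods 1 and 2: highest mean similarity to the other members.
    score <- sapply(seq_len(n), function(i) {
      others <- setdiff(seq_len(n), i)
      mean(sapply(others, function(j) similarity(cluster[i, ], cluster[j, ])))
    })
    chosen <- which.max(score)
  } else {
    # Methods 3 and 4: fewest missing alleles.
    chosen <- which.min(rowSums(is.na(cluster)))
  }
  consensus <- cluster[chosen, ]
  if (method %in% c(2, 4)) {
    # Methods 2 and 4: fill gaps with the column mode across the cluster.
    gaps <- which(is.na(consensus))
    consensus[gaps] <- vapply(gaps, function(k) column_mode(cluster[, k]),
                              numeric(1))
  }
  list(source = rownames(cluster)[chosen], genotype = consensus)
}

cluster <- rbind(
  s1 = c(101, 103, 200, 204, 150,  NA),
  s2 = c(101, 103, 200, 204,  NA,  NA),
  s3 = c(101, 105, 200, 204, 150, 152)
)
pick_consensus(cluster, method = 2)
```

In this toy cluster, methods 1 and 2 select a different genotype than methods 3 and 4, which shows why the choice of method can change the consensus when missing data and genotyping mismatches occur together.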

Value

amCluster returns an amCluster object. The summary, amHTML.amCluster, and amCSV.amCluster functions produce side effects: the analysis summary is written to an HTML file, to the console, or to a CSV file.

Note

Using html = TRUE (or htmlFile = NULL) has an additional side effect. If required, the operating system temporary directory where AlleleMatch temporary HTML files are stored is cleaned up: files that match the pattern am*.html and are older than 24 hours are deleted from this directory.
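
A cleanup of this kind can be sketched with base R file utilities. The function below is a hypothetical illustration (clean_old_html is not an allelematch function); it assumes that file age is judged by modification time.

```r
# Hypothetical helper: delete am*.html files older than max_age_hours.
clean_old_html <- function(dir = tempdir(), max_age_hours = 24) {
  # glob2rx turns the wildcard pattern into a regular expression.
  files <- list.files(dir, pattern = utils::glob2rx("am*.html"),
                      full.names = TRUE)
  if (length(files) == 0) return(invisible(character(0)))
  # Age in hours, based on each file's modification time.
  age_hours <- as.numeric(difftime(Sys.time(), file.mtime(files),
                                   units = "hours"))
  stale <- files[age_hours > max_age_hours]
  # Keep only the paths that were actually removed.
  removed <- stale[file.remove(stale)]
  invisible(removed)
}

# Demonstration in a scratch directory, so no real files are touched.
scratch <- tempfile("amtest")
dir.create(scratch)
old_file <- file.path(scratch, "am_old.html")
new_file <- file.path(scratch, "am_new.html")
other_file <- file.path(scratch, "other.html")
file.create(old_file, new_file, other_file)
Sys.setFileTime(old_file, Sys.time() - 48 * 3600)    # 48 hours old
Sys.setFileTime(other_file, Sys.time() - 48 * 3600)  # old, but no "am" prefix
removed <- clean_old_html(scratch)
```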

Author(s)

Paul Galpern (pgalpern@gmail.com)

References

For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.

See Also

amDataset, amMatrix, amPairwise, amUnique, amUniqueProfile

Examples

	## Not run: 
	data("amExample5")

	## Produce amDataset object
	myDataset <- 
		amDataset(
			amExample5, 
			missingCode = "-99", 
			indexColumn = 1, 
			metaDataColumn = 2, 
			ignoreColumn = "gender"
			)

	## Usage
	myCluster <- 
		amCluster(
			myDataset, 
			cutHeight = 0.2
			)

	## Display analysis as HTML in default browser
	summary(
		myCluster, 
		html = TRUE
		)

	## Save analysis to HTML file
	summary(
		myCluster, 
		html = "myCluster.htm"
		)

	## Display analysis as formatted text on the console
	summary(myCluster)

	## Save unique genotypes only to a CSV file
	summary(
		myCluster, 
		csv = "myCluster.csv"
		)

	## Demonstration of how amCluster operates
	## Manual control over the recursion in amCluster()
	summary(
		myCluster1 <- 
			amCluster(
				myDataset, 
				runUntilSingletons = FALSE, 
				cutHeight = 0.2
				), 
			html = TRUE
			)
	summary(
		myCluster2 <- 
			amCluster(
				myCluster1$unique, 
				runUntilSingletons = FALSE, 
				cutHeight = 0.2
				),
			html = TRUE
			)
	summary(
		myCluster3 <- 
			amCluster(
				myCluster2$unique, 
				runUntilSingletons = FALSE, 
				cutHeight = 0.2
				), 
			html = TRUE
			)
	summary(
		myCluster4 <- 
			amCluster(
				myCluster3$unique, 
				runUntilSingletons = FALSE, 
				cutHeight = 0.2
				), 
			html = TRUE
			)
	## No more clusters, therefore stop.
	
	## End(Not run)

allelematch documentation built on Aug. 24, 2023, 5:06 p.m.