amCluster | R Documentation |
Performs clustering of multilocus genotypes to identify unique consensus and singleton genotypes
and generates analysis output in formatted text, HTML, or CSV. These functions are usually
called by amUnique. This interface remains to enable a better understanding of how amUnique
operates. For more information, see the examples.
There are three steps to this analysis: (1) identify the dissimilarity between pairs of genotypes using a metric which takes missing data into account, (2) cluster this dissimilarity matrix using a standard hierarchical agglomerative clustering approach, and (3) use a dynamic tree cutting approach to identify clusters.
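The three steps can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the toy genotypes, the pair_dissim helper, and its rule of counting a missing allele as half a mismatch are assumptions made for the example, and a fixed-height cutree stands in for the dynamic tree cutting that amCluster uses.

```r
# Toy genotypes (assumed data): rows are samples, columns are alleles, NA is missing.
geno <- matrix(
  c(101, 103, 200, 204,
    101, 103, 200, 204,
    101,  NA, 200, 204,
    105, 107, 210, 212,
    105, 107, 210,  NA),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:5), c("L1a", "L1b", "L2a", "L2b"))
)

# Step 1: pairwise dissimilarity that takes missing data into account.
# Assumption: a position with a missing allele counts as half a mismatch.
pair_dissim <- function(a, b) {
  missing  <- is.na(a) | is.na(b)
  mismatch <- !missing & (a != b)
  (sum(mismatch) + 0.5 * sum(missing)) / length(a)
}

n <- nrow(geno)
d <- matrix(0, n, n, dimnames = list(rownames(geno), rownames(geno)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- pair_dissim(geno[i, ], geno[j, ])
  }
}

# Step 2: standard hierarchical agglomerative clustering (complete linkage).
tree <- hclust(as.dist(d), method = "complete")

# Step 3: cut the tree. A fixed-height cut is used here as a simple
# stand-in for dynamic tree cutting.
clusters <- cutree(tree, h = 0.3)
print(clusters)
```

With these toy data, samples s1 to s3 fall in one cluster and s4 and s5 in another, because the within-group dissimilarities stay below the cut height of 0.3.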
amCluster(
amDatasetFocal,
runUntilSingletons = TRUE,
cutHeight = 0.3,
missingMethod = 2,
consensusMethod = 1,
clusterMethod = "complete"
)
amHTML.amCluster(
x,
htmlFile = NULL,
htmlCSS = amCSSForHTML()
)
amCSV.amCluster(
x,
csvFile
)
## S3 method for class 'amCluster'
summary(
object,
html = NULL,
csv = NULL,
...
)
amDatasetFocal |
An |
runUntilSingletons |
When |
cutHeight |
Sets the tree cutting height using the hybrid method in the |
missingMethod |
The method used to determine the similarity of multilocus genotypes when data is missing. |
consensusMethod |
The method (an integer) used to determine the consensus multilocus genotype from a cluster of multilocus genotypes. |
clusterMethod |
The method used by |
object , x |
An |
htmlFile |
HTML filepath to create. |
htmlCSS |
String containing a valid cascading style sheet. |
html |
If |
csvFile , csv |
CSV filepath to create containing only the unique genotypes determined in the clustering. |
... |
Additional arguments to |
Selecting an appropriate cutHeight parameter (also known as the d-hat criterion) is
essential. Typically this function is called from amUnique, and the conversion between
alleleMismatch (m-hat) and cutHeight (d-hat) is done automatically. An appropriate value
for alleleMismatch (m-hat) can be selected using amUniqueProfile. See the
supplementary documentation for an explanation of how these parameters are related.
runUntilSingletons=TRUE provides an efficient and reliable way to determine the unique
individuals in a dataset if the dataset meets certain criteria. To understand how the clustering
is thinning the dataset, run this recursion manually using runUntilSingletons=FALSE. An
example is provided below.
cutHeight in practice gives the amount of dissimilarity (using the metric described in
amMatrix) required for two multilocus genotypes to be declared different (also
known as d-hat). The default setting for consensusMethod performs well.
consensusMethod | Description |
1 | Genotype with max similarity to others in the cluster is consensus (DEFAULT) |
2 | Genotype with max similarity to others in the cluster is consensus then interpolate missing alleles using mode non-missing allele in each column |
3 | Genotype with min missing data is consensus |
4 | Genotype with min missing data is consensus then interpolate missing alleles using mode non-missing allele in each column |
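As an illustration of option 4 in the table, the following base R sketch picks the genotype with the least missing data and then fills its missing alleles with the most common non-missing allele in each column. It is a sketch under stated assumptions rather than the package's code: the toy cluster, the column_mode helper, and the tie-breaking rule (first row or first allele wins) are all hypothetical.

```r
# Toy cluster of genotypes (assumed data); NA marks a missing allele.
cluster_geno <- matrix(
  c(101, 103, 200,  NA,
    101,  NA,  NA, 204,
     NA,  NA, 200, 204),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:3), c("L1a", "L1b", "L2a", "L2b"))
)

# Hypothetical helper: most frequent non-missing value in a column.
# Ties go to the first value in sorted order (an assumption).
column_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Genotype with the minimum amount of missing data is the consensus.
# which.min() returns the first row on ties (an assumption).
n_missing <- rowSums(is.na(cluster_geno))
consensus <- cluster_geno[which.min(n_missing), ]

# Interpolate missing alleles using the column-wise mode.
modes  <- apply(cluster_geno, 2, column_mode)
filled <- ifelse(is.na(consensus), modes, consensus)
print(filled)
```

Option 3 corresponds to stopping at consensus, before the interpolation step.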
An amCluster object or side effects: analysis summary written to an HTML file or to the
console, or written to a CSV file.
There is an additional side effect of html = TRUE (or of htmlFile = NULL). If
required, the operating system temporary directory where AlleleMatch temporary HTML
files are stored is cleaned up. Files that match the pattern am*.html and are older than 24
hours are deleted from this temporary directory.
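The clean-up rule can be pictured with a short base R sketch. The function name clean_old_html and its arguments are hypothetical and are not part of the package; the sketch only shows one way to delete files that match am*.html and are older than 24 hours.

```r
# Hypothetical helper (not part of the package): remove files matching
# am*.html in `dir` whose modification time is older than `max_age_hours`.
clean_old_html <- function(dir = tempdir(), max_age_hours = 24) {
  files <- list.files(dir, pattern = glob2rx("am*.html"), full.names = TRUE)
  if (length(files) == 0) return(invisible(character(0)))

  # Age of each file in hours, based on its last modification time.
  age_hours <- as.numeric(
    difftime(Sys.time(), file.info(files)$mtime, units = "hours")
  )

  stale <- files[age_hours > max_age_hours]
  unlink(stale)
  invisible(stale)  # paths of the files that were removed
}
```

Called with no arguments, this sketch would act on the current R session's temporary directory, which is an assumption and may differ from the directory the package cleans.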
Paul Galpern (pgalpern@gmail.com)
For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
amDataset, amMatrix, amPairwise, amUnique, amUniqueProfile
## Not run:
data("amExample5")
## Produce amDataset object
myDataset <-
  amDataset(
    amExample5,
    missingCode = "-99",
    indexColumn = 1,
    metaDataColumn = 2,
    ignoreColumn = "gender"
  )
## Usage
myCluster <-
  amCluster(
    myDataset,
    cutHeight = 0.2
  )
## Display analysis as HTML in default browser
summary.amCluster(
  myCluster,
  html = TRUE
)
## Save analysis to HTML file
summary.amCluster(
  myCluster,
  html = "myCluster.htm"
)
## Display analysis as formatted text on the console
summary.amCluster(myCluster)
## Save unique genotypes only to a CSV file
summary.amCluster(
  myCluster,
  csv = "myCluster.csv"
)
## Demonstration of how amCluster operates
## Manual control over the recursion in amCluster()
summary.amCluster(
  myCluster1 <-
    amCluster(
      myDataset,
      runUntilSingletons = FALSE,
      cutHeight = 0.2
    ),
  html = TRUE
)
summary.amCluster(
  myCluster2 <-
    amCluster(
      myCluster1$unique,
      runUntilSingletons = FALSE,
      cutHeight = 0.2
    ),
  html = TRUE
)
summary.amCluster(
  myCluster3 <-
    amCluster(
      myCluster2$unique,
      runUntilSingletons = FALSE,
      cutHeight = 0.2
    ),
  html = TRUE
)
summary.amCluster(
  myCluster4 <-
    amCluster(
      myCluster3$unique,
      runUntilSingletons = FALSE,
      cutHeight = 0.2
    ),
  html = TRUE
)
## No more clusters, therefore stop.
## End(Not run)