fitClusterModel: Perform model-based clustering
In rohan-shah/magicCalling: Genotype calling for multi-parent RILs


View source: R/fitClusterModel.R

Perform model-based clustering, using a normal mixture model that includes heterozygotes.

fitClusterModel(data, startingPoints, n.iter, D_hom, V_hom, n_hom, D_err, V_err, n_err, V_het, n_het)

`data`	The input data, as a two-column matrix
`startingPoints`	A list of starting points for the clustering algorithm. Each starting point is a 2 x 2 matrix, containing two column vectors. These column vectors are the initial centers for the two homozygote clusters.
`n.iter`	The number of MCMC iterations to use for the clustering algorithm.
`D_hom`	2 x 2 covariance matrix controlling the distribution of the mean of each homozygote cluster. It is sensible to use a multiple of the identity matrix.
`V_hom`	2 x 2 covariance matrix controlling the distribution of the homozygous alleles around the cluster mean. The value of this parameter depends on the shapes of the clusters corresponding to homozygous alleles. The value given in `magicCalling:::exampleModelParameters` corresponds to clusters that are axis-aligned, with the clusters being larger along the y-axis than the x-axis.
`n_hom`	Parameter specifying the degree of certainty about the shapes of the homozygote clusters. If this parameter is very large, then the shape of the homozygous clusters is completely determined by `V_hom`. If it is small, then the shapes of the clusters will be determined more by the data than by `V_hom`.
`D_err`	2 x 2 covariance matrix controlling the distribution of the mean of the error cluster. It is sensible to use a multiple of the identity matrix.
`V_err`	2 x 2 covariance matrix controlling the distribution of the errors around the cluster mean. It is sensible to use a (large) multiple of the identity matrix, indicating a high degree of uncertainty.
`n_err`	Parameter specifying the degree of certainty about the shape of the error cluster. This parameter should be small, indicating a high degree of uncertainty.
`V_het`	2 x 2 covariance matrix controlling the distribution of the heterozygotes around the mean of the cluster. It is sensible to use a small multiple of the identity matrix, so that this cluster tends to be small.
`n_het`	Parameter specifying the degree of certainty about the shape of the heterozygote cluster. This parameter should be large, indicating a high degree of certainty that the heterozygote cluster is small.
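
Since five of the arguments are described as 2 x 2 covariance matrices, it can help to check them before starting a long run. The following is a minimal sketch using base R only; `isValidCovariance2x2` is a hypothetical helper and not part of magicCalling, and the values checked are the ones used in the example call on this page.

```r
# Hypothetical helper (not part of magicCalling): checks that a matrix is a
# usable 2 x 2 covariance matrix, i.e. numeric, symmetric and positive definite.
isValidCovariance2x2 <- function(m) {
  is.matrix(m) && is.numeric(m) &&
    all(dim(m) == c(2, 2)) &&
    isSymmetric(m) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > 0)
}

# Values taken from the example call on this page
priors <- list(
  D_hom = diag(2) * 4,
  V_hom = cbind(c(0.005, 0), c(0, 0.1)) / 3,
  D_err = diag(2),
  V_err = diag(2) * 10 / 3,
  V_het = diag(2) * 0.025 / 3
)

# One logical per argument; stop early if any matrix is unusable
checks <- vapply(priors, isValidCovariance2x2, logical(1))
stopifnot(all(checks))
```

Whether `fitClusterModel` performs similar checks itself is not stated here, so this is only a precaution on the caller's side.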

This function applies a model-based clustering method. The data are assumed to be generated by a four-component normal mixture, consisting of two clusters of homozygous alleles, a cluster of heterozygotes lying exactly between the two homozygotes, and a cluster containing errors. This model is fit using the JAGS software package.
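
To make the assumed data-generating process concrete, the sketch below simulates points from a four-component bivariate normal mixture of this kind, using base R only. It is an illustration, not the model code used by the package: `rmvnorm2` is a hypothetical helper, all means, covariances and cluster sizes are made-up values, and "lying exactly between" is interpreted here as the midpoint of the two homozygote means.

```r
set.seed(1)

# Hypothetical helper: draw n points from a bivariate normal with the given
# mean and covariance, using the Cholesky factor of the covariance.
rmvnorm2 <- function(n, mean, sigma) {
  z <- matrix(rnorm(2 * n), nrow = n, ncol = 2)
  sweep(z %*% chol(sigma), 2, mean, "+")
}

# Illustrative values only; none of these come from the package
homA    <- c(0.2, 1.0)
homB    <- c(0.8, 1.0)
het     <- (homA + homB) / 2   # assumed: midpoint of the two homozygote means
errMean <- c(0.5, 1.0)

simulated <- rbind(
  rmvnorm2(100, homA,    diag(c(0.002, 0.03))),  # homozygote cluster 1
  rmvnorm2(100, homB,    diag(c(0.002, 0.03))),  # homozygote cluster 2
  rmvnorm2(10,  het,     diag(2) * 0.001),       # small heterozygote cluster
  rmvnorm2(5,   errMean, diag(2) * 3)            # diffuse error cluster
)
component <- rep(c("homA", "homB", "het", "error"), times = c(100, 100, 10, 5))
```

The result is a two-column matrix, the same layout that `data` is documented to have, so `plot(simulated, col = factor(component))` gives a quick picture of the cluster shapes the priors are meant to describe.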

data("eightWayExampleData", package="magicCalling")
data <- eightWayExampleData[[1]]
meanY <- mean(data[,2])
startingPoints <- list(
    rbind(c(0.5, meanY), c(0.5, meanY)),
    rbind(c(0.5, meanY), c(0.5, meanY)),
    rbind(c(0.25, meanY), c(0.5, meanY)),
    rbind(c(0.25, meanY), c(0.5, meanY)),
    rbind(c(0.75, meanY), c(0.5, meanY)),
    rbind(c(0.75, meanY), c(0.5, meanY)),
    rbind(c(0.8, meanY), c(0.2, meanY)),
    rbind(c(0.8, meanY), c(0.2, meanY))
)
result <- fitClusterModel(
    data, startingPoints, n.iter = 200,
    D_hom = diag(2)*4, V_hom = cbind(c(0.005, 0), c(0, 0.1))/3, n_hom = 30,
    D_err = diag(2), V_err = diag(2)*10/3, n_err = 300,
    V_het = diag(2)*0.025/3, n_het = 1500
)
plot(result, chainIndex = 1)
plot(result, chainIndex = 2)
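
The list of starting points in the example can also be built programmatically, which is convenient when trying other initial centers. This is a sketch in base R: `makeStartingPoint` and `xPairs` are hypothetical names, `meanY` is set to a placeholder value so the block runs on its own, and the `rbind()` layout simply copies the example above.

```r
# Hypothetical helper: one 2 x 2 starting point with both centers at height y,
# laid out with rbind() in the same way as the example above.
makeStartingPoint <- function(x1, x2, y) {
  rbind(c(x1, y), c(x2, y))
}

meanY <- 1  # placeholder; the example uses mean(data[, 2])

# x-coordinates of the two initial centers, as used in the example
xPairs <- list(c(0.5, 0.5), c(0.25, 0.5), c(0.75, 0.5), c(0.8, 0.2))

# Each pair is used twice, matching the example's eight starting points
startingPoints <- unlist(
  lapply(xPairs, function(p) rep(list(makeStartingPoint(p[1], p[2], meanY)), 2)),
  recursive = FALSE
)
```

The resulting list has the same length and element shapes as the hand-written one, so it can be passed to `fitClusterModel` in its place.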