fitClusterModel: Perform model-based clustering


View source: R/fitClusterModel.R

Description

Perform model-based clustering, using a normal mixture model that includes heterozygotes.

Usage

fitClusterModel(data, startingPoints, n.iter, D_hom, V_hom, n_hom, D_err, V_err,
  n_err, V_het, n_het)

Arguments

data

The input data, as a two-column matrix.

startingPoints

A list of starting points for the clustering algorithm. Each starting point is a 2 x 2 matrix, containing two column vectors. These column vectors are the initial centers for the two homozygote clusters.

n.iter

The number of MCMC iterations to use for the clustering algorithm.

D_hom

2 x 2 covariance matrix controlling the distribution of the *mean* of each homozygote cluster. It is sensible to use a multiple of the identity matrix.

V_hom

2 x 2 covariance matrix controlling the distribution of the homozygous alleles *around the cluster mean*. The appropriate value depends on the shapes of the clusters corresponding to homozygous alleles. The value given in magicCalling:::exampleModelParameters corresponds to clusters that are axis-aligned, with the clusters being larger along the y-axis than along the x-axis.

n_hom

Parameter specifying the degree of certainty about the shapes of the homozygote clusters. If this parameter is very large, then the shapes of the homozygote clusters are completely determined by V_hom. If it is small, then the shapes of the clusters are determined more by the data than by V_hom.

D_err

2 x 2 covariance matrix controlling the distribution of the *mean* of the error cluster. It is sensible to use a multiple of the identity matrix.

V_err

2 x 2 covariance matrix controlling the distribution of the errors *around the cluster mean*. It is sensible to use a (large) multiple of the identity matrix, indicating a high degree of uncertainty.

n_err

Parameter specifying the degree of certainty about the shape of the error cluster. This parameter should be small, indicating a high degree of uncertainty.

V_het

2 x 2 covariance matrix controlling the distribution of the heterozygotes around the *mean* of the cluster. It is sensible to use a small multiple of the identity matrix, so that this cluster tends to be small.

n_het

Parameter specifying the degree of certainty about the shape of the heterozygote cluster. This parameter should be large, indicating a high degree of certainty that the heterozygote cluster is small.
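The n_* arguments act as certainty weights on the matching V_* matrices. One way to build intuition for this is to sample covariance matrices with base R's rWishart and watch their spread shrink as n grows. This rests on an assumption that this page does not confirm: that each n_* behaves like the degrees of freedom of a Wishart-type prior whose mean is the corresponding V_* matrix. The helper names drawCovariances and spreadY are hypothetical.

```r
# Illustrative sketch only. Assumption: n_* acts like the degrees of freedom of
# a Wishart-type prior centred on the matching V_* matrix. This is not taken
# from the JAGS model used by fitClusterModel.
set.seed(1)

V_hom <- cbind(c(0.005, 0), c(0, 0.1)) / 3  # value used in the Examples section

# Hypothetical helper: draw k covariance matrices with expected value V.
# rWishart(k, df, Sigma) has mean df * Sigma, so Sigma is scaled by 1 / n.
drawCovariances <- function(V, n, k = 2000) {
  rWishart(k, df = n, Sigma = V / n)
}

# Hypothetical helper: standard deviation of the sampled y-variance (entry [2, 2])
spreadY <- function(V, n) {
  sd(drawCovariances(V, n)[2, 2, ])
}

spreadSmallN <- spreadY(V_hom, 5)    # weak prior: shapes driven more by the data
spreadLargeN <- spreadY(V_hom, 500)  # strong prior: shapes pinned close to V_hom
print(c(n5 = spreadSmallN, n500 = spreadLargeN))
```

Under this assumption the standard deviation of each diagonal entry is V[i, i] * sqrt(2 / n), so a 100-fold increase in n shrinks the spread roughly tenfold.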

Details

This function applies a model-based clustering method. The data are assumed to be generated by a four-component normal mixture, consisting of two clusters of homozygous alleles, a cluster of heterozygotes lying exactly between the two homozygote clusters, and a cluster containing errors. The model is fitted using the JAGS software package.
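To make the model concrete, the base R sketch below simulates points from a four-component bivariate normal mixture and evaluates its log-likelihood. It fits nothing (fitClusterModel does the fitting with JAGS). The means, covariances and mixing weights are hypothetical, placing the heterozygote mean at the exact midpoint of the two homozygote means is one interpretation of "exactly between", and the helpers dbvnorm and rbvnorm are hypothetical names.

```r
# Illustrative sketch of the four-component mixture, in base R.
# Assumptions: all means, covariances and mixing weights below are
# hypothetical, and the heterozygote mean is placed at the exact midpoint
# of the two homozygote means. Nothing here is fitted.
set.seed(42)

# Hypothetical helper: bivariate normal density at each row of x
dbvnorm <- function(x, mu, Sigma) {
  centred <- sweep(x, 2, mu)  # subtract the mean from every row
  quad <- rowSums((centred %*% solve(Sigma)) * centred)
  exp(-0.5 * quad) / (2 * pi * sqrt(det(Sigma)))
}

# Hypothetical helper: draw m rows from a bivariate normal via its Cholesky factor
rbvnorm <- function(m, mu, Sigma) {
  z <- matrix(rnorm(2 * m), ncol = 2)
  sweep(z %*% chol(Sigma), 2, mu, "+")
}

muA <- c(0.2, 0.5)  # homozygote cluster A
muB <- c(0.8, 0.5)  # homozygote cluster B
components <- list(
  homA = list(mu = muA,             Sigma = diag(c(0.005, 0.1)) / 3),
  homB = list(mu = muB,             Sigma = diag(c(0.005, 0.1)) / 3),
  het  = list(mu = (muA + muB) / 2, Sigma = diag(2) * 0.025 / 3),
  err  = list(mu = c(0.5, 0.5),     Sigma = diag(2) * 10 / 3)
)
weights <- c(homA = 0.4, homB = 0.4, het = 0.15, err = 0.05)

# Simulate: split n points across components, then draw each block
n <- 500
counts <- as.vector(rmultinom(1, size = n, prob = weights))
x <- do.call(rbind, Map(function(comp, m) rbvnorm(m, comp$mu, comp$Sigma),
                        components, counts))
trueLabels <- rep(names(components), counts)

# Weighted density of every point under every component (n x 4 matrix)
dens <- sapply(names(components), function(k)
  weights[[k]] * dbvnorm(x, components[[k]]$mu, components[[k]]$Sigma))

logLikelihood <- sum(log(rowSums(dens)))   # mixture log-likelihood
assigned <- colnames(dens)[max.col(dens)]  # most probable component per point
print(logLikelihood)
print(mean(assigned == trueLabels))
```

Giving both homozygote components the same covariance here is only a simplification for the sketch; the error component's wide covariance mirrors the advice to use a large multiple of the identity matrix for V_err.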

Examples

library(magicCalling)
data("eightWayExampleData", package = "magicCalling")
data <- eightWayExampleData[[1]]
meanY <- mean(data[, 2])

# Starting points for the clustering algorithm: each is a 2 x 2 matrix
# giving initial centres for the two homozygote clusters
startingPoints <- list(
  rbind(c(0.5, meanY), c(0.5, meanY)),
  rbind(c(0.5, meanY), c(0.5, meanY)),
  rbind(c(0.25, meanY), c(0.5, meanY)),
  rbind(c(0.25, meanY), c(0.5, meanY)),
  rbind(c(0.75, meanY), c(0.5, meanY)),
  rbind(c(0.75, meanY), c(0.5, meanY)),
  rbind(c(0.8, meanY), c(0.2, meanY)),
  rbind(c(0.8, meanY), c(0.2, meanY))
)

# D_* control the cluster means; V_* and n_* control the cluster shapes
result <- fitClusterModel(data, startingPoints, n.iter = 200,
  D_hom = diag(2) * 4, V_hom = cbind(c(0.005, 0), c(0, 0.1)) / 3, n_hom = 30,
  D_err = diag(2), V_err = diag(2) * 10 / 3, n_err = 300,
  V_het = diag(2) * 0.025 / 3, n_het = 1500)

# Plot the fitted clusters for chains 1 and 2
plot(result, chainIndex = 1)
plot(result, chainIndex = 2)

rohan-shah/magicCalling documentation built on Jan. 3, 2020, 6:28 p.m.