runHeuristics: Select best fit of the hierarchical Bayesian cluster model

Description Usage Arguments Details Value Examples

View source: R/runHeuristics.R

Description

Use heuristics to select the best-fitting hierarchical Bayesian cluster model, from multiple fitted models. Certain data processing is performed using the best fitting model, to generate a cluster assignment variable.

Usage

1
runHeuristics(fittedModel, minHomozygoteSize)

Arguments

fittedModel

An object of class hierarchicalBayesianModel, containing data about one or more fitted models.

minHomozygoteSize

The minimum allowable size of the homzygote clusters.

Details

This function takes as input multiple fitted Hierarchical Bayesian Cluster models. These models are fitted using MCMC, often using different initial states. This function produces a summary of the fitted models, including cluster assignment variables and component covariance matrices. It also chooses a ‘best’ fitted model.

Selecting the ‘best’ fitted model is complicated. We begin by rejecting models for which any of the following are true:

1.

For more than 5% of the data points, the posterior probability of membership is more than 15%, for BOTH homozygote clusters

2.

Either of the homozygote clusters is smaller than minHomozygoteSize

3.

The difference in the x-coordinate of the two homozygote clusters is smaller than 0.06

4.

More than a quater of the points fall into the outlier cluster

5.

The difference in the centers of the homozygote clusters is small, relative to the covariance matrix of those clusters

If there are any remaining models, then for each model we compute the maximum of the determinants of the covariance matrices of the homozygote clusters. We select the model for this value (the maximum determinant) is smallest.

If there are no remaining models, then the returned object indicates that no suitable model was found.

Value

An object of class magicHeuristicHBC, containing data about all the fitted models, and an indication of which model is best, according to the heuristics described in this section.

classifications

A list with an entry for each Markov chain. Each entry is a vector of cluster assignments for the data points.

clusterMeans

A list with an entry for each Markov chain. Each entry is a matrix with three rows and two columns. Each row gives the center point of a cluster. The first two rows are the centers of the homozygote clusters, and the third row is the center of the heterozygote cluster.

covariances

A list with an entry for each Markov chain. Each entry is a 4 x 2 x 2 matrix, so that m[i, , ] is a 2 x 2 matrix representing the covariance matrix of a component. The first two such matrices represent the covariance matrices of the homozygote clusters. The third represents the covariance matrix of the homozygote cluster. The fourth represents the covariance matrix of the error component.

chainIndex

The index of the ‘best’ Markov chain. If no chain satisfied the required conditions, then this is set to -1.

data

The input data.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
data("eightWayExampleData", package="magicCalling")
data <- eightWayExampleData[[1]]
meanY <- mean(data[,2])
startingPoints <- list(
 rbind(c(0.5, meanY), c(0.5, meanY)),
     rbind(c(0.5, meanY), c(0.5, meanY)),
     rbind(c(0.25, meanY), c(0.5, meanY)),
     rbind(c(0.25, meanY), c(0.5, meanY)),
     rbind(c(0.75, meanY), c(0.5, meanY)),
     rbind(c(0.75, meanY), c(0.5, meanY)),
     rbind(c(0.8, meanY), c(0.2, meanY)),
     rbind(c(0.8, meanY), c(0.2, meanY))
)
result <- fitClusterModel(data, startingPoints, n.iter = 200, D_hom = diag(2)*4, V_hom = cbind(c(0.005, 0), c(0, 0.1))/3, n_hom = 30, D_err = diag(2), V_err = diag(2)*10/3, n_err = 300, V_het = diag(2)*0.025/3, n_het = 1500)
plot(result, chainIndex = 1)
heuristicResults <- runHeuristics(result, minHomozygoteSize = 200)
plot(heuristicResults, chainIndex = heuristicResults$chainIndex)

rohan-shah/magicCalling documentation built on Jan. 3, 2020, 6:28 p.m.