Mixture model fitting with Hardy Weinberg Equilibrium and population stratification to infer haplotype (inversion) alleles.
roi |
text file or data.frame with region of interest (ROI) information. Four columns are required: chr, LBP, RBP, reg, giving the chromosome, left breakpoint, right breakpoint and a character "reg" that identifies the inversion |
wh |
which ROI (row of roi) to be considered for computation. |
geno |
genotypes in |
annot |
snp annotation as .map PLINK format, or |
SNPtagg |
set SNPtagg="y" to use the tag SNPs in roi for tagging the haplotype groups |
SNPsel |
vector with snps to be selected for computation |
method |
method=1 performs the EM algorithm for three genotypes; method=2 performs a clustering within the genotypes for an additional third haplotype |
dim |
either 1 or 2, indicating the number of MDS components to be used |
pc |
if population clustering is to be performed, first component of a genome-wide PCA of geno |
ngroups |
maximum number of subpopulations to be considered |
... |
control arguments for the EM algorithm: |
invClust computes the biallelic haplotypes in Hardy-Weinberg equilibrium (with the possibility
of clustering by geographical subpopulations) that may underlie an inversion event. It fits a mixture
model with an expectation-maximization routine, controlled only by convergence criteria. Initial conditions
are general for a wide range of cases. Clustering is performed in 1 or 2 dimensions of multidimensional
scaling (argument dim) and, if geographical subpopulation is considered, the first
component of a genome-wide PCA (argument pc). In this last case, a visualization of the
PCA can inform on the suitable number of groups to be considered (argument ngroups).
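The fitting idea described above can be sketched in base R for the simplest case: one dimension, no subpopulations. This is only an illustration, not the package's code. The helper `em_hwe`, its starting values, the stopping rule and the simulated data are all assumptions made for the example. The sketch ties the three mixture weights to a single inverted-allele frequency q through Hardy-Weinberg proportions and updates q from expected allele counts.

```r
# Hypothetical helper (not part of invClust): EM for a 1-D, three-component
# Gaussian mixture whose weights follow Hardy-Weinberg proportions.
em_hwe <- function(y, tol = 1e-8, max_iter = 500) {
  # Assumed starting values: means spread over the data range, common sd
  mu <- seq(min(y), max(y), length.out = 3)
  sigma <- rep(sd(y) / 3, 3)
  q <- 0.5                                   # frequency of the inverted allele
  loglik <- numeric(0)
  for (iter in seq_len(max_iter)) {
    w <- c((1 - q)^2, 2 * q * (1 - q), q^2)  # HWE weights for NN, NI, II
    # E-step: posterior probability of each genotype for each subject
    dens <- sapply(1:3, function(k) w[k] * dnorm(y, mu[k], sigma[k]))
    tot <- rowSums(dens)
    post <- dens / tot
    loglik <- c(loglik, sum(log(tot)))
    # M-step: allele frequency from expected allele counts, then moments
    nk <- colSums(post)
    q <- (nk[2] + 2 * nk[3]) / (2 * length(y))
    mu <- colSums(post * y) / nk
    sigma <- pmax(sqrt(colSums(post * outer(y, mu, "-")^2) / nk), 1e-6)
    # Stop when the log-likelihood no longer improves
    if (iter > 1 && abs(loglik[iter] - loglik[iter - 1]) < tol) break
  }
  colnames(post) <- c("NN", "NI", "II")
  list(q = q, mu = mu, sigma = sigma, posterior = post, loglik = loglik)
}

# Simulated 1-D coordinate (stand-in for a first MDS component)
set.seed(1)
q_true <- 0.3
g <- sample(0:2, 600, replace = TRUE,
            prob = c((1 - q_true)^2, 2 * q_true * (1 - q_true), q_true^2))
y <- rnorm(600, mean = c(-2, 0, 2)[g + 1], sd = 0.3)
fit <- em_hwe(y)
fit$q    # estimated inverted-allele frequency
fit$mu   # estimated cluster centres
```

Constraining the weights this way leaves one free mixing parameter instead of two, which is what assuming Hardy-Weinberg equilibrium amounts to in a three-genotype mixture.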
Each subject in the sample is assigned a probability for each genotype (NN, NI, II), which
can be recovered with x["genotypes"], where x is of class invClust (e.g. the result of an
invClust call). Most probable genotypes can be extracted with getGenotypes(x).
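To illustrate how per-subject probabilities relate to a most probable genotype, here is a small base R sketch. The matrix `post` is made-up data, and the exact structure returned by x["genotypes"] or getGenotypes(x) is not documented here, so this shows only the idea of taking the highest-probability column per subject.

```r
# Hypothetical posterior matrix: one row per subject, columns NN, NI, II
post <- rbind(c(0.97, 0.03, 0.00),
              c(0.10, 0.85, 0.05),
              c(0.01, 0.09, 0.90),
              c(0.45, 0.50, 0.05))
colnames(post) <- c("NN", "NI", "II")

# Most probable genotype per subject, and the probability behind that call
best <- colnames(post)[max.col(post, ties.method = "first")]
conf <- apply(post, 1, max)
data.frame(genotype = best, probability = conf)
```

Keeping the probability next to the call makes it easy to flag ambiguous subjects, such as the last row, where NN and NI are almost equally likely.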
Similarly, if subpopulation classification is considered, the probability of group
membership is recovered with x["groups"].
Plots are also implemented for this class: plot(x) displays the clustered data on the fitted
distribution, according to the dimensions used. When subpopulation classification is included,
the marginals can be selected through the plot argument wh=c("yy","xy"). wh="like" plots the
likelihood against the number of iterations, only for dim=1.
A useful quantity is the quality score (getQuality(x)), which computes the overlap integral of the
cluster components: a value of 1 indicates no overlap, while 0 indicates complete overlap.
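The exact formula behind getQuality(x) is not given here. As a rough illustration of an overlap-based score with the same end points (1 for no overlap, 0 for complete overlap), the base R sketch below assumes two normal components and defines the score as one minus the integral of their pointwise minimum. The helper `overlap_quality` and that definition are assumptions for the example, not the package's implementation.

```r
# Hypothetical helper: 1 minus the overlap area of two normal densities.
# Assumed definition; the formula used by getQuality() may differ.
overlap_quality <- function(mu1, sd1, mu2, sd2) {
  shared <- function(t) pmin(dnorm(t, mu1, sd1), dnorm(t, mu2, sd2))
  # Integrate over a range that covers both components
  lo <- min(mu1 - 8 * sd1, mu2 - 8 * sd2)
  hi <- max(mu1 + 8 * sd1, mu2 + 8 * sd2)
  1 - integrate(shared, lo, hi)$value
}

q_same <- overlap_quality(0, 1, 0, 1)       # identical components, close to 0
q_near <- overlap_quality(0, 1, 2, 1)       # partial overlap
q_far  <- overlap_quality(-5, 0.3, 5, 0.3)  # well separated, close to 1
c(q_same, q_near, q_far)
```

For two components with equal standard deviation the overlap has a closed form, 2 * pnorm(-d / 2) with d the distance between means in standard deviation units, which gives a quick check on the numerical integral.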
EMestimate |
List with fitted parameters |
datin |
List with data used to fit the model: |
Alejandro Caceres