View source: R/ExternalBinaryLogisticBiplot.R
ExternalBinaryLogisticBiplot | R Documentation |
Fits an External Logistic Biplot to the results of a Principal Coordinates Analysis obtained from binary data.
ExternalBinaryLogisticBiplot(Pco, IncludeConst=TRUE, penalization=0.2, freq=NULL,
tolerance = 1e-05, maxiter = 100)
Pco |
An object of class "Principal.Coordinates" |
IncludeConst |
Should the logistic fit include the constant term? |
penalization |
Penalization for the ridge regression |
freq |
frequencies for each observation or pattern (usually 1) |
tolerance |
Tolerance for convergence |
maxiter |
Maximum number of iterations |
Let {\bf{X}}
be the matrix of binary data scored as present or absent (1 or 0), in which the rows correspond to n individuals or entries (for example, genotypes) and the columns to p binary characters (for example alleles or bands), let {\bf{S}} = ({s_{ij}})
be a matrix containing the similarities among rows, obtained from the binary data matrix , and let \Delta = ({\delta _{ij}})
be the corresponding dissimilarity/distance matrix, taking for example {\delta _{ij}} = \sqrt {1 - {s_{ij}}}
.
Despite the fact that, in Cluster Analysis and Principal Coordinates Analysis, interpretation of the variables responsible for grouping or ordination is not straightforward, those methods are normally used to classify individual in which binary variables have been measured.
we use a combination of Principal Coordinates Analysis (PCoA), Cluster Analysis (CA) and External Logistic Regression (ELB), as a better way to interpret the binary variables associated to the classification of genotypes. The combination of three standard techniques with some new ideas about the geometry of the procedures, allows to construct a External Logistic Regression (ELB), that helps the interpretation of the variables responsible for the classification or ordination.
Suppose we have obtained an euclidean configuration {\bf{Y}}
obtained from the Principal Coordinates (PCoA) of the similarity matrix.
To search for the variables associated to the ordination obtained in PCoA, we can look for the directions in the ordination diagram that better predict the probability of presence of each allele.
More formally, if we defined {\pi _{ij}} = E({x_{ij}})= {\textstyle{1 \over {1 + \exp ( - ({b_{j0}} + \sum\limits_{s = 1}^k {{b_{js}}{y_{is}}} ))}}}
as the expected probability that the allele j be present at genotype for a genotype with coordinates y_{is}
(i=1, ...,n; s=1, ..., k) on the ordination diagram, as
where bjs ( j=1,..., p) are the logistic regression coefficients that correspond to the jth variable (alleles or bands) in the sth dimension. The model is a generalized linear model having the logit as a link function.
where and , y's and b's define a biplot in logit scale. This is called External Logistic Biplot because the coordinates of the genotypes are calculated in an external procedure (PCoA). Given that the y's are known from PCoA, obtaining the b´s is equivalent to performing a logistic regression using the j-th column of X as a response variable and the columns of y as regressors.
An object of class External.Binary.Logistic.Biplot
with the fields of the Principal.Coordinates
object with the following fields added.
ColumnParameters |
Parameters resulting from fitting a logistic regression to each column of the original binary data matrix |
VarInfo |
Information of the fit for each variable |
VarInfo$Deviances |
A vector with the deviances of each variable calculated as the difference with the null model |
VarInfo$Dfs |
A vector with degrees of freedom for each variable |
VarInfo$pvalues |
A vector with the p values each variable |
VarInfo$Nagelkerke |
A vector with the Nagelkerke pseudo R-squared for each variable |
VarInfo$PercentsCorrec |
A vector with the percentage of correct classifications for each variable |
DevianceTotal |
Total Deviance as the difference with the null model |
p |
p value for the complete representation |
TotalPercent |
Total percentage of correct classification |
Jose Luis Vicente Villardon
Demey, J., Vicente-Villardon, J. L., Galindo, M.P. AND Zambrano, A. (2008) Identifying Molecular Markers Associated With Classification Of Genotypes Using External Logistic Biplots. Bioinformatics, 24(24): 2832-2838.
Vicente-Villardon, J. L., Galindo, M. P. and Blazquez, A. (2006) Logistic Biplots. In Multiple Correspondence Análisis And Related Methods. Grenacre, M & Blasius, J, Eds, Chapman and Hall, Boca Raton.
data(spiders)
x2=Dataframe2BinaryMatrix(spiders)
colnames(x2)=colnames(spiders)
dist=BinaryProximities(x2)
pco=PrincipalCoordinates(dist)
pcobip=ExternalBinaryLogisticBiplot(pco)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.