| gowdis | R Documentation |
gowdis measures the Gower (1971) dissimilarity for mixed variables, including asymmetric binary variables. Variable weights can be specified. gowdis implements Podani's (1999) extension to ordinal variables.
gowdis(x, w, asym.bin = NULL, ord = c("podani", "metric", "classic"))
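x: matrix or data frame containing the variables. Variables can be continuous, ordinal, nominal, or (symmetric or asymmetric) binary; see Details.
w: vector listing the weights for the variables in x.
asym.bin: vector listing the asymmetric binary variables in x.
ord: character string specifying the method to be used for ordinal variables (i.e. ordered factors): "podani", "metric", or "classic"; see Details.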
gowdis computes the Gower (1971) similarity coefficient exactly as described by Podani (1999), then converts it to a dissimilarity coefficient by using D = 1 - S. It integrates variable weights as described by Legendre and Legendre (1998).
Let \mathbf{X} be a data matrix containing n objects (rows) and m variables (columns), and let x_{ij} denote the value of variable i for object j. The similarity G_{jk} between objects j and k is computed as
G_{jk} = \frac{\sum_{i=1}^{m} w_{ijk} s_{ijk}}{\sum_{i=1}^{m} w_{ijk}},
where w_{ijk} is the weight of variable i for the j-k pair and s_{ijk} is the partial similarity of variable i for the j-k pair. The weight w_{ijk} is set to 0 when objects j and k cannot be compared on variable i because x_{ij} or x_{ik} is unknown (i.e. NA).
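As an illustration only, the weighted average above can be written for a single pair of objects in base R. gower_pair is a hypothetical helper, not a function of the package: it takes the partial similarities s_{ijk} of one pair (NA where a value is missing) and the variable weights, and returns the dissimilarity D = 1 - G_{jk}.

```r
# Combine the partial similarities s_ijk of one pair (j, k) into G_jk,
# then convert to the dissimilarity D = 1 - G_jk.
# Hypothetical helper: s and w hold one entry per variable i.
gower_pair <- function(s, w = rep(1, length(s))) {
  w_eff <- ifelse(is.na(s), 0, w)  # w_ijk = 0 when x_ij or x_ik is NA
  s_eff <- ifelse(is.na(s), 0, s)  # value is irrelevant once its weight is 0
  G <- sum(w_eff * s_eff) / sum(w_eff)
  1 - G                            # dissimilarity D = 1 - S
}

# Three variables, the second one missing for this pair
gower_pair(s = c(1, NA, 0.5), w = c(2, 1, 1))  # 1 - (2*1 + 1*0.5) / (2 + 1) = 1/6
```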
For binary variables, s_{ijk} = 0 if x_{ij} \neq x_{ik}, and s_{ijk} = 1 if x_{ij} = x_{ik} = 1 or if x_{ij} = x_{ik} = 0.
For asymmetric binary variables, the same rules apply, except that w_{ijk} = 0 if x_{ij} = x_{ik} = 0 (double absences are ignored).
For nominal variables, s_{ijk} = 0 if x_{ij} \neq x_{ik} and s_{ijk} = 1 if x_{ij} = x_{ik}.
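The binary and nominal rules can be sketched in base R as follows. s_binary and s_nominal are hypothetical helpers for illustration, not part of gowdis; returning NA is a convention chosen here to mean "this pair gets weight w_{ijk} = 0".

```r
# Partial similarity s_ijk for one binary variable and one pair of values.
# NA signals weight w_ijk = 0: a missing value, or a 0-0 match on an
# asymmetric binary variable.
s_binary <- function(a, b, asym = FALSE) {
  if (is.na(a) || is.na(b)) return(NA_real_)
  if (asym && a == 0 && b == 0) return(NA_real_)  # double zeros are ignored
  if (a == b) 1 else 0
}

# Nominal variables: match (1) or mismatch (0) on the category label.
s_nominal <- function(a, b) {
  if (is.na(a) || is.na(b)) return(NA_real_)
  if (as.character(a) == as.character(b)) 1 else 0
}

s_binary(0, 0)               # symmetric binary: 1
s_binary(0, 0, asym = TRUE)  # asymmetric binary: NA (weight 0)
s_nominal("red", "blue")     # 0
```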
For continuous variables,
s_{ijk} = 1 - \frac{|x_{ij} - x_{ik}|} {x_{i.max} - x_{i.min}}
where x_{i.max} and x_{i.min} are the maximum and minimum values of variable i, respectively.
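A base R sketch of the continuous case follows; s_continuous and the height values are hypothetical, and a constant variable (zero range) is not handled.

```r
# Range-normalised partial similarity for a continuous variable.
# x holds the values of variable i over all objects; j and k index the pair.
s_continuous <- function(x, j, k) {
  rng <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)  # x_i.max - x_i.min
  if (is.na(x[j]) || is.na(x[k])) return(NA_real_)   # pair gets weight 0
  1 - abs(x[j] - x[k]) / rng
}

height <- c(2, 5, 10, 4)
s_continuous(height, 1, 2)  # 1 - |2 - 5| / (10 - 2) = 0.625
```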
For ordinal variables, when ord = "podani" or ord = "metric", all x_{ij} are replaced by their ranks r_{ij} determined over all objects (tied values share the same rank), and then:
If ord = "podani":
s_{ijk} = 1 if r_{ij} = r_{ik}, otherwise
s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}| - (T_{ij} - 1)/2 - (T_{ik} - 1)/2 }{r_{i.max} - r_{i.min} - (T_{i.max} - 1)/2 - (T_{i.min}-1)/2 }
where T_{ij} is the number of objects that have the same rank score for variable i as object j (including j itself), T_{ik} is the number of objects that have the same rank score for variable i as object k (including k itself), r_{i.max} and r_{i.min} are the maximum and minimum ranks for variable i, respectively, T_{i.max} is the number of objects with the maximum rank, and T_{i.min} is the number of objects with the minimum rank.
If ord = "metric":
s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}|}{r_{i.max} - r_{i.min}}
When ord = "classic", ordinal variables are simply treated as continuous variables.
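The three ord options can be compared with a small base R sketch. s_ordinal is a hypothetical helper, and two details are assumptions rather than documented behaviour of gowdis: tied values receive average ranks (the default of rank()), and "classic" uses the integer level codes of an ordered factor as continuous values.

```r
# Partial similarity for an ordinal variable under the three `ord` options.
# Assumptions: ties get average ranks (base R rank() default), and
# "classic" uses the integer level codes of an ordered factor as values.
# x: values of variable i over all objects; j, k: indices of the pair.
s_ordinal <- function(x, j, k, ord = c("podani", "metric", "classic")) {
  ord <- match.arg(ord)
  v <- as.numeric(x)                                 # level codes
  if (is.na(v[j]) || is.na(v[k])) return(NA_real_)  # weight 0
  if (ord == "classic") {                            # treated as continuous
    return(1 - abs(v[j] - v[k]) / diff(range(v, na.rm = TRUE)))
  }
  r <- rank(v, na.last = "keep")                     # ranks r_ij over all objects
  r_max <- max(r, na.rm = TRUE)
  r_min <- min(r, na.rm = TRUE)
  if (ord == "metric") {
    return(1 - abs(r[j] - r[k]) / (r_max - r_min))
  }
  # ord == "podani": tie-corrected rank distance
  if (r[j] == r[k]) return(1)
  n_tied <- function(rv) sum(r == rv, na.rm = TRUE)  # T: objects with rank rv
  num <- abs(r[j] - r[k]) - (n_tied(r[j]) - 1) / 2 - (n_tied(r[k]) - 1) / 2
  den <- r_max - r_min - (n_tied(r_max) - 1) / 2 - (n_tied(r_min) - 1) / 2
  1 - num / den
}

size <- factor(c("S", "S", "S", "M", "L"),
               levels = c("S", "M", "L"), ordered = TRUE)
s_ordinal(size, 1, 4, "podani")   # 1 - (2 - 1) / (3 - 1) = 0.5
s_ordinal(size, 1, 4, "metric")   # 1 - 2 / 3
s_ordinal(size, 1, 4, "classic")  # 1 - 1 / 2 = 0.5
```

With three tied objects at the lowest level, the Podani tie correction and the classic coding both give 0.5 for this pair, while the uncorrected "metric" rank version gives 1/3.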
gowdis returns an object of class dist with the following attributes: Labels, Types (the variable types, where 'C' is continuous/numeric, 'O' is ordinal, 'B' is symmetric binary, 'A' is asymmetric binary, and 'N' is nominal), Size, and Metric.
Etienne Laliberté etiennelaliberte@gmail.com https://www.elaliberte.info/, with some help from Philippe Casgrain for the C interface.
Gower, J. C. (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857-871.
Legendre, P. and L. Legendre (1998) Numerical Ecology. 2nd English edition. Amsterdam: Elsevier.
Podani, J. (1999) Extending Gower's general coefficient of similarity to ordinal characters. Taxon 48:331-340.
daisy (package cluster) is similar but less flexible, since it does not include variable weights and does not treat ordinal variables as described by Podani (1999). Using ord = "classic" reproduces the behaviour of daisy.
library(FD)  # provides gowdis and the dummy and tussock data sets
ex1 <- gowdis(dummy$trait)
ex1
# check attributes
attributes(ex1)
# to include weights
w <- c(4,3,5,1,2,8,3,6)
ex2 <- gowdis(dummy$trait, w)
ex2
# variable 7 as asymmetric binary
ex3 <- gowdis(dummy$trait, asym.bin = 7)
ex3
# example with trait data from New Zealand vascular plant species
ex4 <- gowdis(tussock$trait)