gowdis | R Documentation |
gowdis measures the Gower (1971) dissimilarity for mixed variables, including asymmetric binary variables. Variable weights can be specified. gowdis implements Podani's (1999) extension to ordinal variables.
gowdis(x, w, asym.bin = NULL, ord = c("podani", "metric", "classic"))
x |
matrix or data frame containing the variables. Variables can be |
w |
vector listing the weights for the variables in |
asym.bin |
vector listing the asymmetric binary variables in |
ord |
character string specifying the method to be used for ordinal variables (i.e. |
gowdis computes the Gower (1971) similarity coefficient exactly as described by Podani (1999), then converts it to a dissimilarity coefficient by using D = 1 - S. It integrates variable weights as described by Legendre and Legendre (1998).
Let \mathbf{X} = \{x_{ij}\} be a matrix containing n objects (rows) and m columns (variables), where x_{ij} denotes the value of variable i for object j. The similarity G_{jk} between objects j and k is computed as
G_{jk} = \frac{\sum_{i=1}^{m} w_{ijk} s_{ijk}}{\sum_{i=1}^{m} w_{ijk}},
where w_{ijk} is the weight of variable i for the j-k pair, and s_{ijk} is the partial similarity of variable i for the j-k pair, and where w_{ijk} = 0 if objects j and k cannot be compared because x_{ij} or x_{ik} is unknown (i.e. NA).
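As an illustration of the weighted average above, the following is a minimal base-R sketch for a single pair of objects. The helper name gower_pair and the input values are hypothetical and are not part of gowdis; the sketch only assumes that the partial similarities have already been computed and that NA marks a variable that cannot be compared.

```r
# Hypothetical helper (not part of any package): combine the partial
# similarities of one j-k pair into the Gower similarity G_jk.
#   s: partial similarities s_ijk, one per variable (NA = not comparable)
#   w: variable weights, same length as s
gower_pair <- function(s, w = rep(1, length(s))) {
  w[is.na(s)] <- 0        # w_ijk = 0 when x_ij or x_ik is NA
  s[is.na(s)] <- 0        # the value no longer matters once its weight is 0
  sum(w * s) / sum(w)     # NaN if no variable can be compared
}

s <- c(1, 0.5, NA, 0)     # made-up partial similarities; variable 3 is missing
w <- c(2, 1, 5, 1)        # made-up variable weights
G <- gower_pair(s, w)     # (2*1 + 1*0.5 + 1*0) / (2 + 1 + 1)
D <- 1 - G                # dissimilarity, D = 1 - S
```

Note that the weight of the missing variable drops out of both the numerator and the denominator, so a heavily weighted variable has no influence on pairs for which it is NA.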
For binary variables, s_{ijk} = 0 if x_{ij} \neq x_{ik}, and s_{ijk} = 1 if x_{ij} = x_{ik} = 1 or if x_{ij} = x_{ik} = 0.
For asymmetric binary variables, the same rules apply, except that w_{ijk} = 0 if x_{ij} = x_{ik} = 0.
For nominal variables, s_{ijk} = 0 if x_{ij} \neq x_{ik} and s_{ijk} = 1 if x_{ij} = x_{ik}.
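The matching rules for binary, asymmetric binary and nominal variables can be sketched in base R as follows. The helper partial_match and its type codes (borrowed from the Types attribute letters 'B', 'A' and 'N') are assumptions made for illustration; this is not the code used by gowdis.

```r
# Hypothetical helper: partial similarity s and weight w of one variable
# for one pair of objects, before any user-supplied variable weight.
#   type: "B" symmetric binary, "A" asymmetric binary, "N" nominal
partial_match <- function(xj, xk, type = c("B", "A", "N")) {
  type <- match.arg(type)
  if (is.na(xj) || is.na(xk)) return(c(s = 0, w = 0))  # not comparable
  s <- as.numeric(xj == xk)   # 1 on a match, 0 otherwise
  w <- 1
  # Asymmetric binary: a joint absence (0-0) carries no information
  if (type == "A" && xj == 0 && xk == 0) w <- 0
  c(s = s, w = w)
}

partial_match(0, 0, "B")           # 0-0 counts as a match
partial_match(0, 0, "A")           # 0-0 is dropped through w = 0
partial_match("red", "blue", "N")  # different categories
```

The only difference between the symmetric and asymmetric cases is the weight of a 0-0 pair, which is why asymmetric binary variables tend to increase dissimilarities among objects that share many absences.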
For continuous variables,
s_{ijk} = 1 - \frac{|x_{ij} - x_{ik}|}{x_{i.max} - x_{i.min}},
where x_{i.max} and x_{i.min} are the maximum and minimum values of variable i, respectively.
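For a single continuous variable, the range-standardized partial similarity can be computed for all pairs at once. The following base-R sketch uses made-up values and is only meant to illustrate the formula; it does not handle NA pairs or a zero range.

```r
x <- c(2.0, 3.5, 5.0, 8.0)             # made-up values of one continuous variable
rng <- max(x) - min(x)                 # x_i.max - x_i.min
s <- 1 - abs(outer(x, x, "-")) / rng   # s[j, k] = partial similarity of pair j-k

s[1, 2]   # 1 - 1.5 / 6
s[1, 4]   # the two extremes of the variable
```

The two objects holding the minimum and maximum values always get a partial similarity of 0, and every object has a partial similarity of 1 with itself.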
For ordinal variables, when ord = "podani" or ord = "metric", all x_{ij} are replaced by their ranks r_{ij} determined over all objects (such that ties are also considered), and then:
if ord = "podani", s_{ijk} = 1 if r_{ij} = r_{ik}, otherwise
s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}| - (T_{ij} - 1)/2 - (T_{ik} - 1)/2}{r_{i.max} - r_{i.min} - (T_{i.max} - 1)/2 - (T_{i.min} - 1)/2},
where T_{ij} is the number of objects which have the same rank score for variable i as object j (including j itself), T_{ik} is the number of objects which have the same rank score for variable i as object k (including k itself), r_{i.max} and r_{i.min} are the maximum and minimum ranks for variable i, respectively, T_{i.max} is the number of objects with the maximum rank, and T_{i.min} is the number of objects with the minimum rank;
if ord = "metric",
s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}|}{r_{i.max} - r_{i.min}}.
When ord = "classic", ordinal variables are simply treated as continuous variables.
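The two rank-based formulas can be compared on a small example. The following base-R sketch uses made-up ordinal scores and assumes that ranks are average ranks as returned by rank() with its default tie handling; the variable names are hypothetical and the code is an illustration of the formulas, not the gowdis implementation.

```r
x <- c(1, 2, 2, 3, 3, 3)            # made-up scores of one ordinal variable
r <- rank(x)                        # average ranks: 1, 2.5, 2.5, 5, 5, 5
ties <- ave(r, r, FUN = length)     # T_ij: objects sharing the rank of object j

absdiff <- abs(outer(r, r, "-"))    # |r_ij - r_ik| for every pair

# ord = "podani": tie-corrected numerator and denominator
num <- absdiff - outer((ties - 1) / 2, (ties - 1) / 2, "+")
den <- (max(r) - min(r)) -
  (ties[which.max(r)] - 1) / 2 -    # correction for T_i.max
  (ties[which.min(r)] - 1) / 2      # correction for T_i.min
s_podani <- 1 - num / den
s_podani[outer(r, r, "==")] <- 1    # equal ranks: s = 1 by definition

# ord = "metric": plain range-standardized rank difference
s_metric <- 1 - absdiff / (max(r) - min(r))
```

In this example the pair formed by object 1 and object 4 has a partial similarity of 0 under both methods, whereas the pair formed by objects 1 and 2 differs between them (2/3 under "podani", 0.625 under "metric") because of the tie correction.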
an object of class dist with the following attributes: Labels, Types (the variable types, where 'C' is continuous/numeric, 'O' is ordinal, 'B' is symmetric binary, 'A' is asymmetric binary, and 'N' is nominal), Size, Metric.
Etienne Laliberté etiennelaliberte@gmail.com https://www.elaliberte.info/, with some help from Philippe Casgrain for the C interface.
Gower, J. C. (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857-871.
Legendre, P. and L. Legendre (1998) Numerical Ecology. 2nd English edition. Amsterdam: Elsevier.
Podani, J. (1999) Extending Gower's general coefficient of similarity to ordinal characters. Taxon 48:331-340.
daisy is similar but less flexible, since it does not include variable weights and does not treat ordinal variables as described by Podani (1999). Using ord = "classic" reproduces the behaviour of daisy.
library(FD)  # gowdis and the dummy and tussock data sets are provided by the FD package
ex1 <- gowdis(dummy$trait)
ex1
# check attributes
attributes(ex1)
# to include weights
w <- c(4,3,5,1,2,8,3,6)
ex2 <- gowdis(dummy$trait, w)
ex2
# variable 7 as asymmetric binary
ex3 <- gowdis(dummy$trait, asym.bin = 7)
ex3
# example with trait data from New Zealand vascular plant species
ex4 <- gowdis(tussock$trait)