glottodist | R Documentation |
Calculate distances between languages
glottodist(glottodata, metric = "gower")
glottodata | glottodata or glottosubdata, either with or without a structure table. |
metric | Distance metric: either "gower" (the default) or "anderberg". |
An object of class dist.
The function “glottodist” returns a “dist” object containing pairwise Gower distances or Anderberg dissimilarities, depending on the value of metric.
The Anderberg dissimilarity is defined as follows.
Consider a categorical dataset $L$ containing $N$ objects $X_1, \cdots, X_N$ defined over a set of $d$ categorical features, where $A_k$ denotes the $k$-th feature.
The feature $A_k$ takes $n_k$ distinct values in the given dataset, and the set of these values is denoted by $\mathcal{A}_k$. 'NA' is regarded as an additional value.
We also use the following notation:
$f_k(x)$: the number of times feature $A_k$ takes the value $x$ in the dataset $L$. If $x \notin \mathcal{A}_k$, then $f_k(x) = 0$.
$\hat{p}_k(x)$: the sample frequency with which feature $A_k$ takes the value $x$ in the dataset $L$, that is, $\hat{p}_k(x) = \frac{f_k(x)}{N}$.
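The counts $f_k(x)$ and frequencies $\hat{p}_k(x)$ can be illustrated in base R. This is a minimal sketch on a made-up feature vector; the variable names and values are hypothetical and are not part of the package.

```r
# Hypothetical toy feature A_k observed on N = 5 objects (illustration only).
a_k <- c("SOV", "SVO", "SOV", NA, "SOV")

# f_k(x): number of times each value occurs, with NA kept as its own value.
f_k <- table(a_k, useNA = "ifany")

# n_k: number of distinct values taken by the feature, NA included.
n_k <- length(f_k)

# p_hat_k(x) = f_k(x) / N
p_hat_k <- f_k / length(a_k)

print(f_k)
print(p_hat_k)
```

Passing useNA = "ifany" to table() is what makes 'NA' count as an additional value here; without it, missing entries would be dropped from the counts.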
The Anderberg dissimilarity between two objects $X = X_i$ and $Y = X_j$, whose values for feature $A_k$ are written $X_k$ and $Y_k$, is defined as
$$d(X_i, X_j) = \frac{D}{D+S},$$
where
$$D = \sum_{k \in \{1 \leq k \leq d :\, X_k \neq Y_k\}} w_k \, \delta^{(k)}_{ij} \, \tau_{ij}^{(k)} \left(\frac{1}{2\hat{p}_k(X_k)\hat{p}_k(Y_k)}\right) \frac{2}{n_k(n_k+1)},$$
and
$$S = \sum_{k \in \{1 \leq k \leq d :\, X_k = Y_k\}} w_k \, \delta^{(k)}_{ij} \left(\frac{1}{\hat{p}_k(X_k)}\right)^2 \frac{2}{n_k(n_k+1)}.$$
The number $w_k$ gives the weight of the $k$-th feature, and the number $\delta^{(k)}_{ij}$ is equal to either $0$ or $1$.
It is equal to $0$ when the $k$-th feature is of type asymmetric binary and the values of both $X_i$ and $X_j$ are $0$, or when the value of the $k$-th feature is missing for either object; otherwise, it is equal to $1$.
When $X_k \neq Y_k$ and the type of $A_k$ is "ordered", $\tau_{ij}^{(k)}$ is equal to the normalized difference of $X_k$ and $Y_k$; otherwise, $\tau_{ij}^{(k)}$ is equal to $1$.
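To make the definition concrete, the following base R sketch evaluates the formula for one pair of objects. It is an illustration under simplifying assumptions, not the package's implementation: every feature is treated as nominal (so the ordered-feature factor is always 1 and the asymmetric binary case is ignored), 'NA' counts as a value when computing frequencies, and a feature is skipped for a pair when either value is missing. The function name anderberg_pair and the toy data are hypothetical.

```r
# Sketch of the Anderberg dissimilarity for objects i and j (nominal features only).
# `data`: data.frame of categorical features, one row per object.
# `weights`: one weight w_k per feature (defaults to 1 for all features).
anderberg_pair <- function(data, i, j, weights = rep(1, ncol(data))) {
  N <- nrow(data)
  D <- 0  # contribution of mismatching features
  S <- 0  # contribution of matching features
  for (k in seq_along(data)) {
    col <- as.character(data[[k]])
    x <- col[i]
    y <- col[j]
    # delta = 0 when either value is missing: the feature is skipped.
    if (is.na(x) || is.na(y)) next
    counts <- table(col, useNA = "ifany")  # f_k, with NA counted as a value
    n_k <- length(counts)
    p_x <- counts[[x]] / N
    p_y <- counts[[y]] / N
    scale <- 2 / (n_k * (n_k + 1))
    if (x == y) {
      S <- S + weights[k] * (1 / p_x)^2 * scale
    } else {
      # tau = 1 because all features are treated as nominal in this sketch.
      D <- D + weights[k] * (1 / (2 * p_x * p_y)) * scale
    }
  }
  D / (D + S)  # NaN if every feature was skipped for this pair
}

# Hypothetical toy data: 4 objects, 2 features.
toy <- data.frame(
  word_order = c("SOV", "SVO", "SOV", "SOV"),
  tone       = c("yes", "no", "no", NA),
  stringsAsFactors = FALSE
)

anderberg_pair(toy, 1, 2)  # mismatch on both features: 1
anderberg_pair(toy, 1, 3)  # one match, one mismatch: 9/17, about 0.529
anderberg_pair(toy, 1, 4)  # one match, one skipped feature: 0
```

For objects 1 and 3, the matching feature contributes $S = (1/0.75)^2 \cdot \frac{1}{3} = \frac{16}{27}$ and the mismatching feature contributes $D = \frac{1}{2 \cdot 0.25 \cdot 0.5} \cdot \frac{1}{6} = \frac{18}{27}$, which gives $\frac{D}{D+S} = \frac{9}{17}$.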
Anderberg M.R. (1973). Cluster analysis for applications. Academic Press, New York.
Boriah S., Chandola V., Kumar V. (2008). Similarity measures for categorical data: A comparative evaluation. In: Proceedings of the 8th SIAM International Conference on Data Mining, SIAM, pp. 243-254.
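library(glottospace)
# Anderberg dissimilarity on the demo glottodata
glottodata <- glottoget("demodata", meta = TRUE)
dist_anderberg <- glottodist(glottodata = glottodata, metric = "anderberg")

# Gower distance (the default metric) on the demo glottosubdata
glottosubdata <- glottoget("demosubdata", meta = TRUE)
dist_gower <- glottodist(glottodata = glottosubdata)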
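Because the return value is an object of class dist, it can be passed to standard base R tools. The sketch below uses a small hand-made dissimilarity matrix as a stand-in for the function's output, so that it runs without the package; the language labels and values are hypothetical, and whether the real output carries such labels is an assumption.

```r
# Hypothetical 3 x 3 dissimilarity matrix standing in for a dist result.
langs <- c("lang_a", "lang_b", "lang_c")
m <- matrix(c(0.0, 0.2, 0.7,
              0.2, 0.0, 0.5,
              0.7, 0.5, 0.0),
            nrow = 3, dimnames = list(langs, langs))
d <- as.dist(m)

# Look up a single pairwise value by converting back to a full matrix.
d_ac <- as.matrix(d)["lang_a", "lang_c"]

# Average-linkage hierarchical clustering, cut into two groups.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

Here lang_a and lang_b are merged first because their dissimilarity is the smallest, so cutting the tree into two groups leaves lang_c on its own.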