diaMeasure: Compute Dialectometrical measure


Description

diaMeasure computes the requested dialectometrical measure between variables of data.

Usage

diaMeasure(data, formula, value.var, measure = c("lv", "rdi", "ipd",
  "osa", "lv", "dl", "hamming", "lcs"), binary.index = c("jac", "dice",
  "cover"), weight = c(d = 1, i = 1, s = 1, t = 1), q = 1L, p = 0,
  bt = 0, useBytes = FALSE, variable.dist = FALSE)

Arguments

data

data frame, data.table, or any object coercible by data.table::as.data.table that contains all the variables referenced by the formula. The data must be in long format, meaning that all responses are stored in a single column, one response per row. data(dsample) loads an example of this format.
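As a rough illustration of the long format, the sketch below builds a small data frame in base R. The column names mirror those used in the Examples (gender, location, question, answer); the rows themselves are invented for illustration and are not taken from dsample.

```r
# Hypothetical long-format data: one row per response.
# Column names follow the Examples section; the values are made up.
toy <- data.frame(
  gender   = c("f", "f", "m", "m"),
  location = c("A", "A", "B", "B"),
  question = c("q1", "q2", "q1", "q2"),
  answer   = c("etxe", "mendi", "etxea", "mendi"),
  stringsAsFactors = FALSE
)

# Every response sits in the single 'answer' column, so the number of rows
# equals the number of responses.
print(toy)
```

A table shaped like this could then be passed as data, with 'answer' as value.var.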

formula

formula indicating which variables are identity variables and which are measure variables. See Examples.

value.var

character vector of length one indicating which variable of data contains the linguistical values. See Examples.

measure

dissimilarity index to compute; must match one of ‘"osa"’, ‘"lv"’, ‘"dl"’, ‘"hamming"’, ‘"lcs"’, ‘"rdi"’ or ‘"ipd"’.
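The string-based measures ("osa", "lv", "dl", "hamming", "lcs") share their names with methods of the stringdist package. As a hedged illustration of what such edit distances count, the sketch below uses base R (utils::adist and a manual position count) rather than diaMeasure itself; the two word forms are invented.

```r
# Two hypothetical variants of the same word.
a <- "etxea"
b <- "etxia"

# utils::adist computes the generalized Levenshtein distance:
# the minimal number of insertions, deletions and substitutions.
lv <- drop(adist(a, b))

# Hamming distance for equal-length strings: count of differing positions.
hamming <- sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])

print(c(lv = lv, hamming = hamming))
```

This is only meant to convey the idea behind the measure names; how diaMeasure aggregates such distances over questions and identities is not shown here.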

binary.index

binary index to be used with multiple responses. For the measures ‘"rdi"’ and ‘"ipd"’, "jac" (Jaccard) and "dice" (Dice) are available. For the remaining measures, "cover" (cover set distance) must be used.
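With multiple responses, each comparison involves two sets of answers. The sketch below computes the textbook Jaccard and Dice similarity coefficients on two invented answer sets; it is an illustration in base R and assumes nothing about how diaMeasure implements these indices internally.

```r
# Hypothetical answer sets given at two locations for the same question.
x <- c("etxe", "etxea")
y <- c("etxea", "etxia", "etxie")

shared <- length(intersect(x, y))

# Jaccard: shared items over all distinct items.
jaccard <- shared / length(union(x, y))

# Dice: twice the shared items over the summed set sizes.
dice <- 2 * shared / (length(x) + length(y))

print(c(jaccard = jaccard, dice = dice))
```

Dice gives more credit to the shared answer than Jaccard does, so the choice of index affects how strongly partial overlap counts.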

weight

weighting for the distance metrics. For "ipd", weight must be a numeric vector of length one; the higher the weight, the less impact very rare linguistical values have. See Goebsl

q

currently unused.

p

currently unused.

bt

currently unused.

useBytes

Perform byte-wise comparison, see stringdist-encoding (from the stringdist package).

variable.dist

logical of length one indicating if the dialectometrical distance needs to be computed between linguistical identities (FALSE) or linguistical variables (TRUE).

Value

a vector of dialectometric distances, which can be coerced into a matrix.

Examples

data(dsample)

## The linguistical identities are "gender" and "location", meaning that all rows
## that share the same location and gender are grouped together. The distance between
## each combination of gender and location is computed.
measure <- diaMeasure(dsample, gender + location ~ question, 'answer', 'rdi')
print(measure)

## If the linguistical identity is only the gender, then all the responses that each gender
## has given belong to a single group. The distance between genders is computed.
measure <- diaMeasure(dsample, gender ~ question, 'answer', 'rdi')
print(measure)

## locations defining the linguistical identities
measure <- diaMeasure(dsample, location ~ question, 'answer', 'rdi')
print(measure)

## measures between the questions instead of the locations
measure <- diaMeasure(dsample, location ~ question, 'answer', 'rdi', variable.dist = TRUE)
print(measure)

usobiaga/diaMeasures2 documentation built on July 8, 2019, 7:54 a.m.