LEDA | R Documentation |
Provides an interface to link ethnic groups from 12 different datasets to each other and calculate linguistic distances between them.
The LEDA package contains a full pipeline to link ethnic datasets from Africa. The main strength of LEDA lies in leveraging the structure of the language tree to provide a flexible link between any two ethnic groups that are linked to the tree.
The package allows lists of ethnic groups to be linked to each other using three main linkage types: binary linking based on the relations of sets of language nodes associated with two groups; binary linking based on linguistic distances; and a full computation of dyadic linguistic distances.
Usage of a LEDA object is structured around lists of ethnic groups. These lists of groups stem from the original datasets that have been joined to the language tree. Lists are structured by data source, country, year, or, in the case of survey data, survey rounds. Via the language tree, any two lists of ethnic groups can be linked to each other.
For full information on the LEDA project and methodology, read the paper.
When using the LEDA package, please cite: Müller-Crepon, Carl, Yannick Pengl, and Nils-Christian Bormann (2020). Linking Ethnic Data from Africa. Unpublished working paper.
# Initialize
leda.obj <- LEDA$new()
# Apply any LEDA method
# leda.obj$method() ## not run.
new()
Initialize a new LEDA object
LEDA$new()
library(LEDA)
leda.obj <- LEDA$new()
show_list_parameters()
Returns a vector with the variables that define lists of ethnic groups in different datasets.
LEDA$show_list_parameters()
The variables coded are the following:
cowcode
: Correlates of War code of country
iso3c
: 3-letter isocode of country
type
: Type of ethnic group dataset.
One of: c("AMAR", "DHS", "SIDE", "EPR", "Fearon", "FRT",
"GREG", "Murdock_Map", "IPUMS", "Afrobarometer", "WLMS", "PREG")
marker
: Ethnic marker used in list.
"ethnic group"
: Ethnic group / ethnicity.
"language"
: Language.
"mtongue"
: Mother tongue.
groupvar
Variable name of ethnic group identifier
in original dataset.
round
: Round of survey (DHS; SIDE; Afrobarometer)
subround
: Subround of survey (DHS; SIDE)
year
: Year (EPR; IPUMS)
list.id
: ID of list of ethnic groups.
A vector.
# Initialize linkage object
leda <- LEDA$new()
# Get list parameters
leda$show_list_parameters()
get_list_dict()
Returns the full dictionary of lists of ethnic groups that are included in the LEDA project. An example of a list is the IPUMS census data from Ghana in 2000.
LEDA$get_list_dict()
A DataFrame.
# Initialize linkage object
leda <- LEDA$new()
# Get list dictionaries
list.dict <- leda$get_list_dict()
head(list.dict)
get_list_dict_subset()
Returns a subset of the dictionary of lists of ethnic groups that are included in the LEDA project. An example of a list is the IPUMS census data from Ghana in 2000.
LEDA$get_list_dict_subset(param_list = list())
param_list
List of parameter values to subset list dictionary. The following fields are allowed:
cowcode
: Correlates of War code of country
iso3c
: 3-letter isocode of country
type
: Type of ethnic group dataset.
One of: c("AMAR", "DHS", "SIDE", "EPR", "Fearon", "FRT",
"GREG", "Murdock_Map", "IPUMS", "Afrobarometer", "WLMS", "PREG")
marker
: Ethnic marker used in list.
"ethnic group"
: Ethnic group / ethnicity.
"language"
: Language.
"mtongue"
: Mother tongue.
groupvar
Variable name of ethnic group identifier
in original dataset.
round
: Round of survey (DHS; SIDE; Afrobarometer)
subround
: Subround of survey (DHS; SIDE)
year
: Year (EPR; IPUMS)
list.id
: ID of list of ethnic groups.
These parameters are also returned by the method LEDA$show_list_parameters().
A DataFrame.
# Initialize linkage object
leda <- LEDA$new()
# Get list data for Afrobarometers in Uganda and Kenya
leda$get_list_dict_subset(param_list =
  list(type = "Afrobarometer", iso3c = c("UGA","KEN")))
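To picture what subsetting by param_list amounts to, here is a minimal base-R sketch on a toy dictionary. It does not call the package. The toy table and the rule that each field is matched with %in% and fields are combined with a logical AND are assumptions made for illustration, not a description of the package internals.

```r
# Toy list dictionary (hypothetical values, not LEDA data)
list.dict <- data.frame(
  list.id = 1:4,
  type = c("Afrobarometer", "Afrobarometer", "FRT", "DHS"),
  iso3c = c("UGA", "KEN", "UGA", "GHA"),
  stringsAsFactors = FALSE
)

# Same shape as the param_list argument documented above
param_list <- list(type = "Afrobarometer", iso3c = c("UGA", "KEN"))

# Assumption: a row is kept if it matches every field in param_list
keep <- rep(TRUE, nrow(list.dict))
for (p in names(param_list)) {
  keep <- keep & list.dict[[p]] %in% param_list[[p]]
}
subset.dict <- list.dict[keep, ]
print(subset.dict)
```

Passing a vector for a field, as with iso3c here, selects rows that match any of its values.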
get_list_ids()
Returns a subset of the IDs of lists of ethnic groups that are included in the LEDA project.
LEDA$get_list_ids(param_list = list())
param_list
List of parameter values to subset IDs from list dictionary. The following fields are allowed:
cowcode
: Correlates of War code of country
iso3c
: 3-letter isocode of country
type
: Type of ethnic group dataset.
One of: c("AMAR", "DHS", "SIDE", "EPR", "Fearon", "FRT",
"GREG", "Murdock_Map", "IPUMS", "Afrobarometer", "WLMS", "PREG")
marker
: Ethnic marker used in list.
"ethnic group"
: Ethnic group / ethnicity.
"language"
: Language.
"mtongue"
: Mother tongue.
groupvar
Variable name of ethnic group identifier
in original dataset.
round
: Round of survey (DHS; SIDE; Afrobarometer)
subround
: Subround of survey (DHS; SIDE)
year
: Year (EPR; IPUMS)
list.id
: ID of list of ethnic groups.
These parameters are also returned by the method LEDA$show_list_parameters().
A vector of list.ids
# Initialize linkage object
leda <- LEDA$new()
# Get list IDs for Afrobarometers in Uganda and Kenya
leda$get_list_ids(param_list =
  list(type = "Afrobarometer", iso3c = c("UGA","KEN")))
link_set()
Returns a link table of ethnic groups contained in lists A and B, based on their set relation. At the baseline, groups a and b are linked to each other as soon as they share any language node at the level of the language tree specified by link.level.
Links are provided between all lists in A with every list in B separately.
The returned DataFrame contains at least one row per group a that has been linked to the Ethnologue language tree. If a is not linked to any group b, the columns that contain linked groups b are set to missing. The returned DataFrame contains multiple rows per group a if a is linked to multiple groups b.
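The set relation described above can be illustrated with a small base-R sketch that does not use the package. The group names, node IDs, and column names (n.shared, frac.a, frac.b) are hypothetical; the two shares are only analogous in spirit to the ei.frac.a and ei.frac.b columns of the returned table.

```r
# Hypothetical language-tree node IDs associated with each group
nodes_a <- list(Baganda = c("n1", "n2"), Acholi = c("n7"))
nodes_b <- list(Ganda = c("n2", "n3", "n4"), Langi = c("n8"))

# All candidate pairs of groups a and b
pairs <- expand.grid(a = names(nodes_a), b = names(nodes_b),
                     stringsAsFactors = FALSE)

# Number of language nodes the two groups share
pairs$n.shared <- unname(mapply(
  function(a, b) length(intersect(nodes_a[[a]], nodes_b[[b]])),
  pairs$a, pairs$b))

# Share of each group's nodes covered by the pair
pairs$frac.a <- pairs$n.shared / unname(lengths(nodes_a[pairs$a]))
pairs$frac.b <- pairs$n.shared / unname(lengths(nodes_b[pairs$b]))

# Baseline rule: link as soon as any node is shared
links <- pairs[pairs$n.shared > 0, ]
print(links)
```

In this toy case only Baganda and Ganda are linked, because they are the only pair with a common node.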
LEDA$link_set( lists.a, lists.b, link.level, by.country = T, drop.a.threshold = 0, drop.b.threshold = 0, drop.ethno.id = T, add_listmetadata = T )
lists.a
Vector of lists A, identified via their list.id returned by method LEDA$get_list_ids(). Or a list of parameters that specify lists A. See LEDA$show_list_parameters().
lists.b
Vector of lists B, identified via their list.id returned by method LEDA$get_list_ids(). Or a list of parameters that specify lists B. See LEDA$show_list_parameters().
link.level
Level on the linguistic tree on which ethnic groups are linked to each other. Must be one of 1:16, 'language', or 'dialect'. 'language' corresponds to level 15 and 'dialect' to level 16.
by.country
Flag for linking lists only within the same country (by.country = TRUE), or also across countries (by.country = FALSE). Defaults to TRUE to avoid accidental computation of a huge number of links.
drop.a.threshold
Maximum share of language nodes associated with a that have to be associated with group b for a link to be dropped.
drop.b.threshold
Maximum share of language nodes associated with b that have to be associated with group a for a link to be dropped.
drop.ethno.id
Drop all Ethnologue language IDs that are used to link group a with b. If FALSE, the returned DataFrame has as many rows per link between a and b as there are language nodes the two groups share.
add_listmetadata
Adds metadata of lists A and B to the output. Defaults to TRUE.
A DataFrame. Columns include the names and identifiers of groups a and b and the link.level used for the link. If drop.ethno.id = FALSE, the column ethno.id stores the ID of each Ethnologue node used for a link. Columns ei.frac.a and ei.frac.b store the fraction of language nodes of a and b covered by a link. ei.frac.alla contains the fraction of nodes of a covered by all groups b linked to a. ei.frac.allb contains the fraction of the nodes of all groups b covered by the nodes of a. This information can be used to further fine-tune a link.
# Initialize linkage object
leda.obj <- LEDA$new()
# link Afrobarometer to FRT
setlink <- leda.obj$link_set(lists.a = list(type = "Afrobarometer",
                                            iso3c = c("UGA","KEN")),
                             lists.b = list(type = "FRT",
                                            iso3c = c("UGA","KEN")),
                             link.level = "dialect",
                             by.country = TRUE,
                             drop.a.threshold = 0,
                             drop.b.threshold = 0,
                             drop.ethno.id = TRUE,
                             add_listmetadata = TRUE)
head(setlink)
ling_distance()
Returns a table of linguistic distances between all ethnic groups contained in lists A and B. The linguistic distance between two languages L_1 and L_2 is computed as
1 - ((d(L_1,R) + d(L_2,R) - d(L_1,L_2)) / (d(L_1,R) + d(L_2,R)))^{\delta}
where d(L_i,R) is the length of the path from language L_i to the tree's origin R and d(L_1,L_2) is the length of the shortest path from the first to the second language. \delta is an exponent to discount short distances on the tree.
Because ethnic groups a and b are often linked to several languages, the resulting distance matrix has to be aggregated, for example by taking the minimum distance between all languages L_a in group a to all languages L_b associated with group b.
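The formula above can be written out as a short base-R sketch. It is independent of the package; the function name ling_dist, the path lengths, and the toy distance matrix are hypothetical. The order of aggregation (first over the nodes of b for each node of a, then over the nodes of a) is an assumed reading of the agg_fun.a and agg_fun.b arguments.

```r
# Linguistic distance between two tree nodes, following the formula above.
# d1, d2: path lengths from each language to the root
# d12:    length of the shortest path between the two languages
ling_dist <- function(d1, d2, d12, delta = 0.5) {
  1 - ((d1 + d2 - d12) / (d1 + d2))^delta
}

# Hypothetical case: both languages sit 15 steps below the root
same  <- ling_dist(15, 15, 0)   # identical node
sibs  <- ling_dist(15, 15, 2)   # nodes that share a parent
unrel <- ling_dist(15, 15, 30)  # only the root in common
print(c(same = same, sibs = sibs, unrel = unrel))

# Hypothetical node-level distances:
# rows = nodes of group a, columns = nodes of group b
D <- matrix(c(0.10, 0.40,
              0.30, 0.20), nrow = 2, byrow = TRUE)

# Assumed order: aggregate across b's nodes for each node of a,
# then aggregate across a's nodes
per_a_node <- apply(D, 1, min)  # plays the role of agg_fun.b
group_dist <- mean(per_a_node)  # plays the role of agg_fun.a
print(group_dist)
```

Identical nodes get distance 0 and nodes that share only the root get distance 1; with delta below 1, close relatives such as siblings receive a distance well below their share of unshared path length.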
LEDA$ling_distance( lists.a, lists.b, by.country = T, delta = 0.5, expand = TRUE, level = c("dialect", "language"), agg_fun.a = min, agg_fun.b = min, add_listmetadata = T )
lists.a
Vector of lists A, identified via their list.id returned by method LEDA$get_list_ids(). Or a list of parameters that specify lists A. See LEDA$show_list_parameters().
lists.b
Vector of lists B, identified via their list.id returned by method LEDA$get_list_ids(). Or a list of parameters that specify lists B. See LEDA$show_list_parameters().
by.country
Flag for computing distances only between groups in the same country (by.country = TRUE), or also across countries (by.country = FALSE). Defaults to TRUE to avoid accidental computation of a huge number of distances.
delta
Delta parameter used to discount short distances on the language tree. See LEDA$ling_distance() for details.
expand
Whether to expand the language tree so that all languages are located on level 15. If FALSE, languages are located on their original position in the linguistic tree, which can be considerably closer to the root of the tree. Defaults to FALSE for reasons explained in the paper.
level
Level on the linguistic tree from which distances are computed. Must be 'language' or 'dialect'. 'language' corresponds to level 15 and 'dialect' to level 16.
agg_fun.a
Function used to aggregate linguistic distances across the nodes associated with group a to group b in the (common) cases where a is associated with multiple language nodes. Defaults to min.
agg_fun.b
Function used to aggregate linguistic distances across the nodes associated with group b to each node associated with group a, in the (common) cases where b is associated with multiple language nodes. Defaults to min.
add_listmetadata
Adds metadata of lists A and B to the output. Defaults to TRUE.
A DataFrame. Columns include the names and identifiers of groups a and b. Column distance stores the linguistic distance between groups a and b.
# Initialize linkage object
leda.obj <- LEDA$new()
# link Afrobarometer to FRT
ling.distance <- leda.obj$ling_distance(
  lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"),
  lists.b = list(type = c("FRT"), iso3c = "UGA"),
  level = "dialect",
  by.country = TRUE, delta = .5,
  expand = FALSE,
  agg_fun.a = mean, agg_fun.b = min)
head(ling.distance)
link_withinlingdist()
Returns a link table of ethnic groups contained in lists A and B. For each pair of lists of ethnic groups in A and B, each group a is linked to all groups b within a linguistic distance specified by max.distance. Note that group a can therefore be linked to several groups b. Links are provided between all lists in A with every list in B separately.
The returned DataFrame contains at least one row per group a that has been linked to the Ethnologue language tree. If a is not linked to any group b, the columns that contain linked groups b are set to missing. The returned DataFrame contains multiple rows per group a if a is linked to multiple groups b.
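The linking rule described above can be sketched on a toy table of dyadic distances. This is base R only and does not call the package; the group names and distance values are hypothetical, and the column names merely mimic the a.group, b.group, and distance columns mentioned in this documentation.

```r
# Hypothetical dyadic distances between groups a and b
dist.df <- data.frame(
  a.group = c("A1", "A1", "A2", "A2", "A3"),
  b.group = c("B1", "B2", "B1", "B2", "B1"),
  distance = c(0.05, 0.30, 0.10, 0.08, 0.60),
  stringsAsFactors = FALSE
)

max.distance <- 0.1

# Link every pair with a distance smaller than or equal to max.distance
within <- dist.df[dist.df$distance <= max.distance, ]

# Keep one row for each group a without any link, with b set to missing
unlinked <- setdiff(unique(dist.df$a.group), within$a.group)
if (length(unlinked) > 0) {
  within <- rbind(within,
                  data.frame(a.group = unlinked,
                             b.group = NA_character_,
                             distance = NA_real_,
                             stringsAsFactors = FALSE))
}
print(within)
```

Here A2 is linked to two groups b, while A3 has no group b within the threshold and is retained with missing values.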
LEDA$link_withinlingdist( lists.a, lists.b, max.distance, by.country = T, level = c("dialect", "language"), delta = 0.5, expand = FALSE, agg_fun.a = min, agg_fun.b = min, add_listmetadata = T )
lists.a
Vector of lists A, identified via their list.id returned by method LEDA$get_list_ids(). Or a list of parameters that specify lists A. See LEDA$show_list_parameters().
lists.b
Vector of lists B, identified via their list.id returned by method LEDA$get_list_ids(). Or a list of parameters that specify lists B. See LEDA$show_list_parameters().
max.distance
Maximum linguistic distance. All pairs of groups a and b with a distance smaller than or equal to max.distance are linked to each other.
by.country
Flag for linking lists only within the same country (by.country = TRUE), or also across countries (by.country = FALSE). Defaults to TRUE to avoid accidental computation of a huge number of links.
level
Level on the linguistic tree from which distances are computed. Must be 'language' or 'dialect'. 'language' corresponds to level 15 and 'dialect' to level 16.
delta
Delta parameter used to discount short distances on the language tree. Affects the links returned. See LEDA$ling_distance() for details.
expand
Whether to expand the language tree so that all languages are located on level 15. If FALSE, languages are located on their original position in the linguistic tree, which can be considerably closer to the root of the tree. Defaults to FALSE for reasons explained in the paper.
agg_fun.a
Function used to aggregate linguistic distances across the nodes associated with group a to group b in the (common) cases where a is associated with multiple language nodes. Defaults to min.
agg_fun.b
Function used to aggregate linguistic distances across the nodes associated with group b to each node associated with group a, in the (common) cases where b is associated with multiple language nodes. Defaults to min.
add_listmetadata
Adds metadata of lists A and B to the output. Defaults to TRUE.
A DataFrame. Columns include the names and identifiers of groups a and b. Column distance stores the linguistic distance between groups a and b.
# Initialize linkage object
leda.obj <- LEDA$new()
# link Afrobarometer to FRT
link.withindist <- leda.obj$link_withinlingdist(
  lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"),
  lists.b = list(type = c("FRT"), iso3c = "UGA"),
  level = "dialect",
  max.distance = .1, by.country = TRUE,
  delta = .5, expand = FALSE,
  agg_fun.a = mean, agg_fun.b = min)
head(link.withindist)
link_minlingdist()
Returns a link table of ethnic groups contained in lists A and B. For each pair of lists of ethnic groups in A and B, each group a is linked to its closest linguistic neighbour b. Note that group a can be linked to several groups b if they are equidistant to a. Links are provided between all lists in A with every list in B separately.
The returned DataFrame contains at least one row per group a that has been linked to the Ethnologue language tree. If a is not linked to any group b, the columns that contain linked groups b are set to missing. The returned DataFrame contains multiple rows per group a if a is linked to multiple groups b.
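A nearest-neighbour link with ties, as described above, can be sketched in base R on a toy distance table. The data and the object names (dist.df, min.by.a, nearest) are hypothetical and the package is not called.

```r
# Hypothetical dyadic distances; A1 is equidistant to B1 and B2
dist.df <- data.frame(
  a.group = c("A1", "A1", "A1", "A2", "A2"),
  b.group = c("B1", "B2", "B3", "B1", "B2"),
  distance = c(0.2, 0.2, 0.5, 0.4, 0.1),
  stringsAsFactors = FALSE
)

# Minimum distance within each group a, repeated on every row of that group
min.by.a <- ave(dist.df$distance, dist.df$a.group, FUN = min)

# Keep all rows that attain the minimum, so ties yield several links
nearest <- dist.df[dist.df$distance == min.by.a, ]
print(nearest)
```

Because the selection depends only on which distance is smallest within each group a, any rescaling of the distances that preserves their order would return the same links.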
LEDA$link_minlingdist( lists.a, lists.b, level = c("dialect", "language"), by.country = T, expand = FALSE, delta = 0.5, agg_fun.a = min, agg_fun.b = min, add_listmetadata = T )
lists.a
Vector of lists A, identified via their list.id returned by method LEDA$get_list_ids(). Or a list of parameters that specify lists A. See LEDA$show_list_parameters().
lists.b
Vector of lists B, identified via their list.id returned by method LEDA$get_list_ids(). Or a list of parameters that specify lists B. See LEDA$show_list_parameters().
level
Level on the linguistic tree from which distances are computed. Must be 'language' or 'dialect'. 'language' corresponds to level 15 and 'dialect' to level 16.
by.country
Flag for linking lists only within the same country (by.country = TRUE), or also across countries (by.country = FALSE). Defaults to TRUE to avoid accidental computation of a huge number of links.
expand
Whether to expand the language tree so that all languages are located on level 15. If FALSE, languages are located on their original position in the linguistic tree, which can be considerably closer to the root of the tree. Defaults to FALSE for reasons explained in the paper.
delta
Delta parameter used to discount short distances on the language tree. Does not affect the links, only the absolute linguistic distance associated with them (but not their rank). See LEDA$ling_distance() for details.
agg_fun.a
Function used to aggregate linguistic distances across the nodes associated with group a to group b in the (common) cases where a is associated with multiple language nodes. Defaults to min.
agg_fun.b
Function used to aggregate linguistic distances across the nodes associated with group b to each node associated with group a, in the (common) cases where b is associated with multiple language nodes. Defaults to min.
add_listmetadata
Adds metadata of lists A and B to the output. Defaults to TRUE.
A DataFrame. Columns include the names and identifiers of groups a and b. Column distance stores the linguistic distance between groups a and b.
# Initialize linkage object
leda.obj <- LEDA$new()
# link Afrobarometer to FRT
link.mindist <- leda.obj$link_minlingdist(
  lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"),
  lists.b = list(type = c("FRT"), iso3c = "UGA"),
  level = "dialect",
  by.country = TRUE, expand = FALSE, delta = .5,
  agg_fun.a = mean, agg_fun.b = min)
head(link.mindist)
get_raw_ethnolinks()
Retrieves the raw coding of links to the language tree for the ethnic groups contained in the group lists specified by param_list. The function returns the raw data the LEDA object is based upon.
LEDA$get_raw_ethnolinks(param_list)
param_list
List of parameter values to subset lists of ethnic groups. The following fields are allowed:
cowcode
: Correlates of War code of country
iso3c
: 3-letter isocode of country
type
: Type of ethnic group dataset.
One of: c("AMAR", "DHS", "SIDE", "EPR", "Fearon", "FRT",
"GREG", "Murdock_Map", "IPUMS", "Afrobarometer", "WLMS", "PREG")
marker
: Ethnic marker used in list.
"ethnic group"
: Ethnic group / ethnicity.
"language"
: Language.
"mtongue"
: Mother tongue.
groupvar
Variable name of ethnic group identifier
in original dataset.
round
: Round of survey (DHS; SIDE; Afrobarometer)
subround
: Subround of survey (DHS; SIDE)
year
: Year (EPR; IPUMS)
list.id
: ID of list of ethnic groups.
These parameters are also returned by the method LEDA$show_list_parameters().
# Initialize linkage object
leda <- LEDA$new()
# Get raw links to the language tree
leda$get_raw_ethnolinks(param_list =
  list(type = "Afrobarometer", iso3c = "UGA"))
get_types()
Get types of group lists in LEDA object
LEDA$get_types()
# Initialize linkage object
leda <- LEDA$new()
# Get types of group lists
leda$get_types()
prepare_newlink_table()
Based on an input of ethnic group names, the function returns a link table between the groups and automatically found likely matches on the language tree. These non-authoritative 'suggestions' are identified via a fuzzy string match of input group names with (1) the names of nodes on the language tree, including dialects and alternative names, as well as (2) the names of groups that have been previously linked to the tree as contained in the respective LEDA object.
LEDA$prepare_newlink_table( group.df, groupvar, by.country = FALSE, return = TRUE, save.path = NULL, overwrite = FALSE, prev_link_param_list = NULL, levenshtein.threshold = 0.2, levenshtein.costs = c(insertions = 1, deletions = 1, substitutions = 1) )
group.df
A DataFrame that contains the names of ethnic groups to be linked to the language tree, as well as any other (meta)data that should be retained.
groupvar
String containing the name of the column in group.df that contains the names of ethnic groups.
by.country
Logical determining whether ethnic group names should be string matched separately within and outside the country they belong to. Setting the parameter to TRUE leads to more plausible matches, but requires a column iso3c in group.df. Column iso3c has to contain valid 3-letter ISO codes of African countries. See e.g. the R package countrycode.
return
Logical determining whether the resulting link table shall be returned.
save.path
String of the path to which the resulting link table is stored, as a .csv file. If NULL (the default), nothing is stored.
overwrite
Logical determining whether a previously existing file located by save.path is overwritten.
prev_link_param_list
Parameters that determine the subset of previous links between ethnic group lists and the language tree to automatically retrieve link suggestions from. If NULL (the default), all available lists are used. See LEDA$show_list_parameters() for details.
levenshtein.threshold
Threshold Levenshtein string distance below which a fuzzy string match is returned.
levenshtein.costs
Vector of costs used to compute Levenshtein string distance. See utils::adist for details.
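The role of levenshtein.threshold and levenshtein.costs can be illustrated with utils::adist directly. This sketch does not reproduce the package's matching procedure: the candidate node names are hypothetical, and normalizing the distance by the length of the longer string is an assumption made here so that a threshold such as 0.2 is meaningful across names of different lengths.

```r
# Input group names and hypothetical candidate node names
groups <- c("Asante", "Grusi")
nodes <- c("Asante", "Ashanti", "Grunsi", "Akan")

# Edit costs, named as in the levenshtein.costs argument
costs <- c(insertions = 1, deletions = 1, substitutions = 1)

# Generalized Levenshtein distance between every group and node name
d <- utils::adist(groups, nodes, costs = costs)

# Assumption: normalize by the length of the longer of the two strings
maxlen <- outer(nchar(groups), nchar(nodes), pmax)
norm.d <- d / maxlen
dimnames(norm.d) <- list(groups, nodes)

# Candidate matches at or below the threshold
threshold <- 0.2
matches <- which(norm.d <= threshold, arr.ind = TRUE)
print(round(norm.d, 2))
print(matches)
```

With these toy names, "Asante" matches itself exactly and "Grusi" matches "Grunsi" (one insertion over six characters), while "Ashanti" falls outside the threshold for "Asante" (two edits over seven characters).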
The automatic links should not be regarded as authoritative, but merely as a help to facilitate the coding, which proceeds outside the LEDA environment from this point onwards. The final table returned by the function has a column called link, which is to be filled by the user, using the automatically generated suggestions and secondary data sources. Once this coding is completed, the table can be added to the LEDA object with the method LEDA$add_tree_links().
A DataFrame with the same number of rows as group.df. In addition to the columns of group.df, it contains the following columns:
auto_link_org
: Language tree nodes (levels 1:14; languages, i.e. level 15) matched via their original (org) name.
auto_link_alt
: Languages (level 15) matched via their alternative name.
auto_link_dial
: Dialects matched via their name or alternative name.
auto_link_prev
: Language tree nodes found via a fuzzy string match of input groups to the groups previously linked to the language tree (potentially subsetted by prev_link_param_list).
auto_link_foreign
: If by.country = TRUE, same four fields as above for languages from 'foreign' countries, but pasted into one single string.
link
: Empty column for the final link, to be filled by the user.
comment
: Empty column for comments on the final link, to be filled by the user.
source
: Empty column for the source of the final link, to be filled by the user.
Multiple matches are combined by pasting the language names separated by a '|'. The matched language node's tree level is indicated behind its name in '[]', with L1:L14 denoting super-language levels, 'lang' denoting languages, and 'dial' denoting dialects. This coding format should be maintained when filling the column link with the final link of groups to the language tree.
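The coding format described above ('Name [level]' entries separated by '|') can be taken apart with a few lines of base R. This is only an illustrative sketch for checking hand-coded entries; the example string and the object names are hypothetical, and the package's own parser may work differently.

```r
# Hypothetical entry in the link column
link <- "Akan [L9]|Asante [dial]"

# Split the entry into its individual nodes
parts <- trimws(strsplit(link, "|", fixed = TRUE)[[1]])

# Node name: everything before the trailing '[...]'
node.name <- trimws(sub("\\[[^]]*\\]$", "", parts))

# Level specifier: the content of the trailing '[...]', NA if absent
has.level <- grepl("\\[[^]]*\\]$", parts)
node.level <- ifelse(has.level,
                     sub("^.*\\[([^]]*)\\]$", "\\1", parts),
                     NA_character_)

parsed <- data.frame(name = node.name, level = node.level,
                     stringsAsFactors = FALSE)
print(parsed)
```

A check of this kind makes it easy to spot entries with a missing or misspelled level specifier before the table is passed on.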
# Initialize linkage object
leda <- LEDA$new()
# Make or load some dataset of ethnic groups
new.groups.df <- data.frame(group_name =
    c("Asante", "Grusi", "Akan"), iso3c = c("GHA"),
  type = "My Survey in Ghana",
  marker = "ethnic group",
  stringsAsFactors = FALSE)
# Prepare a new link table
newlink.df <- leda$prepare_newlink_table(
  group.df = new.groups.df,
  groupvar = "group_name",
  by.country = TRUE,
  return = TRUE, save.path = NULL,
  overwrite = TRUE, prev_link_param_list = NULL,
  levenshtein.threshold = .2,
  levenshtein.costs =
    c(insertions = 1, deletions = 1, substitutions = 1))
add_tree_links()
Function to add a table that links a list of ethnic groups with nodes on the language tree to the LEDA object.
LEDA$add_tree_links(tree.link.df, idvars, type)
tree.link.df
Table that contains the links between ethnic groups and the language tree. It must contain the following variables:
group
Names of ethnic groups, of type "character".
link
Names of language nodes linked to ethnic group, of type "character". Multiple nodes linked to the same ethnic group must be separated by a '|'. Names of language nodes must be contained in the Ethnologue database 13. A node's level is specified in [], and the node must exist on that level. Level specifiers follow this form: L1:L14 denoting super-language levels, 'lang' denoting languages, 'dial' denoting dialects, and 'iso' denoting language ISO codes. E.g. "Akan [L9]", "Asante [dial]", or "aka [iso]". If no level is specified and multiple language tree nodes share the same name, the one closest to the node is chosen. Note that the safest way to avoid confusion is to provide ISO codes.
iso3c
(not required) Country identifier as 3-letter ISO code. If provided, the algorithm gives preference to nodes of the language tree in the same country in cases where multiple nodes share the same name given by link.
idvars
Additional variables that identify lists of ethnic groups in your data. See below.
idvars
Variables that identify lists of ethnic groups in your data. These should contain a list type (e.g. "My survey"), and be typically nested within countries and years. See LEDA$show_list_parameters() for the variables that identify ethnic group lists in the LEDA dataset. The unique combinations of idvars values in tree.link.df are used to create new entries in the LEDA object's dictionary of group lists leda.obj$list.dict.
type
String that contains the 'type' of the group list to add, for example 'My survey'. Must not be one of the types already in the LEDA project.
The function links the input ethnic groups to the linguistic tree contained in the package and thereby updates the LEDA object. Once this has been done, the added group can be linked to all ethnic group lists contained in the LEDA object. See examples below.
# Initialize linkage object
leda <- LEDA$new()
# Make toy link dataset
new.groups.df <- data.frame(
  group = c("Asante", "Mossi"), ## Ethnic group names
  link = c("Asante [dial]","Moore [org]"), ## Language nodes
  marker = "Ethnic self identification",
  iso3c = c("GHA", "BFA"), ## Countries
  stringsAsFactors = FALSE ## Everything as character
)
# Add to LEDA
leda$add_tree_links(tree.link.df = new.groups.df,
                    idvars = c("iso3c", "type", "marker"),
                    type = "My data")
# Use the new link
setlink <- leda$link_set(lists.a = list(type = c("My data")),
                         lists.b = list(type = c("Afrobarometer"),
                                        round = 4, marker = "language",
                                        iso3c = c("GHA","BFA")),
                         link.level = 15, by.country = FALSE,
                         drop.b.threshold = 0, drop.ethno.id = TRUE)
head(setlink[, c("a.group", "b.group", "a.type", "b.type", "a.list.id", "b.list.id")])
clone()
The objects of this class are cloneable with this method.
LEDA$clone(deep = FALSE)
deep
Whether to make a deep clone.
# Initialize linkage object
leda.obj <- LEDA$new()
# link Afrobarometer to FRT
## Based on set relation
setlink <- leda.obj$link_set(
lists.a = list(type = c("Afrobarometer"),
iso3c = "UGA"),
lists.b = list(type = c("FRT"),iso3c = "UGA"),
link.level = "dialect", by.country = TRUE,
drop.b.threshold = 0, drop.ethno.id = TRUE)
head(setlink[,c("a.group","b.group")])
## Nearest linguistic neighbor
mindistlink <- leda.obj$link_minlingdist(
lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"),
lists.b = list(type = c("FRT"), iso3c = "UGA"),
level = "dialect",
by.country = TRUE, expand = FALSE, delta = .5,
agg_fun.a = mean, agg_fun.b = min)
head(mindistlink[,c("a.group","b.group")])
## Within maximum linguistic distance
withindistlink <- leda.obj$link_withinlingdist(
lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"),
lists.b = list(type = c("FRT"), iso3c = "UGA"),
level = "dialect",
max.distance = .1, by.country = TRUE,
delta = .5, expand = FALSE,
agg_fun.a = mean, agg_fun.b = min)
head(withindistlink[,c("a.group","b.group")])
## Compute pairwise linguistic distance
distance.df <- leda.obj$ling_distance(
lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"),
lists.b = list(type = c("FRT"), iso3c = "UGA"),
level = "dialect",
by.country = TRUE, delta = .5,
expand = FALSE,
agg_fun.a = mean, agg_fun.b = min)
head(distance.df[,c("a.group","b.group")])
## ------------------------------------------------
## Method `LEDA$new`
## ------------------------------------------------
library(LEDA)
leda.obj <- LEDA$new()
## ------------------------------------------------
## Method `LEDA$show_list_parameters`
## ------------------------------------------------
# Initialize linkage object
leda <- LEDA$new()
# Get list parameters
leda$show_list_parameters()
## ------------------------------------------------
## Method `LEDA$get_list_dict`
## ------------------------------------------------
# Initialize linkage object
leda <- LEDA$new()
# Get list dictionaries
list.dict <- leda$get_list_dict()
head(list.dict)
## ------------------------------------------------
## Method `LEDA$get_list_dict_subset`
## ------------------------------------------------
# Initialize linkage object
leda <- LEDA$new()
# Get list data for Afrobarometers in Uganda and Kenya
leda$get_list_dict_subset(param_list =
list(type = "Afrobarometer", iso3c = c("UGA","KEN")))
## ------------------------------------------------
## Method `LEDA$get_list_ids`
## ------------------------------------------------
# Initialize linkage object
leda <- LEDA$new()
# Get list IDs for Afrobarometers in Uganda and Kenya
leda$get_list_ids(param_list =
list(type = "Afrobarometer", iso3c = c("UGA","KEN")))
## ------------------------------------------------
## Method `LEDA$link_set`
## ------------------------------------------------
# Initialize linkage object
leda.obj <- LEDA$new()
# link Afrobarometer to FRT
setlink <- leda.obj$link_set(lists.a = list(type = "Afrobarometer",
iso3c = c("UGA","KEN")),
lists.b = list(type = "FRT",
iso3c = c("UGA","KEN")),
link.level = "dialect",
by.country = TRUE,
drop.a.threshold = 0,
drop.b.threshold = 0,
drop.ethno.id = TRUE,
add_listmetadata = TRUE)
head(setlink)
## ------------------------------------------------
## Method `LEDA$ling_distance`
## ------------------------------------------------
# Initialize linkage object
leda.obj <- LEDA$new()
# link Afrobarometer to FRT
ling.distance <- leda.obj$ling_distance(
lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"),
lists.b = list(type = c("FRT"), iso3c = "UGA"),
level = "dialect",
by.country = TRUE, delta = .5,
expand = FALSE,
agg_fun.a = mean, agg_fun.b = min)
head(ling.distance)
## ------------------------------------------------
## Method `LEDA$link_withinlingdist`
## ------------------------------------------------
# Initialize linkage object
leda.obj <- LEDA$new()
# link Afrobarometer to FRT
link.withindist <- leda.obj$link_withinlingdist(
lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"),
lists.b = list(type = c("FRT"), iso3c = "UGA"),
level = "dialect",
max.distance = .1, by.country = TRUE,
delta = .5, expand = FALSE,
agg_fun.a = mean, agg_fun.b = min)
head(link.withindist)
## ------------------------------------------------
## Method `LEDA$link_minlingdist`
## ------------------------------------------------
# Initialize linkage object
leda.obj <- LEDA$new()
# link Afrobarometer to FRT
link.mindist <- leda.obj$link_minlingdist(
lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"),
lists.b = list(type = c("FRT"), iso3c = "UGA"),
level = "dialect",
by.country = TRUE, expand = FALSE, delta = .5,
agg_fun.a = mean, agg_fun.b = min)
head(link.mindist)
## ------------------------------------------------
## Method `LEDA$get_raw_ethnolinks`
## ------------------------------------------------
# Initialize linkage object
leda <- LEDA$new()
# Get raw links to the language tree
leda$get_raw_ethnolinks(param_list =
list(type = "Afrobarometer", iso3c = "UGA"))
## ------------------------------------------------
## Method `LEDA$get_types`
## ------------------------------------------------
# Initialize linkage object
leda <- LEDA$new()
# Get types of group lists
leda$get_types()
## ------------------------------------------------
## Method `LEDA$prepare_newlink_table`
## ------------------------------------------------
# Initialize linkage object
leda <- LEDA$new()
# Make or load some dataset of ethnic groups
new.groups.df <- data.frame(group_name =
c("Asante", "Grusi", "Akan"),iso3c = c("GHA"),
type = "My Survey in Ghana",
marker = "ethnic group",
stringsAsFactors = FALSE)
# Prepare a new link table
newlink.df <- leda$prepare_newlink_table(
group.df = new.groups.df,
groupvar = "group_name",
by.country = TRUE,
return = TRUE, save.path = NULL,
overwrite = TRUE, prev_link_param_list = NULL,
levenshtein.threshold = .2,
levenshtein.costs =
c(insertions = 1,deletions = 1, substitutions = 1))
## ------------------------------------------------
## Method `LEDA$add_tree_links`
## ------------------------------------------------
# Initialize linkage object
leda <- LEDA$new()
# Make toy link dataset
new.groups.df <- data.frame(
group = c("Asante", "Mossi"), ## Ethnic group names
link = c("Asante [dial]","Moore [org]"), ## Language nodes
marker = "Ethnic self identification",
iso3c = c("GHA", "BFA"), ## Countries
stringsAsFactors = FALSE ## Everything as character
)
# Add to LEDA
leda$add_tree_links(tree.link.df = new.groups.df,
idvars = c("iso3c", "type", "marker"),
type = "My data")
# Use the new link
setlink <- leda$link_set(lists.a = list(type = c("My data")),
lists.b = list(type = c("Afrobarometer"),
round = 4, marker = "language",
iso3c = c("GHA","BFA")),
link.level = 15, by.country = FALSE,
drop.b.threshold = 0, drop.ethno.id = TRUE)
head(setlink[, c("a.group", "b.group", "a.type", "b.type", "a.list.id", "b.list.id")])