LEDA: Linking Ethnic Data from Africa

LEDA R Documentation

Linking Ethnic Data from Africa

Description

Provides an interface to link ethnic groups from 12 different datasets to each other and to calculate linguistic distances between them.

Details

The LEDA package contains a full pipeline to link ethnic datasets from Africa. The main strength of LEDA lies in leveraging the structure of the language tree to provide a flexible link between any two ethnic groups that are linked to the tree.

The package allows lists of ethnic groups to be linked to each other using three main linkage types: binary linking based on the relations of sets of language nodes associated with two groups; binary linking based on linguistic distances; and a full computation of dyadic linguistic distances.

Usage of a LEDA object is structured around lists of ethnic groups. These lists of groups stem from the original datasets that have been joined to the language tree. Lists are structured by data source, country, year, or, in the case of survey data, survey rounds. Via the language tree, any two lists of ethnic groups can be linked to each other.

For full information on the LEDA project and methodology, read the paper.

When using the LEDA package, please cite: Müller-Crepon, Carl, Yannick Pengl, and Nils-Christian Bormann (2020). Linking Ethnic Data from Africa. Unpublished working paper.

Usage

# Initialize
leda.obj <- LEDA$new()

# Apply any LEDA method
# leda.obj$method() ## not run.

Methods

Public methods


Method new()

Initialize a new LEDA object

Usage
LEDA$new()
Examples
library(LEDA)
leda.obj <- LEDA$new()


Method show_list_parameters()

Returns a vector with the variables that define lists of ethnic groups in different datasets.

Usage
LEDA$show_list_parameters()
Details

The variables coded are the following:

cowcode: Correlates of War code of country

iso3c: 3-letter isocode of country

type: Type of ethnic group dataset. One of: c("AMAR", "DHS", "SIDE", "EPR", "Fearon", "FRT", "GREG", "Murdock_Map", "IPUMS", "Afrobarometer", "WLMS", "PREG")

marker: Ethnic marker used in list. "ethnic group": Ethnic group / ethnicity. "language": Language. "mtongue": Mother tongue.

groupvar: Variable name of ethnic group identifier in original dataset.

round: Round of survey (DHS; SIDE; Afrobarometer)

subround: Subround of survey (DHS; SIDE)

year: Year (EPR; IPUMS)

list.id: ID of list of ethnic groups.

Returns

A vector.

Examples
# Initialize linkage object
leda <- LEDA$new()

# Get list parameters
leda$show_list_parameters()

Method get_list_dict()

Returns the full dictionary of lists of ethnic groups that are included in the LEDA project. An example of a list is the IPUMS census data from Ghana in 2000.

Usage
LEDA$get_list_dict()
Returns

A DataFrame.

Examples
# Initialize linkage object
leda <- LEDA$new()

# Get list dictionaries
list.dict <- leda$get_list_dict()
head(list.dict)

Method get_list_dict_subset()

Returns a subset of the dictionary of lists of ethnic groups that are included in the LEDA project. An example of a list is the IPUMS census data from Ghana in 2000.

Usage
LEDA$get_list_dict_subset(param_list = list())
Arguments
param_list

List of parameter values to subset list dictionary. The following fields are allowed:

cowcode: Correlates of War code of country

iso3c: 3-letter isocode of country

type: Type of ethnic group dataset. One of: c("AMAR", "DHS", "SIDE", "EPR", "Fearon", "FRT", "GREG", "Murdock_Map", "IPUMS", "Afrobarometer", "WLMS", "PREG")

marker: Ethnic marker used in list. "ethnic group": Ethnic group / ethnicity. "language": Language. "mtongue": Mother tongue.

groupvar: Variable name of ethnic group identifier in original dataset.

round: Round of survey (DHS; SIDE; Afrobarometer)

subround: Subround of survey (DHS; SIDE)

year: Year (EPR; IPUMS)

list.id: ID of list of ethnic groups.

These parameters are also returned by method LEDA$show_list_parameters().
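The subsetting logic described above can be sketched in base R. This is only an illustration under stated assumptions: the toy dictionary `list.dict` and the helper `subset_dict()` are hypothetical and not part of the LEDA package, and the sketch assumes that every field in `param_list` keeps the rows whose value matches any of the supplied values.

```r
# Toy list dictionary (hypothetical rows; the real one comes from get_list_dict()).
list.dict <- data.frame(
  list.id = 1:4,
  type = c("Afrobarometer", "Afrobarometer", "DHS", "IPUMS"),
  iso3c = c("UGA", "KEN", "UGA", "GHA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows that match every field in param_list.
subset_dict <- function(dict, param_list = list()) {
  keep <- rep(TRUE, nrow(dict))
  for (field in names(param_list)) {
    keep <- keep & dict[[field]] %in% param_list[[field]]
  }
  dict[keep, , drop = FALSE]
}

sub.dict <- subset_dict(list.dict,
                        list(type = "Afrobarometer", iso3c = c("UGA", "KEN")))
print(sub.dict)
```

An empty `param_list` leaves the dictionary unchanged in this sketch, which mirrors the `param_list = list()` default shown in the usage.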

Returns

A DataFrame.

Examples
# Initialize linkage object
leda <- LEDA$new()

# Get list data for Afrobarometers in Uganda and Kenya
leda$get_list_dict_subset(param_list = 
     list(type = "Afrobarometer", iso3c = c("UGA","KEN")))

Method get_list_ids()

Returns a subset of the IDs of lists of ethnic groups that are included in the LEDA project.

Usage
LEDA$get_list_ids(param_list = list())
Arguments
param_list

List of parameter values to subset IDs from list dictionary. The following fields are allowed:

cowcode: Correlates of War code of country

iso3c: 3-letter isocode of country

type: Type of ethnic group dataset. One of: c("AMAR", "DHS", "SIDE", "EPR", "Fearon", "FRT", "GREG", "Murdock_Map", "IPUMS", "Afrobarometer", "WLMS", "PREG")

marker: Ethnic marker used in list. "ethnic group": Ethnic group / ethnicity. "language": Language. "mtongue": Mother tongue.

groupvar: Variable name of ethnic group identifier in original dataset.

round: Round of survey (DHS; SIDE; Afrobarometer)

subround: Subround of survey (DHS; SIDE)

year: Year (EPR; IPUMS)

list.id: ID of list of ethnic groups.

These parameters are also returned by method LEDA$show_list_parameters().

Returns

A vector of list.ids

Examples
# Initialize linkage object
leda <- LEDA$new()

# Get list IDs for Afrobarometers in Uganda and Kenya
leda$get_list_ids(param_list = 
   list(type = "Afrobarometer", iso3c = c("UGA","KEN")))

Method link_set()

Returns a link table of ethnic groups contained in lists A and B, based on their set relation. At the baseline, groups a and b are linked to each other as soon as they share any language node at the level of the language tree specified by link.level. Links are computed separately between each list in A and each list in B.

The returned DataFrame contains at least one row per group a that has been linked to the Ethnologue language tree. If a is not linked to any group b, the columns that contain linked groups b are set to missing. The returned DataFrame contains multiple rows per group a if a is linked to multiple groups b.

Usage
LEDA$link_set(
  lists.a,
  lists.b,
  link.level,
  by.country = T,
  drop.a.threshold = 0,
  drop.b.threshold = 0,
  drop.ethno.id = T,
  add_listmetadata = T
)
Arguments
lists.a

Vector of lists A, identified via their list.id returned by method LEDA$get_list_ids(), or a list of parameters that specify lists A. See LEDA$show_list_parameters().

lists.b

Vector of lists B, identified via their list.id returned by method LEDA$get_list_ids(), or a list of parameters that specify lists B. See LEDA$show_list_parameters().

link.level

Level on the linguistic tree on which ethnic groups are linked to each other. Must be one of: 1:16, 'language', or 'dialect'. 'language' corresponds to level 15 and 'dialect' corresponds to level 16.

by.country

Flag for linking lists only within the same country (by.country = TRUE), or also across countries (by.country = FALSE). Defaults to TRUE to avoid accidental computation of a huge number of links.

drop.a.threshold

Maximum share of language nodes associated with a that have to be associated with group b for a link to be dropped.

drop.b.threshold

Maximum share of language nodes associated with b that have to be associated with group a for a link to be dropped.

drop.ethno.id

Drop all ethnologue language IDs that are used to link group a with b. If FALSE, the returned DataFrame has as many rows per link between a and b as there are language nodes the two groups share.

add_listmetadata

Adds metadata of lists A and B to the output. Defaults to TRUE.

Returns

A DataFrame. Columns include the names and identifiers of groups a and b and the link.level used for the link. If drop.ethno.id = FALSE, the column ethno.id stores the ID of each Ethnologue node used for a link. Columns ei.frac.a and ei.frac.b store the fraction of language nodes of a and b covered by a link. ei.frac.alla contains the fraction of nodes of a covered by all groups b linked to a. ei.frac.allb contains the fraction of the nodes of all groups b covered by the nodes of a. This information can be used to further fine-tune a link.
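To make the set relation and the coverage fractions more concrete, here is a minimal base R sketch. It does not call the LEDA package: the node sets are invented, and the pruning rule (drop a link when its share is at or below the threshold) is an assumption about how drop.a.threshold behaves, not a confirmed description of the implementation.

```r
# Hypothetical language-node sets for one group a and two groups b.
nodes_a <- c("n1", "n2", "n3", "n4")
nodes_b <- list(b1 = c("n1", "n2"), b2 = c("n4", "n5", "n6", "n7"))

# One row per candidate pair, with the share of each group's nodes that overlap.
links <- do.call(rbind, lapply(names(nodes_b), function(b) {
  shared <- intersect(nodes_a, nodes_b[[b]])
  data.frame(
    b.group = b,
    n.shared = length(shared),
    ei.frac.a = length(shared) / length(nodes_a),
    ei.frac.b = length(shared) / length(nodes_b[[b]]),
    stringsAsFactors = FALSE
  )
}))

# Baseline rule: a and b are linked as soon as they share any node.
links <- links[links$n.shared > 0, ]

# Assumed pruning rule: drop links whose share of a's nodes is at or below the threshold.
drop.a.threshold <- 0.25
pruned <- links[links$ei.frac.a > drop.a.threshold, ]

print(links)
print(pruned)
```

In this toy case both groups b are linked at the baseline, while the stricter threshold keeps only the group that covers half of a's nodes.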

Examples
# Initialize linkage object
leda.obj <- LEDA$new()

# link Afrobarometer to FRT
setlink <- leda.obj$link_set(lists.a = list(type = "Afrobarometer", 
                                          iso3c = c("UGA","KEN")),
                           lists.b = list(type = "FRT", 
                                          iso3c = c("UGA","KEN")), 
                           link.level = "dialect",  
                           by.country = TRUE, 
                           drop.a.threshold = 0, 
                           drop.b.threshold = 0, 
                           drop.ethno.id = TRUE, 
                           add_listmetadata = TRUE)
head(setlink)

Method ling_distance()

Returns a table of linguistic distances between all ethnic groups contained in lists A and B. The linguistic distance between two languages L_1 and L_2 is computed as

1 - ((d(L_1,R) + d(L_2,R) - d(L_1,L_2)) / (d(L_1,R) + d(L_2,R)))^{\delta}

where d(L_i,R) is the length of the path from language L_i to the tree's origin R and d(L_1,L_2) is the length of the shortest path from the first to the second language. \delta is an exponent to discount short distances on the tree.

Because we oftentimes link ethnic groups a and b to several languages, we have to aggregate the resulting distance matrix, for example by taking the minimum distance between all languages L_a in group a to all languages L_b associated with group b.
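The formula and the aggregation step can be tried out with a few lines of base R. This is a sketch, not the package's code: the function name `ling_dist()`, the path lengths, and the order of aggregation (first across the languages of b, then across the languages of a) are assumptions made for illustration.

```r
# Tree-based linguistic distance between two languages.
# d1_root, d2_root: path lengths from each language to the tree's origin.
# d12: length of the shortest path between the two languages.
ling_dist <- function(d1_root, d2_root, d12, delta = 0.5) {
  shared <- (d1_root + d2_root - d12) / (d1_root + d2_root)
  1 - shared^delta
}

# Hypothetical shortest-path lengths between the languages of group a (rows)
# and group b (columns); every language is assumed to sit 15 steps from the origin.
d12 <- matrix(c(0, 2, 30,
                4, 2, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("a1", "a2"), c("b1", "b2", "b3")))
dist_mat <- ling_dist(15, 15, d12, delta = 0.5)

# Aggregate across b's languages for each language of a, then across a's languages.
per_a_node <- apply(dist_mat, 1, min)
group_dist <- min(per_a_node)

print(round(dist_mat, 3))
print(group_dist)
```

Identical languages (path length 0) get a distance of 0, and languages that only meet at the origin get a distance of 1. Replacing `min` with `mean` in either step changes how groups with many associated languages are treated.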

Usage
LEDA$ling_distance(
  lists.a,
  lists.b,
  by.country = T,
  delta = 0.5,
  expand = TRUE,
  level = c("dialect", "language"),
  agg_fun.a = min,
  agg_fun.b = min,
  add_listmetadata = T
)
Arguments
lists.a

Vector of lists A, identified via their list.id returned by method LEDA$get_list_ids(), or a list of parameters that specify lists A. See LEDA$show_list_parameters().

lists.b

Vector of lists B, identified via their list.id returned by method LEDA$get_list_ids(), or a list of parameters that specify lists B. See LEDA$show_list_parameters().

by.country

Flag for computing distances only between groups in the same country (by.country = TRUE), or also across countries (by.country = FALSE). Defaults to TRUE to avoid accidental computation of a huge number of distances.

delta

Delta parameter used to discount short distances on the language tree. See LEDA$ling_distance() for details.

expand

Whether to expand the language tree so that all languages are located on level 15. If FALSE, languages remain at their original position in the linguistic tree, which can be considerably closer to the root of the tree. Defaults to FALSE for reasons explained in the paper.

level

Level on the linguistic tree from which distances are computed. Must be 'language' or 'dialect'. 'language' corresponds to level 15 and 'dialect' corresponds to level 16.

agg_fun.a

Function used to aggregate linguistic distances across the nodes associated with group a to group b in the (common) cases where a is associated with multiple language nodes. Defaults to min.

agg_fun.b

Function used to aggregate linguistic distances across the nodes associated with group b to each node associated with group a in the (common) cases where b is associated with multiple language nodes. Defaults to min.

add_listmetadata

Adds metadata of lists A and B to the output. Defaults to TRUE.

Returns

A DataFrame. Columns include the names and identifiers of groups a and b. Column distance stores the linguistic distance between groups a and b.

Examples
# Initialize linkage object
leda.obj <- LEDA$new()

# link Afrobarometer to FRT
ling.distance <- leda.obj$ling_distance(
   lists.a = list(type = c("Afrobarometer"),  iso3c = "UGA"),
   lists.b = list(type = c("FRT"), iso3c = "UGA"),
   level = "dialect",
   by.country = TRUE, delta = .5, 
   expand = FALSE,  
   agg_fun.a = mean, agg_fun.b = min)
head(ling.distance)

Method link_withinlingdist()

Returns a link table of ethnic groups contained in lists A and B. For each pair of lists of ethnic groups in A and B, each group a is linked to all groups b within a linguistic distance specified by max.distance. Note that group a can therefore be linked to several groups b. Links are computed separately between each list in A and each list in B.

The returned DataFrame contains at least one row per group a that has been linked to the Ethnologue language tree. If a is not linked to any group b, the columns that contain linked groups b are set to missing. The returned DataFrame contains multiple rows per group a if a is linked to multiple groups b.
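The linking rule described here amounts to a threshold filter on dyadic distances. The following base R sketch uses an invented distance table with the column names a.group, b.group, and distance; it is an illustration of the rule, not the package's implementation.

```r
# Hypothetical dyadic distances between groups of one list A and one list B.
dist.df <- data.frame(
  a.group = c("A1", "A1", "A1", "A2", "A2", "A2"),
  b.group = c("B1", "B2", "B3", "B1", "B2", "B3"),
  distance = c(0.00, 0.08, 0.60, 0.45, 0.30, 0.90),
  stringsAsFactors = FALSE
)
max.distance <- 0.1

# Link every pair whose distance is smaller than or equal to max.distance.
within <- dist.df[dist.df$distance <= max.distance, ]

# Groups a without any link keep one row, with the b columns set to missing.
unlinked <- setdiff(unique(dist.df$a.group), within$a.group)
if (length(unlinked) > 0) {
  within <- rbind(within,
                  data.frame(a.group = unlinked, b.group = NA_character_,
                             distance = NA_real_, stringsAsFactors = FALSE))
}
print(within)
```

Here group A1 is linked to two groups b, while A2 has no neighbour within the threshold and appears once with missing values.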

Usage
LEDA$link_withinlingdist(
  lists.a,
  lists.b,
  max.distance,
  by.country = T,
  level = c("dialect", "language"),
  delta = 0.5,
  expand = FALSE,
  agg_fun.a = min,
  agg_fun.b = min,
  add_listmetadata = T
)
Arguments
lists.a

Vector of lists A, identified via their list.id returned by method LEDA$get_list_ids(), or a list of parameters that specify lists A. See LEDA$show_list_parameters().

lists.b

Vector of lists B, identified via their list.id returned by method LEDA$get_list_ids(), or a list of parameters that specify lists B. See LEDA$show_list_parameters().

max.distance

Maximum linguistic distance. All pairs of groups a and b with a distance smaller than or equal to max.distance are linked to each other.

by.country

Flag for linking lists only within the same country (by.country = TRUE), or also across countries (by.country = FALSE). Defaults to TRUE to avoid accidental computation of a huge number of links.

level

Level on the linguistic tree from which distances are computed. Must be 'language' or 'dialect'. 'language' corresponds to level 15 and 'dialect' corresponds to level 16.

delta

Delta parameter used to discount short distances on the language tree. Affects the links returned. See LEDA$ling_distance() for details.

expand

Whether to expand the language tree so that all languages are located on level 15. If FALSE, languages remain at their original position in the linguistic tree, which can be considerably closer to the root of the tree. Defaults to FALSE for reasons explained in the paper.

agg_fun.a

Function used to aggregate linguistic distances across the nodes associated with group a to group b in the (common) cases where a is associated with multiple language nodes. Defaults to min.

agg_fun.b

Function used to aggregate linguistic distances across the nodes associated with group b to each node associated with group a in the (common) cases where b is associated with multiple language nodes. Defaults to min.

add_listmetadata

Adds metadata of lists A and B to the output. Defaults to TRUE.

Returns

A DataFrame. Columns include the names and identifiers of groups a and b. Column distance stores the linguistic distance between groups a and b.

Examples
# Initialize linkage object
leda.obj <- LEDA$new()

# link Afrobarometer to FRT
link.withindist <- leda.obj$link_withinlingdist(
  lists.a = list(type = c("Afrobarometer"),  iso3c = "UGA"), 
  lists.b = list(type = c("FRT"), iso3c = "UGA"),
  level = "dialect",
  max.distance = .1, by.country = TRUE,
  delta = .5, expand = FALSE, 
  agg_fun.a = mean, agg_fun.b = min)
head(link.withindist)

Method link_minlingdist()

Returns a link table of ethnic groups contained in lists A and B. For each pair of lists of ethnic groups in A and B, each group a is linked to its closest linguistic neighbour b. Note that group a can be linked to several groups b if they are equidistant to a. Links are computed separately between each list in A and each list in B.

The returned DataFrame contains at least one row per group a that has been linked to the Ethnologue language tree. If a is not linked to any group b, the columns that contain linked groups b are set to missing. The returned DataFrame contains multiple rows per group a if a is linked to multiple groups b.
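Nearest-neighbour linking with ties can be sketched in base R as follows. The distance table is invented and the code is not taken from the package; it only shows why a group a may end up with several rows.

```r
# Hypothetical dyadic distances between groups of one list A and one list B.
dist.df <- data.frame(
  a.group = c("A1", "A1", "A1", "A2", "A2", "A2"),
  b.group = c("B1", "B2", "B3", "B1", "B2", "B3"),
  distance = c(0.2, 0.2, 0.6, 0.5, 0.3, 0.9),
  stringsAsFactors = FALSE
)

# Smallest distance observed for each group a, repeated on every row of that group.
min_per_a <- ave(dist.df$distance, dist.df$a.group, FUN = min)

# Keep all rows that attain the minimum, so equidistant neighbours are all retained.
nearest <- dist.df[dist.df$distance == min_per_a, ]
print(nearest)
```

A1 is equidistant to B1 and B2 and therefore keeps both links, whereas A2 is linked to B2 only.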

Usage
LEDA$link_minlingdist(
  lists.a,
  lists.b,
  level = c("dialect", "language"),
  by.country = T,
  expand = FALSE,
  delta = 0.5,
  agg_fun.a = min,
  agg_fun.b = min,
  add_listmetadata = T
)
Arguments
lists.a

Vector of lists A, identified via their list.id returned by method LEDA$get_list_ids(), or a list of parameters that specify lists A. See LEDA$show_list_parameters().

lists.b

Vector of lists B, identified via their list.id returned by method LEDA$get_list_ids(), or a list of parameters that specify lists B. See LEDA$show_list_parameters().

level

Level on the linguistic tree from which distances are computed. Must be 'language' or 'dialect'. 'language' corresponds to level 15 and 'dialect' corresponds to level 16.

by.country

Flag for linking lists only within the same country (by.country = TRUE), or also across countries (by.country = FALSE). Defaults to TRUE to avoid accidental computation of a huge number of links.

expand

Whether to expand the language tree so that all languages are located on level 15. If FALSE, languages remain at their original position in the linguistic tree, which can be considerably closer to the root of the tree. Defaults to FALSE for reasons explained in the paper.

delta

Delta parameter used to discount short distances on the language tree. Does not affect the links, only the absolute linguistic distance associated with them (but not their rank). See LEDA$ling_distance() for details.

agg_fun.a

Function used to aggregate linguistic distances across the nodes associated with group a to group b in the (common) cases where a is associated with multiple language nodes. Defaults to min.

agg_fun.b

Function used to aggregate linguistic distances across the nodes associated with group b to each node associated with group a in the (common) cases where b is associated with multiple language nodes. Defaults to min.

add_listmetadata

Adds metadata of lists A and B to the output. Defaults to TRUE.

Returns

A DataFrame. Columns include the names and identifiers of groups a and b. Column distance stores the linguistic distance between groups a and b.

Examples
# Initialize linkage object
leda.obj <- LEDA$new()

# link Afrobarometer to FRT
link.mindist <- leda.obj$link_minlingdist(
 lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"), 
 lists.b = list(type = c("FRT"),  iso3c = "UGA"),
 level = "dialect",
 by.country = TRUE, expand = FALSE,  delta = .5,
 agg_fun.a = mean, agg_fun.b = min)
 
head(link.mindist)

Method get_raw_ethnolinks()

Retrieve the raw coding of links between ethnic groups contained in group lists specified by param_list to the language tree. The function returns the raw data the LEDA object is based upon.

Usage
LEDA$get_raw_ethnolinks(param_list)
Arguments
param_list

List of parameter values to subset lists of ethnic groups. The following fields are allowed:

cowcode: Correlates of War code of country

iso3c: 3-letter isocode of country

type: Type of ethnic group dataset. One of: c("AMAR", "DHS", "SIDE", "EPR", "Fearon", "FRT", "GREG", "Murdock_Map", "IPUMS", "Afrobarometer", "WLMS", "PREG")

marker: Ethnic marker used in list. "ethnic group": Ethnic group / ethnicity. "language": Language. "mtongue": Mother tongue.

groupvar: Variable name of ethnic group identifier in original dataset.

round: Round of survey (DHS; SIDE; Afrobarometer)

subround: Subround of survey (DHS; SIDE)

year: Year (EPR; IPUMS)

list.id: ID of list of ethnic groups.

These parameters are also returned by method LEDA$show_list_parameters().

Examples
# Initialize linkage object
leda <- LEDA$new()

# Get raw links to the language tree for Afrobarometer lists in Uganda
leda$get_raw_ethnolinks(param_list = 
   list(type = "Afrobarometer", iso3c = "UGA"))

Method get_types()

Get types of group lists in LEDA object

Usage
LEDA$get_types()
Examples
# Initialize linkage object
leda <- LEDA$new()

# Get types of group lists
leda$get_types()

Method prepare_newlink_table()

Based on an input of ethnic group names, the function returns a link table between the groups and automatically identified likely matches on the language tree. These non-authoritative 'suggestions' are identified via a fuzzy string match of input group names with (1) the names of nodes on the language tree, including dialects and alternative names, as well as (2) the names of groups that have been previously linked to the tree as contained in the respective LEDA object.

Usage
LEDA$prepare_newlink_table(
  group.df,
  groupvar,
  by.country = FALSE,
  return = TRUE,
  save.path = NULL,
  overwrite = FALSE,
  prev_link_param_list = NULL,
  levenshtein.threshold = 0.2,
  levenshtein.costs = c(insertions = 1, deletions = 1, substitutions = 1)
)
Arguments
group.df

A DataFrame that contains the names of ethnic groups to be linked to the language tree, as well as any other (meta)data you wish to retain.

groupvar

String containing the name of the column in group.df that contains the names of ethnic groups.

by.country

Logical determining whether ethnic group names should be string-matched separately within and outside the country they belong to. Setting the parameter to TRUE leads to more plausible matches, but requires a column iso3c in group.df. Column iso3c has to contain valid 3-letter ISO codes of African countries. See e.g. the R package countrycode.

return

Logical determining whether the resulting link table shall be returned.

save.path

String of the path to which the resulting link table is stored, as a .csv file. If NULL (the default), nothing is stored.

overwrite

Logical determining whether a previously existing file located by save.path is overwritten.

prev_link_param_list

Parameters that determine the subset of previous links between ethnic group lists and the language tree to automatically retrieve link suggestions from. If NULL (the default), all available lists are used. See LEDA$show_list_parameters() for details.

levenshtein.threshold

Threshold Levenshtein string distance below which a fuzzy string match is returned.

levenshtein.costs

Vector of costs used to compute Levenshtein string distance. See utils::adist for details.
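The fuzzy matching controlled by levenshtein.threshold and levenshtein.costs can be illustrated with utils::adist. The sketch below is not the package's matching routine: the candidate names are invented, and dividing the raw distance by the length of the longer string is an assumption about how a threshold such as 0.2 might be applied.

```r
group_name <- "Asante"
candidates <- c("Asante", "Asanti", "Ashanti", "Akan")  # hypothetical node names

# Raw Levenshtein distances with unit costs for each edit operation.
costs <- c(insertions = 1, deletions = 1, substitutions = 1)
raw <- drop(utils::adist(group_name, candidates, costs = costs))

# Assumed normalisation: divide by the length of the longer of the two strings.
normalised <- raw / pmax(nchar(group_name), nchar(candidates))

# Return candidates whose normalised distance lies below the threshold.
levenshtein.threshold <- 0.2
matches <- candidates[normalised < levenshtein.threshold]

print(data.frame(candidates, raw, normalised))
print(matches)
```

Raising the threshold admits more distant spellings such as "Ashanti", at the price of more implausible suggestions that have to be screened by hand.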

Details

The automatic links should not be regarded as authoritative, but merely as an aid to the coding, which proceeds outside the LEDA environment from this point onwards. The final table returned by the function has a column called link, which is to be filled by the user, using the automatically generated suggestions and secondary data sources. Once this coding is completed, the table can be added to the LEDA object with the method LEDA$add_tree_links().

Returns

A DataFrame of the same height as group.df. In addition to the columns of group.df, it contains the following columns:

auto_link_org: Language tree nodes (levels 1:14; languages, i.e. level 15) matched via their original (org) name.

auto_link_alt: Languages (level 15) matched via their alternative name.

auto_link_dial: Dialects matched via their name or alternative name.

auto_link_prev: Language tree nodes found via a fuzzy string match of input groups to the groups previously linked to the language tree (potentially subsetted by prev_link_param_list).

auto_link_foreign: If by.country = TRUE, the same four fields as above for languages from 'foreign' countries, pasted into a single string.

link: Empty column for the final link, to be filled by the user.

comment: Empty column for comments on the final link, to be filled by the user.

source: Empty column for the source of the final link, to be filled by the user.

Multiple matches are combined by pasting the language names separated by a '|'. The matched language node's tree level is indicated after its name in '[]', with L1:L14 denoting super-language levels, 'lang' denoting languages, and 'dial' denoting dialects. This coding format should be maintained when filling the column link with the final link of groups to the language tree.
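The coding format can be checked with a small parser before a table is handed back to the package. The helper `parse_links()` below is hypothetical and not part of LEDA; it only splits a link string on '|' and separates each node name from its level specifier in '[]'.

```r
# Hypothetical helper: split a link string into node names and level specifiers.
parse_links <- function(link) {
  parts <- trimws(strsplit(link, "|", fixed = TRUE)[[1]])
  has_level <- grepl("\\[[^]]*\\]$", parts)
  data.frame(
    name = trimws(sub("\\[[^]]*\\]$", "", parts)),
    level = ifelse(has_level,
                   sub("^.*\\[([^]]*)\\]$", "\\1", parts),
                   NA_character_),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_links("Akan [L9]|Asante [dial]|Twi")
print(parsed)
```

Entries without a level specifier come back with a missing level, which makes it easy to spot links that may be ambiguous on the tree.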

Examples
# Initialize linkage object
leda <- LEDA$new()

# Make or load some dataset of ethnic groups
new.groups.df <- data.frame(group_name = 
  c("Asante", "Grusi", "Akan"),iso3c = c("GHA"),
  type = "My Survey in Ghana",
  marker = "ethnic group",
  stringsAsFactors = FALSE)


# Prepare a new link table 
newlink.df <- leda$prepare_newlink_table(
  group.df = new.groups.df, 
  groupvar = "group_name",
  by.country = TRUE, 
  return = TRUE, save.path =  NULL, 
  overwrite = TRUE, prev_link_param_list = NULL,
  levenshtein.threshold = .2,
  levenshtein.costs = 
  c(insertions = 1,deletions = 1, substitutions = 1))



Method add_tree_links()

Function to add a table that links a list of ethnic groups with nodes on the language tree to the LEDA object.

Usage
LEDA$add_tree_links(tree.link.df, idvars, type)
Arguments
tree.link.df

Table that contains the links between ethnic groups and the language tree. It must contain the following variables:

group: Names of ethnic groups, of type "character".

link: Names of language nodes linked to ethnic group, of type "character". Multiple nodes linked to the same ethnic group must be separated by a '|'. Names of language nodes must be contained in the Ethnologue database 13. A node's level is specified in [], and the node must exist on that level. Level specifiers follow this form: L1:L14 denoting super-language levels, 'lang' denoting languages, 'dial' denoting dialects, and 'iso' denoting language ISO codes. E.g. "Akan [L9]", "Asante [dial]", or "aka [iso]". If no level is specified and multiple language tree nodes share the same name, the one closest to the node is chosen. Note that the safest way to avoid confusion is to provide ISO codes.

iso3c: (not required) Country identifier as 3-letter ISO code. If provided, the algorithm gives preference to nodes of the language tree in the same country in cases where multiple nodes share the same name given by link.

idvars: Additional variables that identify lists of ethnic groups in your data. See below.

idvars

Variables that identify lists of ethnic groups in your data. These should contain a list type (e.g. "My survey"), and are typically nested within countries and years. See LEDA$show_list_parameters() for the variables that identify ethnic group lists in the LEDA dataset.

The unique combinations of idvars values in tree.link.df are used to create new entries in the LEDA object's dictionary of group lists leda.obj$list.dict.

type

String that contains the 'type' of the group list to add, for example 'My survey'. Must not be one of the types already in the LEDA project.

Details

The function links the input ethnic groups to the linguistic tree contained in the package and thereby updates the LEDA object. Once this has been done, the added group can be linked to all ethnic group lists contained in the LEDA object. See examples below.
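How idvars partition an input table into group lists can be previewed in base R before calling the method. This is a sketch under assumptions: the toy table is invented, and the running list.id shown here is purely illustrative, since the identifiers assigned by the package are not documented in this section.

```r
# Hypothetical link table covering two countries.
tree.link.df <- data.frame(
  group = c("Asante", "Ewe", "Mossi"),
  link = c("Asante [dial]", "ewe [iso]", "mos [iso]"),
  iso3c = c("GHA", "GHA", "BFA"),
  type = "My data",
  marker = "ethnic group",
  stringsAsFactors = FALSE
)
idvars <- c("iso3c", "type", "marker")

# Each unique combination of idvars values corresponds to one list of groups.
new.lists <- unique(tree.link.df[, idvars])
new.lists$list.id <- seq_len(nrow(new.lists))  # illustrative IDs only
print(new.lists)
```

With these idvars the three groups fall into two lists, one per country, which is the granularity at which they can later be linked to other lists.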

Examples
# Initialize linkage object
leda <- LEDA$new()

# Make toy link dataset
new.groups.df <- data.frame(
   group = c("Asante", "Mossi"), ## Ethnic group names
   link = c("Asante [dial]","Moore [org]"), ## Language nodes
   marker = "Ethnic self identification",
   iso3c = c("GHA", "BFA"), ## Countries
   stringsAsFactors = FALSE ## Everything as character
   )

# Add to LEDA
leda$add_tree_links(tree.link.df = new.groups.df, 
       idvars = c("iso3c", "type", "marker"),
       type = "My data")
       
# Use the new link
setlink <- leda$link_set(lists.a = list(type = c("My data")), 
    lists.b = list(type = c("Afrobarometer"), 
    round = 4, marker = "language",
    iso3c = c("GHA","BFA")), 
    link.level = 15, by.country = FALSE, 
    drop.b.threshold = 0, drop.ethno.id = TRUE)
    
head(setlink[, c("a.group", "b.group", "a.type", "b.type", "a.list.id", "b.list.id")])

Method clone()

The objects of this class are cloneable with this method.

Usage
LEDA$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples


# Initialize linkage object
leda.obj <- LEDA$new()

# link Afrobarometer to FRT

## Based on set relation
setlink <- leda.obj$link_set(
  lists.a = list(type = c("Afrobarometer"), 
  iso3c = "UGA"), 
  lists.b = list(type = c("FRT"),iso3c = "UGA"), 
  link.level = "dialect",   by.country = TRUE, 
  drop.b.threshold = 0, drop.ethno.id = TRUE)
head(setlink[,c("a.group","b.group")])

## Nearest linguistic neighbor
mindistlink <- leda.obj$link_minlingdist(
 lists.a = list(type = c("Afrobarometer"), iso3c = "UGA"), 
 lists.b = list(type = c("FRT"),  iso3c = "UGA"),
 level = "dialect",
 by.country = TRUE, expand = FALSE,  delta = .5,
 agg_fun.a = mean, agg_fun.b = min)
head(mindistlink[,c("a.group","b.group")])

## Within maximum linguistic distance
withindistlink <- leda.obj$link_withinlingdist(
  lists.a = list(type = c("Afrobarometer"),  iso3c = "UGA"), 
  lists.b = list(type = c("FRT"), iso3c = "UGA"),
  level = "dialect",
  max.distance = .1, by.country = TRUE,
  delta = .5, expand = FALSE, 
  agg_fun.a = mean, agg_fun.b = min)
head(withindistlink[,c("a.group","b.group")])

## Compute pairwise linguistic distance
distance.df <- leda.obj$ling_distance(
   lists.a = list(type = c("Afrobarometer"),  iso3c = "UGA"),
   lists.b = list(type = c("FRT"), iso3c = "UGA"),
   level = "dialect",
   by.country = TRUE, delta = .5, 
   expand = FALSE,  
   agg_fun.a = mean, agg_fun.b = min)
head(distance.df[,c("a.group","b.group")])

carl-mc/LEDA documentation built on June 12, 2024, 9:22 p.m.