glottolog: Glottolog data from <URL: http://www.glottolog.org>

Description Usage Format Details Source Examples

Description

Data from Glottolog 2016 with added WALS codes and speaker-community size. Various minor corrections and additions were performed in the preparation of the data (see Details). All stocks (i.e. largest reconstructable units) are linked to macroareas, and they are linked to a single root node calles 'World'.

Usage

1
data("glottolog")

Format

A data frame with 22007 observations on the following 10 variables.

name

a character vector with the name of the entity.

father

a character vector with the name of the direct parent entity.

stock

a factor with the highest reconstructable unit. This column is added just for convenience, it does not add any new information.

glottocode

a character vector with the glottocode. The same identifier is added as rownames of the data.

iso

a character vector with ISO 639-3 language codes

wals

a character vector with WALS language codes

level

a factor with levels dialect, family and language

longitude

a numeric vector with geographic coordinates as available in the Glottolog

latitude

a numeric vector with geographic coordinates as available in the Glottolog

population

a numeric vector with speaker community size from an old Ethnologue version (13th Edition), licensed to the MPI-EVA in Leipzig.

Details

For Glottolog data: the names were uniquified by adding a glottocode when a name occurs more than once (typically in some cases of a language and a family having the same name). Entries classified as 'bookkeeping', 'unattested' 'artificial language', 'sign language', 'speech register' and 'unclassifiable' were removed. Links to WALS codes were added: note that about 20 links are missing, and for the non-unique links one link was chosen by data availability. Some macro codes from ISO 639-3 were added.

A level 'area' was added to the tree, separating all languages in six areas: Eurasia, Africa, Southeast Asia, Sahul, North America and South America. This is reminiscent of the proposal from Dryer (1992), though Austronesian is grouped with Southeast Asia here, because that makes more sense genealogically. Still, these nodes are surely not monophyletic! Mixed languages are not assigned to an area.

Please note that the data provided here is not identical to the online version of Glottolog, as the online version is constantly being updated! This is Glottolog 2016. Updates might be made available when they are provided for download from the website.

The format of the glottolog data might seem a bit convoluted, but by using getTree it is actually really easy to extract genealogical parts of the glottolog data and by using FromDataFrameNetwork this can be nicely plotted and turned into various tree format as used in R.

Source

Glottolog 2016 data from http://www.glottolog.org. WALS 2013 data from http://www.glottolog.org. Information on macrolanguages from http://www-01.sil.org/iso639-3/macrolanguages.asp. All data downloaded in March 2017. Population numbers are from the 13th edition of the Ethnologue, licenced to the MPI-EVA in Leipzig.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# use getTree() to select genealogical parts of the data
data(glottolog)

( aalawa <- getTree(up = "aala1237", kind = "glottocode") )
( kandas <- getTree(down = "Kandas-Duke of York") )
( treeFull <- getTree(up = c("deu", "eng", "ind", "cha"), kind = "iso") )
( treeReduced <- getTree(up = c("deu", "eng", "ind", "cha"), kind = "iso", reduce = TRUE) )

# check out areas
( areas <- glottolog[glottolog$level == "area", "name"] )
# stocks in Southeast Asia
glottolog[glottolog$father == areas[1], "name"]

## Not run: 
# use FromDataFrameNetwork() to visualize the tree
# and export it into various other tree formats in R

library(data.tree)
treeF <- FromDataFrameNetwork(treeFull)
treeR <- FromDataFrameNetwork(treeReduced)

plot(treeF)
plot(treeR)

# turn into phylo format from library 'ape'
t <- as.phylo.Node(treeR)
plot(t)

# turn into dendrogram
t <- as.dendrogram(treeF)
plot(t, center = T)

## End(Not run)

cysouw/qlcData documentation built on Dec. 26, 2021, 6:13 p.m.