This vignette demonstrates how to use the main functions in the netlangr package.
The makelangnet and makemultinet functions are used to create a language network from a list of words.
The getnetstats and getmultistats functions are used to compute the network measures from a language network.
The makelangnet and getnetstats functions are for single-layer networks, i.e., either phonological or orthographic similarity networks.
The makemultinet and getmultistats functions are for multi-layer networks, i.e., the phonographic multiplex where both phonological and orthographic links are represented in the network.
The package is still very much a work in progress, so any feedback, comments, and suggestions are very welcome!
Download the package from my GitHub page.
    # install.packages('devtools')
    # library(devtools)
    # install_github('csqsiew/netlangr')
    library(netlangr)
All you really need to get started is a list of words: spellings if you want to construct an orthographic network, and phonological transcriptions if you want a phonological network. Note that the phonological transcriptions must be coded such that 1 phoneme = 1 character, because the networks are constructed based on an edit distance of 1. That is, a link is placed between pairs of words that differ by the substitution, deletion, or addition of one phoneme/letter, which is the way that phonological or orthographic similarity is typically operationalized in the psycholinguistic literature (Luce & Pisoni, 1998; Coltheart et al., 1977).
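To make the edit-distance-of-1 rule concrete, here is a minimal sketch in base R. It is not netlangr's own implementation (which may differ); it only illustrates the linking rule with `adist()` on a made-up word list. Because `adist()` compares strings character by character, a phoneme coded with two characters would count as two edits, which is why the 1 phoneme = 1 character coding matters.

```r
# Toy word list (spellings), made up for illustration.
words <- c("cat", "bat", "cab", "at", "dog")

# Pairwise Levenshtein (edit) distances; adist() ships with base R (utils).
d <- adist(words)
dimnames(d) <- list(words, words)

# Link two words when they differ by exactly one substitution,
# deletion, or addition of a single character.
adj <- d == 1

# List each linked pair once (upper triangle only).
pairs <- which(adj & upper.tri(adj), arr.ind = TRUE)
edges <- data.frame(from = words[pairs[, "row"]],
                    to   = words[pairs[, "col"]])
print(edges)
```

In this toy list, "cat" is linked to "bat", "cab", and "at", while "dog" has no neighbors at all.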
    data <- read.csv('cat.csv', stringsAsFactors = FALSE) # stringsAsFactors = FALSE keeps the columns as character class, instead of factor
    head(data)
    class(data$Phono) # should be 'character'
    class(data$Ortho) # should be 'character'
    # Phonological network
    phono.net <- makelangnet(data$Phono)         # make the language network
    phono.net.measures <- getnetstats(phono.net) # get network measures
    head(phono.net.measures)

    # Orthographic network
    ortho.net <- makelangnet(data$Ortho)         # make the language network
    ortho.net.measures <- getnetstats(ortho.net) # get network measures
    head(ortho.net.measures)
Network measures:
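location: G = largest connected component (LCC, or giant component), L = lexical island (a smaller component that is disconnected from the giant component), H = hermit (a word with no neighbors)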
degree: number of words that are neighbors of a given word (i.e., neighborhood size)
clustering: the extent to which a word's neighbors are also neighbors of each other, i.e., clustering in a word's neighborhood in the network, ranges from 0 to 1.
closeness.gc: normalized inverse of the average distance between a given word and all other words in the largest connected component (LCC). Higher values indicate that a word is close to many other words in the network (more central). Note that closeness centrality is only calculated for words in the LCC.
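For readers who want to see how measures like these can be derived, the following is a rough sketch using igraph on a small made-up network. It is not the code behind getnetstats, and two choices here are assumptions: words with fewer than two neighbors are given a clustering of 0, and closeness is computed on the giant component only and left as NA elsewhere.

```r
library(igraph)

# Toy network: a connected group of four words, a two-word island, and an isolate.
edges <- data.frame(from = c("cat", "cat", "cat", "bat", "dog"),
                    to   = c("bat", "cab", "at",  "at",  "dig"))
verts <- data.frame(name = c("cat", "bat", "cab", "at", "dog", "dig", "zebra"))
g <- graph_from_data_frame(edges, directed = FALSE, vertices = verts)

# degree: number of neighbors of each word
deg <- degree(g)

# location: G = giant component, H = no neighbors, L = everything else
comp  <- components(g)
gc.id <- which.max(comp$csize)
location <- ifelse(comp$membership == gc.id, "G", ifelse(deg == 0, "H", "L"))

# clustering: local clustering coefficient (assumption: 0 when degree < 2)
clustering <- transitivity(g, type = "local", isolates = "zero")
names(clustering) <- V(g)$name

# closeness.gc: normalized closeness, computed within the giant component only
gc <- induced_subgraph(g, V(g)[comp$membership == gc.id])
closeness.gc <- rep(NA_real_, vcount(g))
names(closeness.gc) <- V(g)$name
closeness.gc[V(gc)$name] <- closeness(gc, normalized = TRUE)

data.frame(location, degree = deg, clustering, closeness.gc)
```

Here "cat" sits in the giant component with three neighbors, "dog" and "dig" form a lexical island, and "zebra" is a hermit.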
    multi.net <- makemultinet(data)
    multi.net.measures <- getmultistats(multi.net)
    head(multi.net.measures)
    # write.csv(multi.net.measures, file = 'output.csv') # export the data if you wish
Network measures:
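location: G = largest connected component (LCC, or giant component), L = lexical island (a smaller component that is disconnected from the giant component), H = hermit (a word with no neighbors)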
degree.pg: number of words that are both phonological AND orthographic neighbors of a given word (i.e., phonographic neighbors; the neighborhood size of the phonographic network)
degree.all: number of words that are phonological OR orthographic neighbors of a given word (i.e., the neighborhood size of the phonographic multiplex; note that phonographic neighbors are not double counted)
clustering.pg: the extent to which a word's phonographic neighbors are also phonographic neighbors of each other, i.e., clustering in a word's neighborhood in the phonographic network, ranges from 0 to 1.
clustering.unweighted: the extent to which a word's phonological and orthographic neighbors are phonological or orthographic neighbors of each other, i.e., clustering in a word's neighborhood in the phonographic multiplex, ranges from 0 to 1. unweighted = each link has the same weight.
clustering.weighted: the extent to which a word's phonological and orthographic neighbors are phonological or orthographic neighbors of each other, i.e., clustering in a word's neighborhood in the phonographic multiplex, ranges from 0 to 1. weighted = phonographic links are double weighted as compared to phonological-only or orthographic-only links.
closeness.gc.unweighted: normalized inverse of the average distance between a given word and all other words in the LCC. Higher values indicate that a word is close to many other words in the network (more central). unweighted = each link has the same weight.
closeness.gc.weighted: normalized inverse of the average distance between a given word and all other words in the LCC. Higher values indicate that a word is close to many other words in the network (more central). weighted = phonographic links are double weighted as compared to phonological-only or orthographic-only links. (Note that closeness centrality is only calculated for words in the LCC.)
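The difference between degree.pg and degree.all can be illustrated with a small base R sketch. This is not how getmultistats is implemented; it only applies the edit-distance-of-1 rule to both columns of a toy data frame. The phonological codes below are invented one-character-per-phoneme symbols for illustration, not a real transcription scheme.

```r
# Toy data: spellings plus hypothetical one-character-per-phoneme codes.
words <- data.frame(Ortho = c("cat", "cot", "kit", "cut", "hat"),
                    Phono = c("k@t", "kat", "kIt", "k^t", "h@t"),
                    stringsAsFactors = FALSE)

# TRUE where two words are one edit apart in each layer.
p.link <- adist(words$Phono) == 1  # phonological neighbors
o.link <- adist(words$Ortho) == 1  # orthographic neighbors

# Phonographic neighbors are linked in BOTH layers;
# multiplex neighbors are linked in EITHER layer (each counted once).
degree.pg  <- rowSums(p.link & o.link)
degree.all <- rowSums(p.link | o.link)
names(degree.pg) <- names(degree.all) <- words$Ortho

data.frame(degree.pg, degree.all)
```

In this toy set, "kit" has no phonographic neighbors because it differs from "cat", "cot", and "cut" by two letters, yet it still has three neighbors in the multiplex through its phonological links.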
    library(igraph)

    l <- layout_with_lgl(multi.net)

    # color edges by their connection type
    E(multi.net)$color <- E(multi.net)$type
    E(multi.net)$color <- E(multi.net)$color %>%
      gsub('po', 'green', .) %>%
      gsub('o', 'blue', .) %>%
      gsub('p', 'red', .)

    plot(multi.net, vertex.label.color = 'black', vertex.color = 'white',
         vertex.label.family = 'Helvetica', layout = l,
         edge.color = E(multi.net)$color, vertex.label.cex = .7,
         vertex.shape = "none", vertex.label = V(multi.net)$label,
         main = 'Phonographic network of CAT')

    legend(x = -1.5, y = -1.1, c("phonographic", "orthographic", "phonological"),
           pch = 21, col = "#777777", pt.bg = c('green', 'blue', 'red'),
           pt.cex = 2, cex = .8, bty = "n", ncol = 1)
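The chained gsub() calls work here because 'po' is replaced before 'o' and 'p', and none of the color names contain those letters. If you change the colors (e.g., to 'purple'), a lookup vector may be a safer way to recode. The sketch below assumes the edge types are exactly 'po', 'o', and 'p', as in the plotting code above; the `types` vector is a stand-in for E(multi.net)$type.

```r
# Stand-in for E(multi.net)$type (assumed values: 'po', 'o', 'p').
types <- c("po", "o", "p", "po")

# Map each edge type to a color by name; order and spelling of colors no longer matter.
type.colors <- c(po = "green", o = "blue", p = "red")
edge.colors <- unname(type.colors[types])

edge.colors
```

The result could then be passed to plot() via edge.color in place of the gsub()-recoded column.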