knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(tidyverse)
The `langdata` package provides a set of linguistic datasets for practicing visualization, transformation, and analysis techniques on language data. The data are compiled from freely available online sources and curated according to 'tidy' data principles.
This package is available as a repository on GitHub. To install it from GitHub you will need the `devtools` package.
install.packages("devtools")
Then install `langdata`.
devtools::install_github("francojc/langdata")
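If the package may already be installed, the installation step can be made conditional. The following is a minimal sketch, not part of the package: the helper name `install_langdata_if_missing()` is hypothetical, and it assumes the `remotes` package, which provides the `install_github()` function that `devtools` re-exports.

```r
# Hypothetical helper: install langdata from GitHub only when it is missing
install_langdata_if_missing <- function() {
  # Nothing to do if the package can already be loaded
  if (requireNamespace("langdata", quietly = TRUE)) {
    return(invisible(FALSE))
  }
  # Assumption: remotes is used here instead of the full devtools package
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("francojc/langdata")
  invisible(TRUE)
}
```

Calling `install_langdata_if_missing()` installs the package on first use and does nothing on later calls.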
You can now load the package to make the datasets available.
library(langdata)
The current list of datasets:
data(package = "langdata")$results %>% as_tibble() %>% knitr::kable()
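The object returned by `data(package = ...)` stores its listing in `$results`, a character matrix with the columns `Package`, `LibPath`, `Item`, and `Title`. As a sketch of trimming that listing to dataset names and titles only, the example below uses the built-in `datasets` package as a stand-in so that it runs without `langdata` installed; substituting `"langdata"` is assumed to work the same way.

```r
library(dplyr)
library(tibble)

# Stand-in package: "datasets" ships with R; swap in "langdata" once installed
listing <- data(package = "datasets")$results

dataset_index <- listing %>%
  as_tibble() %>%
  select(Item, Title)  # keep only the dataset name and its title

head(dataset_index)
```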
To load a particular dataset, use the `data()` function.
data(swda) # load the `swda` dataset
ls() # verify the dataset is now in the environment
To get a description of a dataset, use the `?` help operator.
?swda
To aid in visualization and transformation operations, it is recommended that you use the `tidyverse` meta-package. Loading `tidyverse` attaches its core packages, and `tidyverse_packages()` lists every package in the collection:
library(tidyverse) # load tidyverse
tidyverse_packages() # list tidyverse packages
To view a summary of the data, use the `glimpse()` function.
glimpse(swda)
We can explore some of the demographic information for the speakers by using `group_by()` to group the data and then passing the result to `count()`, which returns the number of rows in each group.
swda %>% group_by(sex) %>% count()
swda %>% group_by(birth_year) %>% count(sort = TRUE)
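The same counts can be obtained by passing the variable directly to `count()`, which groups, tallies, and ungroups in one step. The sketch below shows the equivalence on a small made-up tibble; `toy_speakers` and its values are hypothetical stand-ins, not rows from `swda`.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data (invented values, not taken from swda)
toy_speakers <- tibble(
  sex = c("FEMALE", "MALE", "FEMALE", "MALE", "FEMALE"),
  birth_year = c(1950, 1962, 1950, 1971, 1962)
)

# Explicit grouping followed by count(); the result is still grouped by sex
by_group <- toy_speakers %>% group_by(sex) %>% count()

# Shorthand: count() groups by sex internally and returns an ungrouped tibble
shorthand <- toy_speakers %>% count(sex)

# Most frequent birth years first
by_year <- toy_speakers %>% count(birth_year, sort = TRUE)
```

The shorthand form is convenient when the grouping is not needed for later steps, since the result does not have to be ungrouped afterwards.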
You can pass multiple grouping variables to `group_by()` to produce cross-tabulations.
swda %>% group_by(sex, dialect_area) %>% count()
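Counts over two variables come back in long form, one row per combination. To view them as a conventional contingency table, the long counts can be spread into columns. The sketch below assumes `tidyr::pivot_wider()` for the reshaping step and uses a hypothetical toy tibble with invented values rather than `swda` itself.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data (invented values, not taken from swda)
toy_speakers <- tibble(
  sex = c("FEMALE", "MALE", "FEMALE", "MALE", "FEMALE", "MALE"),
  dialect_area = c("SOUTHERN", "SOUTHERN", "WESTERN",
                   "WESTERN", "SOUTHERN", "NORTHERN")
)

# Long form: one row per observed (sex, dialect_area) combination
long_counts <- toy_speakers %>% count(sex, dialect_area)

# Wide form: one row per dialect area, one column per sex;
# combinations that never occur are filled with 0
crosstab <- long_counts %>%
  pivot_wider(names_from = sex, values_from = n, values_fill = 0L)

crosstab
```

Filling missing combinations with zero keeps the table rectangular, which matters when some groups have no speakers in a given dialect area.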
...