knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(tidyverse)

The langdata package provides a set of linguistic datasets for practicing visualization, transformation, and analysis techniques on language data. The datasets are compiled from freely available online sources and curated according to tidy data principles.

Installation

This package is available as a repository on GitHub. To install the package from GitHub you will need the devtools package.

install.packages("devtools")

Then install langdata.

devtools::install_github("francojc/langdata")

Load the package library

You can now load the package to make the datasets available.

library(langdata)

The current list of datasets:

data(package = "langdata")$results %>% 
  as_tibble() %>% 
  knitr::kable()

To load a particular dataset use the data() function.

data(swda) # load the `swda` dataset
ls() # verify the dataset is now in the environment
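
If you prefer not to attach the package first, data() also accepts a package argument. The following is a minimal sketch, assuming langdata is installed and provides a dataset named swda as described above. The requireNamespace() guard and the `loaded` flag are additions for illustration only, so that the snippet still runs when the package is absent.

```r
# Sketch: load one dataset without calling library(langdata).
# `loaded` is a hypothetical helper flag, not part of langdata.
loaded <- FALSE

if (requireNamespace("langdata", quietly = TRUE)) {
  # data() copies the named dataset into the current environment
  data("swda", package = "langdata")
  loaded <- exists("swda")
}

loaded # TRUE only if langdata is installed and swda was loaded
```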

To get a description of a dataset use the ? operator for help.

?swda

To aid in visualization and transformation operations, it is recommended that you use the tidyverse meta-package. Loading tidyverse attaches its core packages (including ggplot2, dplyr, and tidyr), and tidyverse_packages() lists every package in the collection:

library(tidyverse) # load tidyverse
tidyverse_packages() # list tidyverse packages

Example

To view a summary of the data use the glimpse() function.

glimpse(swda)

We can explore some of the demographic information for the speakers by grouping the data with group_by() and then passing the result to count(), which returns the number of rows in each group.

# Number of rows for each value of sex
swda %>% 
  group_by(sex) %>% 
  count()

# Number of rows for each birth year, most frequent first
swda %>% 
  group_by(birth_year) %>% 
  count(sort = TRUE)

You can add multiple grouping variables to group_by() to do cross-tabulations.

swda %>% 
  group_by(sex, dialect_area) %>% 
  count()
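
As a shorthand, count() can take the grouping variables directly, which gives the same counts without a separate group_by() step. The sketch below uses a small made-up tibble named `toy` as a stand-in for swda, so that it runs without langdata installed. Only the column names sex and dialect_area come from the text above; the values are hypothetical and may not match the real dataset.

```r
library(dplyr)

# Hypothetical stand-in for swda: invented values, real column names
toy <- tibble(
  sex = c("FEMALE", "MALE", "FEMALE", "MALE", "FEMALE"),
  dialect_area = c("WESTERN", "WESTERN", "SOUTHERN", "WESTERN", "WESTERN")
)

# Two-step form, as used above (ungroup() drops the leftover grouping)
grouped <- toy %>%
  group_by(sex, dialect_area) %>%
  count() %>%
  ungroup()

# One-step form: count() groups by the variables it is given
direct <- toy %>%
  count(sex, dialect_area)

direct
```

Both forms return one row per combination of sex and dialect_area that occurs in the data, with the row count in a column named n. The one-step form also returns an ungrouped tibble, which is often more convenient for further processing.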

...



francojc/langdata documentation built on May 31, 2019, 2:48 p.m.