In WFU-TLC/analyzr: Support Package for Data Analysis with R (FLC)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Goals

In this vignette I will provide an overview of some of the more common strategies that you will use to manipulate and organize your data for subsequent analysis. We will be working with two packages that are part of the tidyverse. The first, tidyr, provides functions for reshaping data between long and wide formats and for separating out new variables based on the values of other variables. The second, dplyr, is used for manipulating data (selecting, filtering, sorting, and so on) and for transforming values, either through recoding or some other operation.

Data

We will work with a dataset included in the analyzr package. First, install and load the package, along with the main tidyverse tools.

devtools::install_github("WFU-TLC/analyzr")

library(tidyverse)
library(analyzr)

Let's take a look at the sdac dataset.

glimpse(sdac)

This dataset is in tidy format: each variable is a column and each observation is a row. Take a look at the R documentation for this dataset with ?sdac.

Manipulate data frames

There are a few tidyverse verbs that are very commonly used to manipulate data frames.

select() allows you to select a subset of columns

sdac %>% 
  select(speaker_id, damsl_tag, birth_year, utterance_text) %>% 
  head()

arrange() sorts a data frame by one or more columns

sdac %>% 
  select(speaker_id, damsl_tag, birth_year, utterance_text) %>%
  arrange(birth_year) %>% 
  head()
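Because arrange() accepts more than one column, a small sketch may help show how ties are broken and how desc() reverses the sort order. The toy tibble below is invented for illustration (it only borrows column names from sdac), so the example runs without the package data.

```r
library(dplyr)

# Hypothetical toy data: the values are made up, only the column names mirror sdac
toy <- tibble(
  speaker_id = c(1001, 1002, 1003, 1004),
  birth_year = c(1960, 1948, 1960, 1971)
)

# Sort from the most recent birth year to the oldest;
# rows with the same birth_year are then ordered by speaker_id
sorted <- toy %>% 
  arrange(desc(birth_year), speaker_id)

sorted
```

The same pattern should carry over to sdac by replacing toy with sdac.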

filter() keeps only the rows whose values satisfy one or more conditions

sdac %>% 
  select(speaker_id, damsl_tag, birth_year, utterance_text) %>%
  arrange(birth_year) %>% 
  filter(birth_year == 1971) %>% 
  head()

filter() can be combined with numerous operators and vector functions.

sdac %>% 
  select(speaker_id, damsl_tag, birth_year, utterance_text) %>%
  arrange(birth_year) %>% 
  filter(between(birth_year, 1950, 1969)) %>% 
  head()

sdac %>% 
  select(speaker_id, damsl_tag, birth_year, utterance_text) %>%
  arrange(birth_year) %>% 
  filter(birth_year > 1955) %>% 
  head()
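To make the range of operators more concrete, here is a minimal sketch that combines conditions with a comma (AND), | (OR), %in%, and is.na(). The toy tibble and its values are assumptions made up for this example; only the column names are taken from sdac.

```r
library(dplyr)

# Hypothetical toy data: values are invented, column names mirror sdac
toy <- tibble(
  speaker_id = c(1001, 1002, 1003, 1004, 1005),
  damsl_tag  = c("sd", "b", "sv", "sd", "qy"),
  birth_year = c(1960, 1948, NA, 1971, 1955)
)

# Conditions separated by a comma must all be TRUE (logical AND)
sd_after_1955 <- toy %>% 
  filter(damsl_tag == "sd", birth_year > 1955)

# %in% tests membership in a set of values; | is logical OR
tags_or_old <- toy %>% 
  filter(damsl_tag %in% c("b", "qy") | birth_year < 1950)

# A comparison with NA is never TRUE, so use is.na() to find missing values
missing_year <- toy %>% 
  filter(is.na(birth_year))
```

Note that filter() silently drops rows where the condition evaluates to NA, which is why the speaker with a missing birth year does not appear in the first two results.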

Summarize data

You will often want to explore your data by summarizing it. The most basic summary is count(), which, called without arguments, returns the total number of rows.

sdac %>% 
  count()

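You can also pass column names to count() to get one count per distinct value of those columns. Setting sort = TRUE orders the result from the most to the least frequent value.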

sdac %>% 
  count(birth_year, sort = TRUE)

You can also use the group_by() function to explicitly group your data, so that the operations that follow are applied to each group.

sdac %>% 
  group_by(birth_year) %>% 
  count()
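The two approaches above give the same counts. As a hedged sketch, the example below shows count() next to the equivalent group_by() plus summarize() form, and then a grouped summary with more than one statistic. The toy tibble is invented for illustration and only reuses column names from sdac.

```r
library(dplyr)

# Hypothetical toy data: values are invented, column names mirror sdac
toy <- tibble(
  speaker_id = c(1001, 1001, 1002, 1003, 1003, 1003),
  birth_year = c(1960, 1960, 1948, 1960, 1960, 1960)
)

# Shorthand: one row per birth_year with the number of rows in column n
by_count <- toy %>% 
  count(birth_year)

# Explicit form: group first, then count the rows of each group with n()
by_group <- toy %>% 
  group_by(birth_year) %>% 
  summarize(n = n())

# The explicit form lets you compute several summaries at once
per_year <- toy %>% 
  group_by(birth_year) %>% 
  summarize(
    rows     = n(),                     # number of rows in the group
    speakers = n_distinct(speaker_id)   # number of different speakers
  )
```

The explicit form is more verbose, but it is the one to reach for once you need anything beyond a plain count.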

After group_by(), sample_n() draws the requested number of rows from each group, here two rows per birth year.

sdac %>% 
  group_by(birth_year) %>% 
  sample_n(2) %>% 
  select(speaker_id, birth_year, utterance_text) %>% 
  arrange(birth_year) %>% 
  head()

mutate
summarize
Vector functions
n
row_number
case_when
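A minimal sketch of how the verbs and vector functions listed above might be combined. The toy tibble, the utterance_n and cohort column names, and the cohort cut-offs are all assumptions invented for this example; they are not part of sdac.

```r
library(dplyr)

# Hypothetical toy data: values are invented, column names mirror sdac
toy <- tibble(
  speaker_id = c(1001, 1001, 1002, 1003),
  birth_year = c(1960, 1960, 1948, 1971)
)

# mutate() adds or changes columns while keeping every row
labelled <- toy %>% 
  group_by(speaker_id) %>% 
  mutate(
    utterance_n = row_number(),   # running index within each speaker
    cohort = case_when(           # the first condition that is TRUE wins
      birth_year < 1950 ~ "before 1950",
      birth_year < 1970 ~ "1950-1969",
      TRUE              ~ "1970 or later"
    )
  ) %>% 
  ungroup()

# summarize() collapses each group to one row; n() counts the rows in a group
cohort_sizes <- labelled %>% 
  group_by(cohort) %>% 
  summarize(rows = n(), mean_birth_year = mean(birth_year))
```

Here row_number() restarts at 1 for each speaker because the data are grouped when mutate() runs; on ungrouped data it would simply number the rows from top to bottom.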

Organize data frames

knitr::include_graphics(path = "http://www.sthda.com/sthda/RDoc/images/tidyr.png")

gather / spread
separate / unite
Two table verbs
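The sketch below illustrates the three topics listed above on invented toy data: reshaping with gather() and spread(), splitting and recombining a column with separate() and unite(), and joining two tables with left_join(). All tables, values, and the doc_id, doc, and turn column names are hypothetical. In tidyr 1.0.0 and later, gather() and spread() still work but are superseded by pivot_longer() and pivot_wider().

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per speaker, one column per tag
wide <- tibble(
  speaker_id = c(1001, 1002),
  sd = c(3, 1),
  b  = c(0, 2)
)

# gather(): wide to long. Column names go to damsl_tag, cell values to n
long <- wide %>% 
  gather(key = "damsl_tag", value = "n", sd, b)

# spread(): long back to wide, one column per value of damsl_tag
wide_again <- long %>% 
  spread(key = damsl_tag, value = n)

# separate(): split one column into several at a separator
ids <- tibble(doc_id = c("4325_1", "4325_2"))
parts <- ids %>% 
  separate(doc_id, into = c("doc", "turn"), sep = "_")

# unite(): paste several columns back into one
joined <- parts %>% 
  unite("doc_id", doc, turn, sep = "_")

# Two-table verb: left_join() adds matching columns from a second table
speakers <- tibble(
  speaker_id = c(1001, 1002),
  birth_year = c(1960, 1948)
)
long_with_year <- long %>% 
  left_join(speakers, by = "speaker_id")
```

left_join() keeps every row of the left-hand table and fills in NA where the right-hand table has no match, which makes it a reasonable default when attaching metadata to observations.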

WFU-TLC/analyzr documentation built on June 4, 2019, 2:27 p.m.
