knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Philosophy

There are common steps that you may face in your data exploration:

This package aims to facilitate this basic data handling with functions that I've found handy.

Quick Reference Table

| handy Function | Use Case | Description | |-------------------|--------------|----------------------| | clean_names | Clean Data | Normalize variable names of a data.frame to snake_case | | title_names | Clean Data | Normalize variable names of a data.frame to Title Case | | remove_columns | Clean Data | Removes columns that are either missing or non-unique |
| add_mean_row | Describe Data | Add a mean row to a data.frame of numeric values | | add_total_row | Describe Data | Add a total row to a data.frame of numeric values | | check_variables | Describe Data | Calculate % missing and number of unique values for all variables | | %p% | Shorthand | Allow string concatenation by piping | | cs | Shorthand | Makes a character vector without quotes | | named_list | Shorthand | Makes a named list using object names | | multiplot | Plotting | Arrange multiple plots | | scale_x_human | Plotting | Scales ggplot axis to easily readable K, M, B, etc. | | scale_y_human | Plotting | Scales ggplot axis to easily readable K, M, B, etc. | | write_excel | Write Output | Write a list of data.frames in R to named sheets in an Excel workbook | | write_regression_to_excel | Write Output | Write a list of data.frames in R to named sheets in an Excel workbook |

Clean

Let's use R's built in iris dataset.

These cleaning functions are built to be compatible with magrittr/dplyr pipes.

library(handy)
library(magrittr)
library(knitr)
library(ggplot2)
iris3obs <- head(iris, 3) 
iris3obs %>% kable()
iris3obs %>% clean_names %>% kable()

What if we want to focus on columns that change for the set we're looking for?

iris3obs %>% remove_columns(1) %>% kable()

Describe

For every dataset, we want to know what it contains:

mtcars %>% check_variables %>% kable()

Sometimes it's helpful to add a mean or total row for a table that's to be printed out. These are best for datasets with a character key as the first column.

arrests <- USArrests %>% tibble::rownames_to_column() %>% dplyr::as_tibble()
arrests %>% head() %>% add_mean_row() %>% kable()
arrests %>% head() %>% add_total_row() %>% kable()

Plot

p1 <- Seatbelts %>% tibble::as_tibble() %>% 
  ggplot(aes(kms, DriversKilled)) +
  geom_point() +
  scale_x_human() 
p2 <- Seatbelts %>% tibble::as_tibble() %>% 
  ggplot(aes(DriversKilled)) +
  geom_histogram(alpha = .5, bins = 30)
multiplot(p1, p2)

Write

And for some analyses, it's nice to write out the final data to Excel.

named_list(mtcars, iris) %>% 
  write_excel(workbook_fname = 'wb.xlsx')

readxl::excel_sheets('wb.xlsx')
readxl::read_excel('wb.xlsx', sheet = 'iris') %>% head %>% kable()

Here, using named_list assembles a list with the list names set to the names of the objects.

lm_assault <- lm(Assault ~ UrbanPop, data = arrests)
lm_anscombe <- lm(y1 ~ x1, data = anscombe)
reg_list <- named_list(lm_assault, lm_anscombe)

reg_list
reg_list %>% write_regression_to_excel('reg.xlsx')
readxl::read_excel('reg.xlsx', sheet = 'lm_anscombe') %>% head %>% kable

Other

What do we have in memory at this point?

lsos(n = 5) %>% kable()


KevinRPan/handy documentation built on May 4, 2019, 2:34 p.m.