knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)

sundry

Travis-CI Build Status AppVeyor Build Status codecov

The sundry package is a personal R package filled with functions that make my life a little easier when working on day-to-day analyses. Most of the functions in the package are designed to be friendly with the tidyverse and thus are pipe friendly (%>%), and work with other functions like dplyr::group_by. The package is somewhat perpetually under development.

Installation

You can install sundry from github with:

# install.packages("devtools")
devtools::install_github("datalorax/sundry")

Examples

Below are a few examples of functions in the package.

Batch import data and bind into a single data frame

Maybe my favorite function in the package is the read_files function, which will read in n datasets and, if possible bind them together into a single tibble (data frame). The function uses rio::import, which makes it really nice because you don't have to worry about file types basically at all, and you can even read in data of different types all at once.

library(sundry)
library(tidyverse)
by_species <- iris %>%
  split(.$Species) %>%
  map(select, -Species)

str(by_species)

# export as three different file types
rio::export(by_species$setosa, "setosa.csv")
rio::export(by_species$versicolor, "versicolor.xlsx")
rio::export(by_species$virginica, "virginica.sav")

# import them all back in as a single data frame
d <- read_files()
d
d %>% 
  count(file)

fs::file_delete(c("setosa.csv", "versicolor.xlsx", "virginica.sav"))

The first argument is the directory, and defaults to the current working directory. There's also an optional pat argument you can supply to read in only files with a specified pattern. You can also optionally have the files read in as a list, rather than binding the data frames together.

Quickly calculate descriptive stats

Quickly calculate descriptive stats for any set of variables.

library(dplyr)
library(sundry)
storms %>% 
  descrips(wind, pressure)

storms %>% 
  group_by(year) %>% 
  descrips(wind, pressure)

storms %>% 
  group_by(year) %>% 
  descrips(wind, pressure,
           .funs = funs(qtile25 = quantile(., 0.25),
                        median, 
                        qtile75 = quantile(., 0.75)))

Remove empty rows from specific columns

This function is similar to janitor::remove_empty_rows, but allows you to pass a set of columns (rather than looking across all columns). Rows will be removed that are missing across all columns.

d <- rio::import("http://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_ela_tot_ecd_ext_gnd_lep_1617.xlsx",
            setclass = "tbl_df",
            na = c("--", "*")) %>% 
  janitor::clean_names()

d %>% 
  select(district_id, number_level_4:percent_level_1)

The data above have missing data across many columns, but every row has at least some valid entries. Suppose I was only interested in data on schools with proficiency data.

d %>% 
  rm_empty_rows(number_level_4:percent_level_1) %>% 
  select(district_id, number_level_4:percent_level_1) 

The above returns all the rows that are not missing across the full set of variables supplied (rows with partial missing are still returned). The function can also be provided without any column arguments, and the function will then mimic the behavior or janitor::remove_empty_rows.

Filter by Functions

Select rows according to functions. For example, select only the rows with the minimum and maximum values of a specific variable.

storms %>%
 filter_by_funs(wind, funs(min, max))

storms %>%
  group_by(year) %>%
  filter_by_funs(wind, funs(min, median, max)) %>%
  arrange(year)


datalorax/sundry documentation built on April 11, 2021, 1:50 p.m.