The sundry package is a personal R package filled with functions that make my
life a little easier when working on day-to-day analyses. Most of the functions
in the package are designed to be friendly with the tidyverse and thus are pipe
friendly (`%>%`), and work with other functions like `dplyr::group_by`. The
package is somewhat perpetually under development.
You can install sundry from GitHub with:
```r
# install.packages("devtools")
devtools::install_github("datalorax/sundry")
```
Below are a few examples of functions in the package.
Maybe my favorite function in the package is the `read_files` function, which
will read in n datasets and, if possible, bind them together into a single
tibble (data frame). The function uses `rio::import`, which makes it really
nice because you don't have to worry about file types basically at all, and
you can even read in data of different types all at once.
```r
library(sundry)
library(tidyverse)

by_species <- iris %>%
  split(.$Species) %>%
  map(select, -Species)
str(by_species)

# export as three different file types
rio::export(by_species$setosa, "setosa.csv")
rio::export(by_species$versicolor, "versicolor.xlsx")
rio::export(by_species$virginica, "virginica.sav")

# import them all back in as a single data frame
d <- read_files()
d

d %>%
  count(file)

fs::file_delete(c("setosa.csv", "versicolor.xlsx", "virginica.sav"))
```
The first argument is the directory, and defaults to the current working
directory. There's also an optional `pat` argument you can supply to read in
only files with a specified pattern. You can also optionally have the files
read in as a list, rather than binding the data frames together.
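To make the pattern and list options concrete, here is a minimal base R sketch of the same idea. It is not sundry's implementation: `read_matching` and its `pattern` and `as_list` arguments are hypothetical names used only for illustration, and it handles CSV files only rather than delegating to `rio::import`.

```r
# Base R sketch (not sundry's implementation): read every file in a
# directory whose name matches a pattern, tag rows with their source file,
# and either bind everything together or return a list.
read_matching <- function(dir = ".", pattern = NULL, as_list = FALSE) {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  dfs <- lapply(paths, read.csv)
  # Name each element after its file, without directory or extension
  names(dfs) <- tools::file_path_sans_ext(basename(paths))
  if (as_list) return(dfs)
  # Add a `file` column so rows can be traced back after binding
  tagged <- Map(function(df, nm) cbind(file = nm, df), dfs, names(dfs))
  out <- do.call(rbind, unname(tagged))
  rownames(out) <- NULL
  out
}

# Demo: write the three iris species to a temporary directory
tmp <- file.path(tempdir(), "read_matching_demo")
dir.create(tmp, showWarnings = FALSE)
by_species <- split(iris[, -5], iris$Species)
for (nm in names(by_species)) {
  write.csv(by_species[[nm]], file.path(tmp, paste0(nm, ".csv")),
            row.names = FALSE)
}

all_rows     <- read_matching(tmp, pattern = "\\.csv$")  # all three files
v_only       <- read_matching(tmp, pattern = "^v")       # versicolor, virginica
species_list <- read_matching(tmp, as_list = TRUE)       # list of data frames
```

The `file` column plays the same role as the one counted with `count(file)` in the example above: it records which file each row came from.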
The `descrips` function quickly calculates descriptive statistics for any set of variables.
```r
library(dplyr)
library(sundry)

storms %>%
  descrips(wind, pressure)

storms %>%
  group_by(year) %>%
  descrips(wind, pressure)

storms %>%
  group_by(year) %>%
  descrips(
    wind, pressure,
    .funs = funs(
      qtile25 = quantile(., 0.25),
      median,
      qtile75 = quantile(., 0.75)
    )
  )
```
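For readers who want to see what a grouped summary with custom statistics amounts to, the following base R sketch computes the same three quantile-based statistics per group. It does not use `descrips` and makes no claim about how that function works internally; the `quartiles` helper is a hypothetical name, and `mtcars` stands in for `storms` so that the example needs no extra packages.

```r
# Base R sketch (not descrips itself): per-group quartiles for one variable.
quartiles <- function(x) {
  c(
    qtile25 = unname(quantile(x, 0.25)),
    median  = median(x),
    qtile75 = unname(quantile(x, 0.75))
  )
}

# aggregate() returns the three statistics as a matrix column named `mpg`
by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = quartiles)

# Flatten the matrix column into ordinary columns, one per statistic
by_cyl <- do.call(data.frame, by_cyl)
by_cyl
```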
The `rm_empty_rows` function is similar to `janitor::remove_empty_rows`, but
allows you to pass a set of columns (rather than looking across all columns).
Rows that are missing across all of the supplied columns are removed.
```r
d <- rio::import(
  "http://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_ela_tot_ecd_ext_gnd_lep_1617.xlsx",
  setclass = "tbl_df",
  na = c("--", "*")
) %>%
  janitor::clean_names()

d %>%
  select(district_id, number_level_4:percent_level_1)
```
The data above have missing data across many columns, but every row has at least some valid entries. Suppose I was only interested in data on schools with proficiency data.
```r
d %>%
  rm_empty_rows(number_level_4:percent_level_1) %>%
  select(district_id, number_level_4:percent_level_1)
```
The above returns all the rows that are not missing across the full set of variables supplied (rows with partially missing data are still returned). The function can also be called without any column arguments, in which case it mimics the behavior of `janitor::remove_empty_rows`.
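The row-dropping rule described above can be sketched in a few lines of base R. This is an illustration of the rule only, not sundry's implementation; `drop_all_na` and the `toy` data frame are hypothetical.

```r
# Base R sketch (not sundry's implementation): drop a row only when every
# one of the chosen columns is NA. With no columns given, check all columns.
drop_all_na <- function(df, cols = names(df)) {
  has_value <- rowSums(!is.na(df[cols])) > 0
  df[has_value, , drop = FALSE]
}

toy <- data.frame(
  id = 1:4,
  a  = c(1, NA, NA, 4),
  b  = c(NA, NA, 3, 4)
)

# Row 2 is missing on both `a` and `b`, so it is dropped.
# Rows 1 and 3 are only partially missing, so they are kept.
kept <- drop_all_na(toy, c("a", "b"))
kept

# Checking all columns keeps every row, because `id` is never missing.
drop_all_na(toy)
```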
The `filter_by_funs` function selects rows according to functions. For example, you can select only the rows with the minimum and maximum values of a specific variable.
```r
storms %>%
  filter_by_funs(wind, funs(min, max))

storms %>%
  group_by(year) %>%
  filter_by_funs(wind, funs(min, median, max)) %>%
  arrange(year)
```
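As a rough picture of what filtering by functions means, here is a base R sketch that keeps the rows whose value equals the result of any supplied summary function. It is an assumption about the general idea, not a description of how `filter_by_funs` is implemented; `keep_by_funs` is a hypothetical helper, and `mtcars` is used so the example runs without extra packages.

```r
# Base R sketch (not sundry's implementation): keep rows whose value in
# `col` equals the result of any of the supplied summary functions.
keep_by_funs <- function(df, col, funs) {
  targets <- vapply(funs, function(f) f(df[[col]]), numeric(1))
  df[df[[col]] %in% targets, , drop = FALSE]
}

# Rows of mtcars with the lowest and highest fuel economy
extremes <- keep_by_funs(mtcars, "mpg", list(min = min, max = max))
extremes[, c("mpg", "cyl")]
```

Because this sketch matches values exactly, a statistic that falls between two observed values (a median of an even number of observations, for instance) would match no rows.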