knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)

tidbits

The goal of tidbits is to package up some utility functions that have proven useful in multiple data analysis projects and teaching, so they can be properly documented and more easily deployed. Including autoread() function which wraps readers for a wide variety of data formats so the same script can run on different files without editing the file-loading code.

Installation

You can install tidbits from github with:

# install.packages("devtools")
devtools::install_github("bokov/tidbits")

Example

This is a basic example which shows you how to solve a common problem:

library(tidbits);
# Read data from the NAACCR website
dat00 <- autoread('https://www.naaccr.org/wp-content/uploads/2017/02/naaccr_cina_2009_2013_stage.sas7bdat');
# Build an automatic data dictionary
dct0 <- tblinfo(dat00)

Now that there exists a data.frame compatible object named dct0 in your environment, you can pull various collections of column names out of it for the table on which it was based (dat00).

# To see which column groupings exist, call it without any arguments
v()
# To get the names of just the numeric columns 
v(c_numeric)
# To get the names of uninformative columns (i.e. their value never changes)
v(c_uninformative)
# Complex columns aren't literally complex numbers, but rather factors that have a huge number of levels
v(c_complex)
# Ordinal columns are ones that are numeric, yet have few distinct values and it might make sense to discretize them
v(c_ordinal)
# 'c_factor' columns are non-numeric ones that might be good choices for converting to factors
v(c_factor)
# the 'c_tm' group are columns which have only one distinct non-missing value, 'c_tf' ones have only two distinct non-missing values, and 'c_empty' ones are missing all values. None of those are represented in the NAACCR dataset.

Travis-CI Build Status Coverage Status AppVeyor Build Status



bokov/tidbits documentation built on Jan. 26, 2024, 6:25 p.m.