The goal of tidbits is to package up some utility functions that have proven
useful in multiple data analysis projects and in teaching, so they can be
properly documented and more easily deployed. These include the autoread()
function, which wraps readers for a wide variety of data formats so the same
script can run on different files without editing the file-loading code.
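To make the idea of a format-agnostic reader concrete, here is a minimal sketch of dispatching on the file extension using only base R. It is not how autoread() is implemented; `read_by_extension` is a hypothetical helper invented for this illustration, and it covers only a few formats.

```r
# Hypothetical helper, NOT part of tidbits: choose a base-R reader
# based on the file extension.
read_by_extension <- function(path, ...) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path, ...),
    tsv = ,                               # falls through to the 'tab' reader
    tab = utils::read.delim(path, ...),
    rds = readRDS(path),
    stop("No reader registered for extension: ", ext)
  )
}

# Write the same data in two formats, then load both with the same call.
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(head(iris), csv_path, row.names = FALSE)
saveRDS(head(iris), rds_path)

from_csv <- read_by_extension(csv_path)
from_rds <- read_by_extension(rds_path)
```

The calling code stays identical for both files; only the path changes, which is the property the paragraph above describes.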
You can install tidbits from GitHub with:
# install.packages("devtools")
devtools::install_github("bokov/tidbits")
This is a basic example which shows you how to solve a common problem:
library(tidbits);
# Read data from the NAACCR website
dat00 <- autoread('https://www.naaccr.org/wp-content/uploads/2017/02/naaccr_cina_2009_2013_stage.sas7bdat');
# Build an automatic data dictionary
dct0 <- tblinfo(dat00)
Now that a data.frame-compatible object named dct0 exists in your environment,
you can pull various collections of column names out of it for the table on
which it was based (dat00).
# To see which column groupings exist, call it without any arguments
v()
# To get the names of just the numeric columns
v(c_numeric)
# To get the names of uninformative columns (i.e. their value never changes)
v(c_uninformative)
# Complex columns aren't literally complex numbers, but rather factors that
# have a huge number of levels
v(c_complex)
# Ordinal columns are ones that are numeric, yet have few distinct values and
# it might make sense to discretize them
v(c_ordinal)
# 'c_factor' columns are non-numeric ones that might be good choices for
# converting to factors
v(c_factor)
# The 'c_tm' group are columns which have only one distinct non-missing value,
# 'c_tf' ones have only two distinct non-missing values, and 'c_empty' ones
# are missing all values. None of those are represented in the NAACCR dataset.
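For readers who want a feel for what such column groupings mean, the sketch below computes a few of them by hand with base R on a small made-up data frame. It only illustrates the definitions given in the comments above; the rules tblinfo() and v() actually apply are not shown in this README, so the object names (`toy`, `n_nonmissing_distinct`, `groups`) and the exact criteria are assumptions.

```r
# Toy data, invented for this illustration only.
toy <- data.frame(
  id    = 1:6,
  score = c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5),
  const = rep("a", 6),
  flag  = c(TRUE, NA, TRUE, NA, TRUE, TRUE),
  yesno = c("y", "n", "y", NA, "n", "y"),
  blank = rep(NA, 6),
  stringsAsFactors = FALSE
)

# Number of distinct non-missing values in each column.
n_nonmissing_distinct <- vapply(
  toy, function(x) length(unique(x[!is.na(x)])), integer(1)
)

# Assumed, simplified rules mirroring the descriptions above;
# these are NOT taken from the tidbits source code.
groups <- list(
  numeric               = names(toy)[vapply(toy, is.numeric, logical(1))],
  one_nonmissing_value  = names(toy)[n_nonmissing_distinct == 1],  # like 'c_tm'
  two_nonmissing_values = names(toy)[n_nonmissing_distinct == 2],  # like 'c_tf'
  empty                 = names(toy)[n_nonmissing_distinct == 0]   # like 'c_empty'
)

groups
```

Each element of `groups` is a character vector of column names, so it can be used directly for subsetting, e.g. `toy[, groups$numeric]`.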