README.md

modellingTools: Common Tools for Data Preparation and Modelling

Programming in R is delightful. Data analysis in R can be a bit challenging at times. modellingTools was created to provide a formal outlet for useful personal tools I have developed to make data preparation and analysis in R simpler. I found that too often, when getting to know a dataset in R, I fell into the following pattern:

After a year or so of this, I started getting smart about it: every time I modified a base function in some useful way, I would save it in a function. But soon, I found myself following a new pattern:

Finally, I bought Hadley Wickham's book and figured now was as good a time as any to learn how to build a package. A package solves the problems above because:

A fourth benefit is: you get to use the package too! Thank you for doing so, and please let me know via email (alex@alexstringer.ca) if you find any bugs for me to fix or have suggestions for new features.

Example: Frequency distribution of a variable

Getting the frequency distribution of a variable in base R is surprisingly unpleasant. The table function is typically given vectors as input, so the column has to be pulled out of the data frame first:

data(CO2)
table(CO2$conc)
#> 
#>   95  175  250  350  500  675 1000 
#>   12   12   12   12   12   12   12

As you can see, the output also isn't that pretty. You can clean up the code using with,

with(CO2,table(conc))
#> conc
#>   95  175  250  350  500  675 1000 
#>   12   12   12   12   12   12   12

or if you're really cutting-edge, with the %$% operator from the magrittr package:

# install.packages("magrittr")
library(magrittr)
CO2 %$% table(conc)
#> conc
#>   95  175  250  350  500  675 1000 
#>   12   12   12   12   12   12   12

All this for a basic frequency distribution. And don't even think about doing it for a continuous variable:

CO2 %$% table(uptake)
#> uptake
#>  7.7  9.3 10.5 10.6 11.3 11.4   12 12.3 12.5   13 13.6 13.7 14.2 14.4 14.9 
#>    1    1    1    2    1    1    1    1    1    1    1    1    1    1    1 
#> 15.1   16 16.2 17.9   18 18.1 18.9 19.2 19.4 19.5 19.9   21 21.9   22 22.2 
#>    1    1    1    3    1    1    2    1    1    1    1    1    1    1    1 
#> 24.1 25.8 26.2 27.3 27.8 27.9 28.1 28.5   30 30.3 30.4 30.6 30.9 31.1 31.5 
#>    1    1    1    2    1    1    1    1    1    1    1    1    1    1    1 
#> 31.8 32.4 32.5   34 34.6 34.8   35 35.3 35.4 35.5 37.1 37.2 37.5 38.1 38.6 
#>    1    3    1    1    1    1    1    1    1    1    1    1    1    1    1 
#> 38.7 38.8 38.9 39.2 39.6 39.7 40.3 40.6 41.4 41.8 42.1 42.4 42.9 43.9 44.3 
#>    1    1    1    1    1    1    1    1    2    1    1    1    1    1    1 
#> 45.5 
#>    1

Talk about hard to read, and that's only 84 observations!

Try proc_freq, from the modellingTools package. Advantages:

We can do

proc_freq(CO2,"conc")
#> Source: local data frame [7 x 3]
#> 
#>   level count percent
#>   (dbl) (int)   (chr)
#> 1    95    12   14.3%
#> 2   175    12   14.3%
#> 3   250    12   14.3%
#> 4   350    12   14.3%
#> 5   500    12   14.3%
#> 6   675    12   14.3%
#> 7  1000    12   14.3%

as well as

proc_freq(CO2,"uptake")
#> Source: local data frame [76 x 3]
#> 
#>    level count percent
#>    (dbl) (int)   (chr)
#> 1    7.7     1   1.19%
#> 2    9.3     1   1.19%
#> 3   10.5     1   1.19%
#> 4   10.6     2   2.38%
#> 5   11.3     1   1.19%
#> 6   11.4     1   1.19%
#> 7   12.0     1   1.19%
#> 8   12.3     1   1.19%
#> 9   12.5     1   1.19%
#> 10  13.0     1   1.19%
#> ..   ...   ...     ...
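For comparison, a similar table can be put together in base R. This is only a rough sketch, not how proc_freq is implemented: freq_table is a hypothetical helper introduced here for illustration, and its percent formatting may differ from the package's.

```r
# Hypothetical helper (not part of modellingTools): a frequency table
# with one row per distinct value, built from base R's table()
freq_table <- function(data, var) {
  counts <- table(data[[var]])
  data.frame(
    level = names(counts),                # distinct values, as character
    count = as.integer(counts),           # number of rows per value
    percent = paste0(round(100 * as.integer(counts) / sum(counts), 2), "%"),
    stringsAsFactors = FALSE
  )
}

data(CO2)
conc_freq <- freq_table(CO2, "conc")
print(conc_freq)
```

Even this small helper has to handle the counting, the percentages, and the conversion to a data frame by hand, which is the boilerplate proc_freq is meant to remove.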

The real value comes from the bins argument, which groups a continuous variable into intervals before counting:

proc_freq(CO2,"uptake",bins = 4)
#> Source: local data frame [4 x 3]
#> 
#>         level count percent
#>        (fctr) (int)   (chr)
#> 1  [7.7,17.1]    19  22.62%
#> 2 (17.1,26.6]    18  21.43%
#> 3   (26.6,36]    25  29.76%
#> 4   (36,45.5]    22  26.19%
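The intervals above look like four equal-width bins spanning the range of uptake. Assuming that is the case (a guess from the output, not a statement about how proc_freq works internally), the same counts can be sketched in base R with cut:

```r
data(CO2)

# Assumption: equal-width bins over the observed range of the variable
n_bins <- 4
breaks <- seq(min(CO2$uptake), max(CO2$uptake), length.out = n_bins + 1)

# include.lowest = TRUE closes the first interval on the left,
# so the minimum value is not dropped as NA
binned <- cut(CO2$uptake, breaks = breaks, include.lowest = TRUE)

bin_counts <- table(binned)
print(bin_counts)
```

The counts should add up to the 84 rows of CO2. If they do not, some values fell outside the breaks and were turned into NA.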

Installation Instructions

modellingTools is now on CRAN, so you can get the package by typing

install.packages("modellingTools")

Since I'm actively developing the package, it may be better to install the development version from GitHub:

install.packages("devtools")
devtools::install_github("awstringer/modellingTools")

After that, attach the package

library(modellingTools)

and you're good to go!

Overview

For a detailed overview and introduction to using the package and what it does, see the vignette. Check out the GitHub page for all the code as well.



awstringer/modellingTools documentation built on May 11, 2019, 4:11 p.m.