In nhemerson/tibbleColumns: A useful set of functions that focus on tibble data frames.

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(tidyverse)
library(tibbleColumns)

The tibbleColumns package is designed to offer some time saving functions that fix problems you didn't even know you had within the tidyverse. It also introduces some advanced methods I've developed for advanced pipe sequences. Some examples of what this package offers are:

Quick proportion columns
Quick x_o_x columns for MoM, YoY etc.
Perform t-tests with tibbles instead of coverting to data.frame
Get a nice lm summary tibble a la broom package
"Tibble Out" the current state of a pipe sequence to your global environment
Plug in a "Tibble Module" to manipulate data outside the pipe sequence

NOTE ON FORMATTING: You'll notice I am not breaking the line at the end of each pipe, as Hadley prefers to do. That is because I've started to make my code more horizontal and, through pipe programming, write my structure more as a sentence. The goal of the tibbleColumns package is meant to keep you within the pipe sequence and write code as quickly as possible without having to start a new line. See more on this concept of "Block Coding" at my blog post here.

Proportion Columns

This entire package started when I realized that creating a proportion column in R was not a simple and easy thing for tibbles. I was frustrated that Excel was able to do something better than R so I decided to do something about it.

#prop_column_group
mtcars %>% prop_column_group(cyl)

#prop_column
mtcars %>% count(cyl, mpg) %>% prop_column(n)

XoX Columns

There are two particular functios in the tibbleColumns package that create either a column representing the percentage change from one number to another or an entirely new tibble with a group, number and percentage change.

The change_XoX_column creates a single column that calculates the percentage of change between the vectors in two other columns. These columns need to be organized from left to right. Meaning, column of prior month data to the left of a column for current month.

#change_XoX_column
tc <- tibble(
  Month = c("Jan", "Dec"),
  Users = c(102, 909)
)

tc %>% spread(Month, Users) %>% change_XoX_column(Dec, Jan, "MoM")

The change_XoX_column_group will simply look at an entire data set and do all the above coding for you along with adding a group column as well.

So let’s say you want to find out the MoM change in revenue by device from our hypothetical data set. The function will do that very quickly:

NOTE: For this function you need to select certain columns of the data in a certain order. The columns also need to be specific data types. The order goes Group (character), Date (character), Aggregate Number (numeric)

#change_XoX_column_group
tcg <- tibble(
  Month = c("Jan", "Jan","Dec", "Dec"),
  Type = c("Red", "Blue", "Red", "Blue"),
  Users = c(102, 909, 201, 479)
)

tcg %>% select(Type, Month, Users) %>% change_XoX_column_group(Dec, Jan, "MoM")

T-test

The tibble interacts poorly with base stat functions like t.test(). Instead of just complaining about how data_frame() wasn’t recognized by the base function, I decided to create a wrapper function that simply does all the dirty work for you. So you can now pass two tibbles to this function and it will return a nice tibble summary of your t-test.

#testoutput
t1 <- tibble(
    type = c("Blue", "Blue", "Blue", "Blue"),
    num = runif(4,0,5)
  )

t2 <- tibble(
  type = c("Red","Red", "Red", "Red"),
  num = runif(4,0,5)
)

ttest_tibble(t1$num,t2$num)

Lead and Lag Columns

It's common to want to compare previous and post values to a current value in a row. Dplyr contains the lag() and lead() functions but I've wrapped those along with mutate() to make a couple nifty column functions.

#create a new column with lead values
mtcars %>% select(mpg,cyl) %>% lead_col(cyl,0)

#create a new column with lag values
mtcars %>% select(mpg,cyl) %>% lag_col(cyl,0)

Tibble Out and Tibble Module

Other than just wrapping existing tidyverse functions into new ones, I wanted to see if there was anything new I could offer the pipe ecosystem. After some sudden moments of inspiration I developed the tbl_out() and tbl_module() functions, respectively.

the tbl_out() function, when used within a pipe sequence, will take the current state of the data frame in the pipe sequence and create it as an object in the global environment. It will then send that state back into the pipe sequence to go on and be manipulated further It, essentially, allows you to create multiple data frames that represent different states of manipulation in the pipe sequence.

#tibble out
mtcars %>% rownames_to_column() %>% rename(Name = rowname) %>% tbl_out("cars") %>% select(cyl,hp,mpg) %>% tbl_out("cars1") %>% mutate(Calc = hp/mpg) %>% round(0) %>% tbl_out("roundOut") %>% lm_summary_tibble("mpg") %>% tbl_out("lmOut")

The point of a tbl_module is to plug in a separate transformation within the pipe sequence that works outside the pipe and saves to your global environment. I find this especially convenient when I have date/time columns that I want to separate, change date on or get additional intervals of (via lubridate). Learn more about tbl_module at my blog post here.

NOTE: The separate function has a 'remove' argument that will keep the original column after separating. This example is meant for easy clarity of what the tbl_module is doing.

#tbl_module
mtcars %>% tbl_out("cars") %>% tbl_module(filter(.,hp > 150), "fastCars") %>% tbl_lookup(cyl) %>% tbl_out("cylList")

#tbl_module date example
dateEx <- data_frame(date = c(Sys.Date(),Sys.Date() - 1,Sys.Date() - 3), num = rnorm(3))

dateEx %>% tbl_module(select(.,date),"dateCol") %>% separate(date, into = c("year","month","day")) %>% bind_cols(dateCol,.)

nhemerson/tibbleColumns documentation built on May 29, 2019, 7:18 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

nhemerson/tibbleColumns
A useful set of functions that focus on tibble data frames.

In nhemerson/tibbleColumns: A useful set of functions that focus on tibble data frames.

Proportion Columns

XoX Columns

T-test

Lead and Lag Columns

Tibble Out and Tibble Module

R Package Documentation

Browse R Packages

We want your feedback!

nhemerson/tibbleColumns A useful set of functions that focus on tibble data frames.

In nhemerson/tibbleColumns: A useful set of functions that focus on tibble data frames.

Proportion Columns

XoX Columns

T-test

Lead and Lag Columns

Tibble Out and Tibble Module

R Package Documentation

Browse R Packages

We want your feedback!

nhemerson/tibbleColumns
A useful set of functions that focus on tibble data frames.