knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(tidyverse) library(tibbleColumns)
The tibbleColumns package is designed to offer some time saving functions that fix problems you didn't even know you had within the tidyverse. It also introduces some advanced methods I've developed for advanced pipe sequences. Some examples of what this package offers are:
NOTE ON FORMATTING: You'll notice I am not breaking the line at the end of each pipe, as Hadley prefers to do. That is because I've started to make my code more horizontal and, through pipe programming, write my structure more as a sentence. The goal of the tibbleColumns package is meant to keep you within the pipe sequence and write code as quickly as possible without having to start a new line. See more on this concept of "Block Coding" at my blog post here.
This entire package started when I realized that creating a proportion column in R was not a simple and easy thing for tibbles. I was frustrated that Excel was able to do something better than R so I decided to do something about it.
#prop_column_group mtcars %>% prop_column_group(cyl) #prop_column mtcars %>% count(cyl, mpg) %>% prop_column(n)
There are two particular functios in the tibbleColumns package that create either a column representing the percentage change from one number to another or an entirely new tibble with a group, number and percentage change.
The change_XoX_column creates a single column that calculates the percentage of change between the vectors in two other columns. These columns need to be organized from left to right. Meaning, column of prior month data to the left of a column for current month.
#change_XoX_column tc <- tibble( Month = c("Jan", "Dec"), Users = c(102, 909) ) tc %>% spread(Month, Users) %>% change_XoX_column(Dec, Jan, "MoM")
The change_XoX_column_group will simply look at an entire data set and do all the above coding for you along with adding a group column as well.
So let’s say you want to find out the MoM change in revenue by device from our hypothetical data set. The function will do that very quickly:
NOTE: For this function you need to select certain columns of the data in a certain order. The columns also need to be specific data types. The order goes Group (character), Date (character), Aggregate Number (numeric)
#change_XoX_column_group tcg <- tibble( Month = c("Jan", "Jan","Dec", "Dec"), Type = c("Red", "Blue", "Red", "Blue"), Users = c(102, 909, 201, 479) ) tcg %>% select(Type, Month, Users) %>% change_XoX_column_group(Dec, Jan, "MoM")
The tibble interacts poorly with base stat functions like t.test(). Instead of just complaining about how data_frame() wasn’t recognized by the base function, I decided to create a wrapper function that simply does all the dirty work for you. So you can now pass two tibbles to this function and it will return a nice tibble summary of your t-test.
#testoutput t1 <- tibble( type = c("Blue", "Blue", "Blue", "Blue"), num = runif(4,0,5) ) t2 <- tibble( type = c("Red","Red", "Red", "Red"), num = runif(4,0,5) ) ttest_tibble(t1$num,t2$num)
It's common to want to compare previous and post values to a current value in a row. Dplyr contains the lag() and lead() functions but I've wrapped those along with mutate() to make a couple nifty column functions.
#create a new column with lead values mtcars %>% select(mpg,cyl) %>% lead_col(cyl,0) #create a new column with lag values mtcars %>% select(mpg,cyl) %>% lag_col(cyl,0)
Other than just wrapping existing tidyverse functions into new ones, I wanted to see if there was anything new I could offer the pipe ecosystem. After some sudden moments of inspiration I developed the tbl_out() and tbl_module() functions, respectively.
the tbl_out() function, when used within a pipe sequence, will take the current state of the data frame in the pipe sequence and create it as an object in the global environment. It will then send that state back into the pipe sequence to go on and be manipulated further It, essentially, allows you to create multiple data frames that represent different states of manipulation in the pipe sequence.
#tibble out mtcars %>% rownames_to_column() %>% rename(Name = rowname) %>% tbl_out("cars") %>% select(cyl,hp,mpg) %>% tbl_out("cars1") %>% mutate(Calc = hp/mpg) %>% round(0) %>% tbl_out("roundOut") %>% lm_summary_tibble("mpg") %>% tbl_out("lmOut")
The point of a tbl_module is to plug in a separate transformation within the pipe sequence that works outside the pipe and saves to your global environment. I find this especially convenient when I have date/time columns that I want to separate, change date on or get additional intervals of (via lubridate). Learn more about tbl_module at my blog post here.
NOTE: The separate function has a 'remove' argument that will keep the original column after separating. This example is meant for easy clarity of what the tbl_module is doing.
#tbl_module mtcars %>% tbl_out("cars") %>% tbl_module(filter(.,hp > 150), "fastCars") %>% tbl_lookup(cyl) %>% tbl_out("cylList") #tbl_module date example dateEx <- data_frame(date = c(Sys.Date(),Sys.Date() - 1,Sys.Date() - 3), num = rnorm(3)) dateEx %>% tbl_module(select(.,date),"dateCol") %>% separate(date, into = c("year","month","day")) %>% bind_cols(dateCol,.)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.