knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of simplepivot is to facilitate simple pivot-table summaries and manipulations of data frames.
Spreadsheet users enjoy the simplicity and intuitiveness of pivot-tables for data summary and manipulation. R users can achieve similar results using standard data manipulation libraries like dplyr. However, the process is less intuitive and the barrier to entry is higher for new users. This package supplies a simple wrapper to dplyr functionality, to allow users to do pivot-table style manipulations, and create summary tables (in data-frame form).
The main function 'simple_pivot' fits into a standard data pipeline, and returns a pivoted version of the input data. The package also includes 'helper' functions, which are used to create formatted values for the pivot table output.
It is extremely simple to build a basic pivot table using simplepivot
. Pivot table transformations are done with simple_pivot()
. Similar to MS Excel and Google Sheets, the user supplies the data-frame, the row variables, the column variables, the values, and a function to summarise the values:
The required arguments are:
df
(a dataframe)row_vars
(a character vector with the names of the row variables)col_vars
(a character vector with the names of the column variables)value_variable
(a string with the variable to be summarised)value_function
(a function that takes a sub-vector of the value variable, and summaries it, e.g. mean
, sd
, etc.)Note: At least one row variable, and one column variable should be used. If you don't want to have row and column categories, then the easiest way is to use group_by
then summarise
verbs from the dplyr package.
Let's use simple_pivot
to build a simple summary of the ChickWeight dataset. This dataset records the weights of multiple baby chickens, from birth up to 21 days. Different chickens were put on different diets (an experiment), so the dataset allows potential evaluation of the optimal diet for chicks (in terms of weight).
Notice that the simple_pivot
function fits nicely into the dplyr pipeline
library(dplyr) library(simplepivot) library(knitr) data("ChickWeight") ChickWeight %>% filter(Time == 0 | Time == 21) %>% simple_pivot( row_vars = "Diet", col_vars = "Time", value_variable = "weight", value_function = mean ) %>% knitr::kable()
Here we can see that all groups of chicks had a similar average weight at time 0. But by time 21, there were clear differences between the groups, with the Diet 3 group having a considerably higher average weight.
Let's say we wanted to report the spread as well as the mean value for each of the groups. For this we can use simplepivot
's simplepivot_mean_sd
summary function, in place of the mean
function in base R. This will produce a similar dataframe, with nicely formatted string values in each 'cell', showing the mean and standard deviation of the weight
data variable.
library(dplyr) library(simplepivot) library(knitr) data("ChickWeight") ChickWeight %>% filter(Time == 0 | Time == 21) %>% simple_pivot( row_vars = "Diet", col_vars = "Time", value_variable = "weight", value_function = simplepivot_mean_sd ) %>% knitr::kable()
Median and inter-quartile range (IQR) can sometimes provide more robust summaries, because they are less sensitive to extreme values (i.e. outliers). To repeat the above analysis using median and IQR, we can use simplepivot
's simplepivot_median_IQR
function instead:
library(dplyr) library(simplepivot) library(knitr) data("ChickWeight") ChickWeight %>% filter(Time == 0 | Time == 21) %>% simple_pivot( row_vars = "Diet", col_vars = "Time", value_variable = "weight", value_function = simplepivot_median_IQR ) %>% knitr::kable()
Now, we can see that the spread of data within groups (SD, IQR) is quite large. This will likely lead us to wonder about the extent to which the differences between groups could be dismissed as sampling variation. Answering this question formally, would involve application of ANOVA or a similar technique. However, we can get an initial intuition using the simple_pivot
package's simplepivot_mean_SE
function. This is similar to the functions above, except the figure in the brackets becomes an estimate of the Standard Error of the mean ($\frac{SD}{\sqrt(n)}$). This is not a formal test of the Null hypothesis. However, we can reasonably expect that if the SE is many times smaller than the different between the groups, then the differences between the groups are likely to be significant using more formal tests.
library(dplyr) library(simplepivot) library(knitr) data("ChickWeight") ChickWeight %>% filter(Time == 0 | Time == 21) %>% simple_pivot( row_vars = "Diet", col_vars = "Time", value_variable = "weight", value_function = simplepivot_mean_SE ) %>% knitr::kable()
Now, we can get even more sophisticated using the simplepivot_mean_95CI
function, which will place the 95\% confidence intervals for the mean in the brackets.
library(dplyr) library(simplepivot) library(knitr) data("ChickWeight") ChickWeight %>% filter(Time == 0 | Time == 21) %>% simple_pivot( row_vars = "Diet", col_vars = "Time", value_variable = "weight", value_function = simplepivot_mean_95CI ) %>% knitr::kable()
Some of the summary functions noted above output strings in the cells of the table (e.g. "20 (16-24)"). But if base R functions are used, then analysis can continue on the resulting dataframe. Let's use the mean summary of chick weight over groups, then add additional variables later in the pipeline, using dplyr verbs.
library(dplyr) library(simplepivot) library(knitr) data("ChickWeight") ChickWeight %>% filter(Time == 0 | Time == 21) %>% simple_pivot( row_vars = "Diet", col_vars = "Time", value_variable = "weight", value_function = mean ) %>% mutate( `Percentage Change` = ((`(Time) 21` - `(Time) 0`) / `(Time) 0`) * 100 ) %>% knitr::kable()
Description TBC - the package does support multiple row and column variables.
Not yet available on CRAN.
You can install the development version from GitHub with:
# install.packages("devtools") devtools::install_github("ltdroy/simplepivot")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.