knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of percentify is to create virtual groups on top of a tibble or grouped_df to allow calculation within percentile ranges of a variable on the whole dataset. You can then efficiently perform various dplyr operations on this resampled_df, like: summarise(), do() and group_map().
You can install the developmental version of percentify from Github with:
devtools::install_github("EmilHvitfeldt/percentify")
Imagine we want to do some summary statistics at the different percentile ranges of price in diamonds. We start by using percentify_cut to created a percentiled_df on price with splits at 20%, 60%, 80%, 90% and 95%.
library(ggplot2) library(dplyr) library(percentify)
diamonds_price <- percentify_cut(diamonds, price, c(0.2, 0.6, 0.8, 0.9, 0.95)) diamonds_price
We can then use this grouped data.frame with summarise to calculate statistics within each range.
summarise(diamonds_price, mean_carat = mean(carat), procent_ideal = mean(cut == "Ideal"), mean_x = mean(x), n_obs = n())
Using collect from dplyr will materialize the groups so they can be used for plotting or other calculations.
diamonds_price %>% collect() %>% ggplot(aes(x, fill = .percentile_price)) + geom_histogram(bins = 100)
The resulting grouped data.frame have ggplot2::autoplot() methods to vizualize the the percentile ranges.
percentify_random(diamonds, price, 0.2, 25) %>% autoplot()
The underlying code for this package is inspired by the work done by Davis Vaughan in strapgod.
Please note that the 'quansum' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.