The goal of percentify is to create virtual groups on top of a tibble or grouped_df so that calculations can be done within percentile ranges of a variable across the whole dataset. You can then efficiently perform various dplyr operations on the resulting percentiled_df, such as summarise(), do() and group_map().
You can install the development version of percentify from GitHub with:
devtools::install_github("EmilHvitfeldt/percentify")
Imagine we want to compute summary statistics for different percentile ranges of price in diamonds. We start by using percentify_cut() to create a percentiled_df on price with splits at 20%, 60%, 80%, 90% and 95%.
library(ggplot2)
library(dplyr)
library(percentify)
diamonds_price <- percentify_cut(diamonds, price, c(0.2, 0.6, 0.8, 0.9, 0.95))
diamonds_price
We can then use this grouped data.frame with summarise() to calculate statistics within each range.
summarise(diamonds_price, mean_carat = mean(carat), procent_ideal = mean(cut == "Ideal"), mean_x = mean(x), n_obs = n())
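To make the idea of percentile ranges concrete, here is a minimal sketch that approximates the same grouping by hand with quantile() and cut() from base R. It is an illustration only, not how percentify is implemented: the names price_range and manual_summary are made up for this example, and the handling of values that fall exactly on a boundary is an assumption that may differ from percentify_cut(), so counts need not match exactly.

```r
library(dplyr)
library(ggplot2)  # used here only for the diamonds dataset

# Same splits as in the example above, padded with 0 and 1
# so the ranges cover the whole dataset.
probs <- c(0, 0.2, 0.6, 0.8, 0.9, 0.95, 1)
breaks <- quantile(diamonds$price, probs = probs)

# Assumption: ranges are closed on the right, with the minimum
# included in the first range (include.lowest = TRUE).
manual_summary <- diamonds %>%
  mutate(price_range = cut(price, breaks = breaks, include.lowest = TRUE)) %>%
  group_by(price_range) %>%
  summarise(mean_carat = mean(carat), n_obs = n())

manual_summary
```

Unlike this sketch, which adds a real grouping column to the data, percentify keeps its groups virtual until they are materialized.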
Using collect() from dplyr will materialize the groups so they can be used for plotting or other calculations.
diamonds_price %>% collect() %>% ggplot(aes(x, fill = .percentile_price)) + geom_histogram(bins = 100)
The resulting grouped data.frames have ggplot2::autoplot() methods to visualize the percentile ranges.
percentify_random(diamonds, price, 0.2, 25) %>% autoplot()
The underlying code for this package is inspired by the work done by Davis Vaughan in strapgod.
Please note that the 'percentify' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.