EmilHvitfeldt/percentify: Splitting a Dataset According to Percentile Ranges

percentify

The goal of percentify is to create virtual groups on top of a tibble or grouped_df, so that calculations can be done within percentile ranges of a variable computed over the whole dataset. You can then efficiently apply dplyr operations such as summarise(), do() and group_map() to the resulting percentiled_df.
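One way to picture a "virtual group" is as a vector of row positions that point into the shared data, rather than a copy of those rows. The sketch below illustrates that idea in base R. It is only an illustration under that assumption, not percentify's actual implementation; the names `bounds` and `virtual_groups` are hypothetical, and the ranges are assumed to be closed intervals between price quantiles.

```r
library(ggplot2)  # only used for the diamonds data set

# Percentile end points (assumption: the ranges cover 0% to 100%)
probs <- c(0, 0.2, 0.6, 0.8, 0.9, 0.95, 1)
bounds <- quantile(diamonds$price, probs, names = FALSE)

# Hypothetical representation: each virtual group is an integer vector of
# row positions; the rows of diamonds themselves are never copied
virtual_groups <- lapply(seq_len(length(probs) - 1), function(i) {
  which(diamonds$price >= bounds[i] & diamonds$price <= bounds[i + 1])
})

# A per-group statistic computed by indexing into the shared data
mean_carat <- vapply(virtual_groups,
                     function(rows) mean(diamonds$carat[rows]),
                     numeric(1))
```

Because the intervals are closed on both sides in this sketch, a row whose price equals a boundary can belong to two groups; that is harmless when the groups are only indices.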

Installation

You can install the development version of percentify from GitHub with:

devtools::install_github("EmilHvitfeldt/percentify")

Example

Imagine we want to compute some summary statistics for different percentile ranges of price in diamonds. We start by using percentify_cut() to create a percentiled_df on price with splits at 20%, 60%, 80%, 90% and 95%.

library(ggplot2)
library(dplyr)
library(percentify)

diamonds_price <- percentify_cut(diamonds, price, c(0.2, 0.6, 0.8, 0.9, 0.95))

diamonds_price
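To see which prices those percentage splits correspond to, the split points can be translated into price boundaries with stats::quantile(). This is a sketch assuming the ranges run from 0% to 100% with the given splits in between; the exact quantile method and interval closure used by percentify_cut() are not stated here, and `ranges` is a hypothetical name.

```r
library(ggplot2)  # only used for the diamonds data set

splits <- c(0.2, 0.6, 0.8, 0.9, 0.95)

# Assumption: the first range starts at 0% and the last one ends at 100%
probs <- c(0, splits, 1)
price_bounds <- quantile(diamonds$price, probs, names = FALSE)

# One row per percentile range, with its lower and upper price boundary
ranges <- data.frame(
  range = paste0(head(probs, -1) * 100, "% to ", tail(probs, -1) * 100, "%"),
  lower = head(price_bounds, -1),
  upper = tail(price_bounds, -1)
)
ranges
```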

We can then pass this grouped data frame to summarise() to calculate statistics within each percentile range.

summarise(diamonds_price,
          mean_carat = mean(carat),
          procent_ideal = mean(cut == "Ideal"),
          mean_x = mean(x),
          n_obs = n())
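For comparison, a similar summary can be produced with plain dplyr by materializing a grouping column first. This is a sketch assuming non-overlapping ranges built with base::cut() over price quantiles (percentify's own boundary handling may differ); `price_range` and `price_summary` are hypothetical names, and base::cut() is spelled out because diamonds also has a column called `cut`.

```r
library(ggplot2)  # only used for the diamonds data set
library(dplyr)

probs <- c(0, 0.2, 0.6, 0.8, 0.9, 0.95, 1)
price_breaks <- quantile(diamonds$price, probs)

price_summary <- diamonds %>%
  # Assign each row to exactly one range; include.lowest keeps the minimum price
  mutate(price_range = base::cut(price, breaks = price_breaks,
                                 include.lowest = TRUE)) %>%
  group_by(price_range) %>%
  summarise(mean_carat = mean(carat),
            percent_ideal = mean(cut == "Ideal"),
            mean_x = mean(x),
            n_obs = n())
price_summary
```

The difference is that this version adds a column to the data, whereas percentify keeps the groups virtual until they are collected.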

Calling dplyr's collect() materializes the virtual groups so they can be used for plotting or other calculations.

diamonds_price %>%
  collect() %>%
  ggplot(aes(x, fill = .percentile_price)) +
  geom_histogram(bins = 100)
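A rough picture of what materializing means: one copy of the matching rows is stacked per range, tagged with the range it came from. The helper `materialize_ranges()` below is hypothetical and only sketches that idea under the assumption of closed intervals between price quantiles; with overlapping ranges (or rows sitting exactly on a boundary) the same row appears more than once.

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper: stack one copy of the matching rows per percentile range
materialize_ranges <- function(data, var, lower, upper) {
  values <- data[[var]]
  pieces <- lapply(seq_along(lower), function(i) {
    bounds <- quantile(values, c(lower[i], upper[i]), names = FALSE)
    in_range <- values >= bounds[1] & values <= bounds[2]
    mutate(data[in_range, ],
           range_id = paste0(lower[i] * 100, "-", upper[i] * 100, "%"))
  })
  bind_rows(pieces)
}

probs <- c(0, 0.2, 0.6, 0.8, 0.9, 0.95, 1)
materialized <- materialize_ranges(diamonds, "price",
                                   head(probs, -1), tail(probs, -1))

# Same kind of plot as above, built from the explicitly stacked data
p <- ggplot(materialized, aes(x, fill = range_id)) +
  geom_histogram(bins = 100)
```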

Plotting function

The resulting grouped data frames have ggplot2::autoplot() methods to visualize the percentile ranges.

percentify_random(diamonds, price, 0.2, 25) %>%
  autoplot()
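The arguments to percentify_random() are not explained here. One plausible reading, which is an assumption, is 25 windows that each span 20 percentage points and start at random positions. The sketch below generates windows under that assumption and converts them to price boundaries; `width`, `n_windows` and `windows` are hypothetical names, and the uniform choice of start points is a guess rather than the package's documented behaviour.

```r
library(ggplot2)  # only used for the diamonds data set

set.seed(1)       # make the random windows reproducible
width <- 0.2      # assumed: width of each window in percentile units
n_windows <- 25   # assumed: number of windows

# Draw start points so that every window fits inside [0, 1]
lower <- runif(n_windows, min = 0, max = 1 - width)
upper <- lower + width

windows <- data.frame(
  lower = lower,
  upper = upper,
  price_lower = quantile(diamonds$price, lower, names = FALSE),
  price_upper = quantile(diamonds$price, upper, names = FALSE)
)
head(windows)
```

Unlike the cut example, these windows overlap, so a materialized version would contain many repeated rows.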

Inspiration

The underlying code for this package is inspired by the work done by Davis Vaughan in strapgod.

Code of Conduct

Please note that the 'percentify' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

EmilHvitfeldt/percentify documentation built on July 9, 2019, 10:54 p.m.