slice_top: Subset the rows of the data by top package
In numbats/cranscrub: Tools for CRAN Data

slice_top

R Documentation

Description

This function subsets the full temporal data according to an aggregate statistic computed over all, or a subset, of the temporal variable. For example, suppose we have the daily download count for each package from 2012 to 2020 and want to keep only the top n packages, where "top" is determined by total downloads over 2018-2020.
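
The idea described above can be sketched in base R. This is an illustration only, not the package's implementation: `top_by_window()` and the toy data are hypothetical, and the way its `value`, `group`, `fun`, `from` and `to` arguments correspond to `order_by`, `rank`, `.fun`, `from` and `to` in `slice_top()` is an assumption based on the Usage signature. Ties are not handled.

```r
# Hypothetical stand-in for the idea behind slice_top(); not the package's code.
top_by_window <- function(data, n, from, to,
                          value = "n_unique", group = "package", fun = sum) {
  # Use only the ranking window to compute the aggregate statistic
  in_window <- data[data$date >= from & data$date <= to, , drop = FALSE]
  # One aggregate value per group within the window
  stat <- tapply(in_window[[value]], in_window[[group]], fun)
  # Names of the n groups with the largest statistic (ties are not handled)
  top <- names(sort(stat, decreasing = TRUE))[seq_len(min(n, length(stat)))]
  # Keep every row, across all dates, that belongs to those groups
  data[data[[group]] %in% top, , drop = FALSE]
}

# Toy data: three packages with daily counts over 2019-2020.
# Package "a" is large in 2019 but small in 2020.
dates <- seq(as.Date("2019-01-01"), as.Date("2020-12-31"), by = "day")
toy <- data.frame(
  date = rep(dates, times = 3),
  package = rep(c("a", "b", "c"), each = length(dates)),
  n_unique = c(
    ifelse(dates < as.Date("2020-01-01"), 10, 1),  # package a
    rep(5, length(dates)),                          # package b
    rep(3, length(dates))                           # package c
  )
)

# Rank on 2020 only, but return the full 2019-2020 series of the top 2 packages
res <- top_by_window(toy, n = 2,
                     from = as.Date("2020-01-01"), to = as.Date("2020-12-31"))
unique(res$package)
```

In this toy data, package "a" has the largest total over the whole period but the smallest total in 2020, so ranking on the 2020 window selects "b" and "c" while still returning their rows for both years.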

Usage

slice_top(
  .data,
  order_by = "n_unique",
  n,
  prop,
  with_ties = TRUE,
  .fun = sum,
  rank = "package",
  from = Sys.Date() - 365,
  to = Sys.Date()
)

Arguments

`.data`	A data frame with a `date` column, in which a category (such as a package) is ranked by some metric over a specified range of dates.
`order_by`	The name of the column to order the ranking by.
`n`	The number of top packages to filter the data by.
`prop`	The proportion of the top packages to filter the data by. Currently not implemented.
`with_ties`	Whether to include ties or not. Currently not implemented.

Examples

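library(cranscrub)  # provides slice_top(); ctvExperimentalDesign is assumed to ship with it
library(magrittr)   # provides the %>% pipe
library(ggplot2)

# Plot the daily n_unique series for the 10 top-ranked packages, one panel per package
ctvExperimentalDesign %>%
  slice_top(n = 10) %>%
  ggplot(aes(date, n_unique, group = package)) +
  geom_line() +
  facet_grid(package ~ .)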

numbats/cranscrub documentation built on July 1, 2022, 4:34 p.m.