cutr extends base::cut.default's possibilities, getting inspiration from existing
alternatives such as Hmisc::cut2 and the ggplot2::cut_* family of functions,
but going much further.
closed and open_end
We build a distribution and show the default behavior. In the rest of the document we'll show results using table, as it's more telling.
library(cutr)
x <- c(rep(1, 3), rep(2, 2), 3:6, 17:20)
hist(x, breaks = 0:20)
cuts <- c(0, 5, 10, 20)
smart_cut(x, cuts)
table(smart_cut(x, cuts))
The output is very similar to what we get with base or Hmisc and ggplot2
alternatives.
table(base::cut(x, cuts))
table(Hmisc::cut2(x, cuts))
table(smart_cut(x, cuts, "breaks"))
One difference with base is that we close the left side by default; this can be changed by setting the closed parameter to "right".
Another is that both ends are closed in smart_cut; this can be changed by setting the open_end parameter to TRUE.
closed is borrowed from the ggplot2::cut_* functions, and corresponds to the right parameter of base::cut.default.
open_end corresponds to the (negated) include.lowest parameter of base::cut.
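For readers coming from base R, the correspondence described above can be checked with base::cut alone. This is a minimal sketch that does not call smart_cut; the variable names are illustrative only.

```r
x <- c(rep(1, 3), rep(2, 2), 3:6, 17:20)
cuts <- c(0, 5, 10, 20)

# base default: intervals closed on the right, lowest break excluded
base_default <- cut(x, cuts)
levels(base_default)

# intervals closed on the left, with the outer end closed as well
left_closed <- cut(x, cuts, right = FALSE, include.lowest = TRUE)
levels(left_closed)
table(left_closed)
```

With right = FALSE, include.lowest = TRUE closes the upper end of the last interval, so 20 stays in a bin, while 5 moves from the first bin to the second.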
The difference with Hmisc is due to the formatting function, which is formatC for cut and smart_cut, and format for Hmisc (which gives all labels the same width). The display of smart_cut is highly flexible thanks to the argument format_fun detailed further in this document.
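The effect of the two formatting functions on labels can be seen on the break values themselves. This sketch only uses base R, and the formatC arguments shown are an assumption chosen for illustration rather than a statement of what any of the packages passes internally.

```r
breaks <- c(0, 5, 10, 20)

# formatC formats each value on its own, so widths differ
labels_formatC <- formatC(breaks, digits = 3, width = 1)
nchar(labels_formatC)

# format pads all values to a common width
labels_format <- format(breaks)
nchar(labels_format)
```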
what and i
The what parameter determines how cuts will be chosen, depending on the value of i.
table(smart_cut(x, cuts, "breaks")) # fixed breaks
table(smart_cut(x, 2, "groups")) # groups defined by quantiles
table(smart_cut(x, list(2, "balanced"), "groups")) # optimized groups of equal size
table(smart_cut(x, 3, "n_by_group")) # try to get 3 items by group using quantiles
table(smart_cut(x, list(3, "balanced"), "n_by_group")) # try to get 3 items by group using optimization
table(smart_cut(x, 3, "n_intervals")) # intervals of equal width
table(smart_cut(x, 7, "width")) # interval of equal defined width, start on 1st value
table(smart_cut(x, list(7, "right"), "width")) # interval of equal defined width, end on last value
table(smart_cut(x, list(6, "centered"), "width")) # interval of equal defined width, centered
table(smart_cut(x, list(6, "centered0"), "width")) # interval of equal defined width, centered on 0
table(smart_cut(x, list(7, 0), "width")) # interval of equal defined width, starting on 0
table(smart_cut(x, 3, "cluster")) # create groups by running a kmeans clustering
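To make two of these modes more concrete, here is a rough base R analogue of how break points could be derived for quantile-based groups and for equal-width intervals. It is a sketch of the general idea only, not cutr's implementation, and the exact break positions smart_cut picks may differ.

```r
x <- c(rep(1, 3), rep(2, 2), 3:6, 17:20)

# quantile-based breaks: the idea behind splitting into 2 groups
quantile_breaks <- quantile(x, probs = seq(0, 1, length.out = 3))
quantile_breaks

# equal-width breaks: the idea behind asking for 3 intervals
width_breaks <- seq(min(x), max(x), length.out = 4)
width_breaks
```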
simplify
simplify is TRUE by default: when a value is the only one in its group, it is displayed as a label, without brackets. This is similar to oneval in Hmisc::cut2.
table(smart_cut(x, 5, "width"))
table(smart_cut(x, 5, "width", simplify = FALSE))
expand
expand makes sure all values from x will be in an interval by expanding the cut points. base::cut.default never expands; Hmisc::cut2 always expands.
table(smart_cut(x, c(4, 10, 18)))
table(smart_cut(x, c(4, 10, 18), expand = FALSE))
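The non-expanding behavior of base::cut.default is easy to observe: values outside the breaks become NA. The sketch below shows this and one way to expand the breaks by hand in base R. The manual expansion is an illustration of the concept, not necessarily how smart_cut does it.

```r
x <- c(rep(1, 3), rep(2, 2), 3:6, 17:20)
breaks <- c(4, 10, 18)

# base::cut leaves values outside the breaks unbinned (NA)
not_expanded <- cut(x, breaks)
sum(is.na(not_expanded))

# adding the data range to the breaks by hand covers every value
expanded_breaks <- sort(unique(c(min(x), breaks, max(x))))
expanded <- cut(x, expanded_breaks, include.lowest = TRUE)
sum(is.na(expanded))
```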
crop
crop is FALSE by default; if TRUE, the side intervals are reduced to fit the data.
table(smart_cut(x, c(0, 10, 30)))
table(smart_cut(x, c(0, 10, 30), crop = TRUE))
squeeze
squeeze is FALSE by default; if TRUE, every interval is reduced to fit the data.
table(smart_cut(x, c(0, 10, 30)))
table(smart_cut(x, c(0, 10, 30), squeeze = TRUE))
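The bounds that crop and squeeze tighten toward are simply the observed range of the data inside each interval. As a base R sketch of that idea (not cutr's code), the per-interval ranges can be computed like this:

```r
x <- c(rep(1, 3), rep(2, 2), 3:6, 17:20)

# bin with the same breaks as above, closed on the left
bins <- cut(x, c(0, 10, 30), right = FALSE)

# smallest and largest observed value inside each interval
observed_ranges <- tapply(x, bins, range)
observed_ranges
```

Here the data in the first interval spans 1 to 6 and the data in the second spans 17 to 20. Going by the descriptions above, one would expect crop to adjust only the two outer bounds (0 and 30), while squeeze also tightens the inner ones.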
brackets + sep
Different brackets and separators can be chosen.
table(smart_cut(x, c(0, 10, 30), brackets = c("]", "[", "[", "]")))
table(smart_cut(x, c(0, 10, 30), brackets = NULL, sep = "~", squeeze = TRUE))
labels
labels can be a vector just like in base::cut.default, but it can also be a function of 2 arguments: a vector of the values contained in the interval, and a vector of the interval's cut points. In the formula notation used in the examples, these are .x and .y respectively.
table(smart_cut(x, c(4, 10)))
table(smart_cut(x, c(4, 10), labels = ~mean(.x))) # mean of values by interval
table(smart_cut(x, c(4, 10), labels = ~mean(.y))) # center of interval
table(smart_cut(x, c(4, 10), labels = ~median(.x))) # median
table(smart_cut(x, c(4, 10), labels = ~paste(sep = "~", .y[1], round(mean(.x), 2), .y[2]))) # a more sophisticated label
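The idea of a label function that receives the values of an interval and its cut points can be mimicked in base R. In this sketch, label_fun and the explicit breaks are hypothetical names and values chosen for illustration; it shows the concept only, not how smart_cut applies the function internally.

```r
x <- c(rep(1, 3), rep(2, 2), 3:6, 17:20)
breaks <- c(1, 4, 10, 20)
bins <- cut(x, breaks, right = FALSE, include.lowest = TRUE)

# hypothetical label function: values in the interval, then its two cut points
label_fun <- function(values, cutpoints) round(mean(values), 2)

# apply the function to each interval's values and bounds
values_by_bin <- split(x, bins)
bounds_by_bin <- Map(c, head(breaks, -1), tail(breaks, -1))
new_labels <- mapply(label_fun, values_by_bin, bounds_by_bin)

# relabel the factor with the computed labels
levels(bins) <- as.character(new_labels)
table(bins)
```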
format_fun
With cutr the user can provide any formatting function through the argument format_fun, including the package function format_metric.
table(smart_cut(x^6 + x/100, 5, "g"))
table(smart_cut(x^6 + x/100, 5, "g", format_fun = format, digits = 3))
table(smart_cut(x^6, 5, "g", format_fun = signif))
table(smart_cut(x^6, 5, "g", format_fun = smart_signif))
table(smart_cut(x^6, 5, "g", format_fun = format_metric))
groups
groups and n_by_group try to place cut points at relevant quantile positions. We won't get the required number of groups if several quantiles fall on the same value; to remedy this we can use an optimization function.
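The problem of coinciding quantiles is easy to reproduce in base R. The vector y below is a hypothetical example with many repeated values, chosen only to make the ties visible.

```r
# hypothetical data with many ties
y <- c(1, 1, 1, 1, 1, 2, 3)

# quartile positions, as one would use to ask for 4 groups
quartiles <- quantile(y, probs = seq(0, 1, length.out = 5))
unname(quartiles)

# duplicated break points collapse, leaving fewer intervals than requested
distinct_breaks <- unique(unname(quartiles))
length(distinct_breaks) - 1
```

Three of the five quantiles fall on the value 1, so only two distinct intervals remain instead of four.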
The most straightforward way to optimize bin size, and the only way most users will ever need, is to minimize the variance between the target bin size and the actual bin size. This is what happens when the i argument is list(n, "balanced").
table(smart_cut(x, 3, "groups"))
table(smart_cut(x, list(3, "balanced"), "groups"))
The second element of the list can be a string (which will be mapped to a predefined function) or a custom-made 2-argument function that is applied to all possible bin combinations. Its arguments are the bin sizes and the cut points.
The combination of cut points that returns the lowest value when passed to this function will be selected (or the first of them if the minimum is not unique).
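The selection rule described above can be sketched as a brute-force search in base R. Everything here is an assumption made for illustration: the score function, the choice of candidate cut points (the unique values of x except the minimum), and the use of left-closed intervals. It is not cutr's implementation.

```r
x <- c(rep(1, 3), rep(2, 2), 3:6, 17:20)
n_groups <- 3

# hypothetical scoring function of bin sizes and cut points:
# squared deviation of each bin size from the target size
score <- function(bin_size, cutpoints) {
  sum((bin_size - length(x) / n_groups)^2)
}

# candidate inner cut points and all ways to pick n_groups - 1 of them
candidates <- sort(unique(x))[-1]
combos <- combn(candidates, n_groups - 1, simplify = FALSE)

# score every combination of cut points
scores <- vapply(combos, function(inner) {
  breaks <- c(min(x), inner, Inf)
  sizes <- as.vector(table(cut(x, breaks, right = FALSE)))
  score(sizes, breaks)
}, numeric(1))

# which.min returns the first minimum, matching the tie rule above
best <- combos[[which.min(scores)]]
best
table(cut(x, c(min(x), best, Inf), right = FALSE))
```

With this data the search settles on inner cut points 3 and 17, giving bins of 5, 4 and 4 values, which is as close to equal sizes as 13 values in 3 groups allow.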
cutf and cutf2
These are copies of base::cut.default and Hmisc::cut2, with the difference that the formatting function can be chosen freely. All their features are contained in smart_cut, but these functions let users keep the interface and defaults of the function they know and modify existing code easily, for example to leverage format_metric with minimal effort.