knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
cutr extends base::cut.default
's possibilities, getting inspiration from existing
alternatives such as Hmisc::cut2
and the ggplot2::cut_*
family of functions,
but going much further.
closed
and open_end
We build a distribution and show the default behavior, in the rest of the
document we'll show results using table
as it's more telling.
library(cutr) x <- c(rep(1,3),rep(2,2),3:6,17:20) hist(x,breaks = 0:20) cuts <- c(0,5,10,20) smart_cut(x,cuts) table(smart_cut(x,cuts))
The output is very similar to what we get with base
or Hmisc
and ggplot2
alternatives.
table(base::cut(x,cuts)) table(Hmisc::cut2(x,cuts)) table(smart_cut(x,cuts,"breaks"))
A the difference with base
is that we close by default the left size, this can be changed by setting the closed
parameter to "right"
.
Another is that the ends are both closed in smart_cut
, this can be changed by setting the open_end
parameter to FALSE
closed
is borrowed from ggplot2::cut_*
functions, and corresponds to the RIGHT
parameter of base::cut.default
.
open_end
corresponds to the (negated) include.lowest
parameter of base::cut
.
The difference with Hmisc
is due to the formatting function, which is formatC
for cut
and smart_cut
, and format
for Hmisc
(which gives all labels the same width). The display of smart_cut
is highly flexible thanks to the argument format_fun
detailed further in this document.
what
and i
The what
parameter determines how cuts will be chosen, depending on the value of i
.
table(smart_cut(x,cuts,"breaks")) # fixed breaks table(smart_cut(x,2,"groups")) # groups defined by quantiles table(smart_cut(x,list(2,"balanced"),"groups")) # optimized groups of equal size table(smart_cut(x,3,"n_by_group")) # try to get 3 items by group using quantiles table(smart_cut(x,list(3,"balanced"),"n_by_group")) # try to get 3 items by group using optimization table(smart_cut(x,3,"n_intervals")) # intervals of equal width table(smart_cut(x,7,"width")) # interval of equal defined width, start on 1st value table(smart_cut(x,list(7,"right"),"width")) # interval of equal defined width, end on last value table(smart_cut(x,list(6,"centered"),"width")) # interval of equal defined width, centered table(smart_cut(x,list(6,"centered0"),"width")) # interval of equal defined width, centered on 0 table(smart_cut(x,list(7,0),"width")) # interval of equal defined width, starting on 0 table(smart_cut(x,3,"cluster")) # create groups by running a kmeans clustering
simplify
TRUE
by default, when a value is the only one in its group, display it as a
label, without brackets. Similar to oneval
in Hmisc::cut2
.
table(smart_cut(x, 5, "width")) table(smart_cut(x, 5, "width", simplify = FALSE))
expand
expand makes sure all values from x will be in an interval by expanding the cut
points. base::cut.default
never expands, Hmisc::cut2
always expands.
table(smart_cut(x,c(4,10,18))) table(smart_cut(x,c(4,10,18),expand = FALSE))
crop
crop
is FALSE
by default, if TRUE
the side intervals are reduced to fit the data.
table(smart_cut(x,c(0,10,30))) table(smart_cut(x,c(0,10,30),crop = TRUE))
squeeze
squeeze
is FALSE
by default, if TRUE
every interval is reduced to fit the data.
table(smart_cut(x,c(0,10,30))) table(smart_cut(x,c(0,10,30),squeeze = TRUE))
brackets
+ sep
Different brackets can be chosen
table(smart_cut(x,c(0,10,30), brackets = c("]","[","[","]"))) table(smart_cut(x,c(0,10,30), brackets = NULL, sep = "~", squeeze= TRUE))
labels
labels
can be a vector just like in base::cut.default
, but it can also
be a function of 2 arguments, which are a vector of values contained in the
interval and a vector of cutpoints.
table(smart_cut(x,c(4,10))) table(smart_cut(x,c(4,10),labels = ~mean(.x))) # mean of values by interval table(smart_cut(x,c(4,10),labels = ~mean(.y))) # center of interval table(smart_cut(x,c(4,10),labels = ~median(.x))) # median table(smart_cut(x,c(4,10),labels = ~paste( sep="~",.y[1],round(mean(.x),2),.y[2]))) # a more sophisticated label
format_fun
With cutr
the user can provide any formating function through the argument format_fun
, including the package function format_metric
.
table(smart_cut(x^6 + x/100,5,"g")) table(smart_cut(x^6 + x/100,5,"g",format_fun = format, digits = 3)) table(smart_cut(x^6,5,"g",format_fun = signif)) table(smart_cut(x^6,5,"g",format_fun = smart_signif)) table(smart_cut(x^6,5,"g",format_fun = format_metric))
groups
groups
and n_by_group
try to place cut points at relevant quantile positions,
we won't get the required number of groups if several quantiles fall on the same
value, to remedy to this we can use an optimization function.
The most straightforward way to optimize bin size, and the only way most will
ever need, is to minimize the variance between the target bin size and the
actual binsize, this is what happens when the i
argument is
list(n, "balanced")
.
table(smart_cut(x,3,"groups")) table(smart_cut(x,list(3,"balanced"),"groups"))
the second element of the list can be a string (which will be mapped to a predefined function) or a custom made 2 argument function that is applied on all possible bin combinations, its arguments are : bin size the cut points
The combination of cut points that returns the lowest value when passed to this function will be selected (or the first of them if the minimum is not unique).
cutf
and cutf2
These are copies of base::cut.default
and Hmisc::cut2
with the difference
that the formatting function can be used freely. All the features are contained
in smart_cut
but these functions allow users to keep the interface, and
defaults of the function they know and to modify existing code easily,
for example to leverage format_metric
with minimal effort.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.