cut divides the range of x into intervals and codes the values in x
according to the interval into which they fall. The leftmost interval
corresponds to level one, the next leftmost to level two and so on.
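As a minimal illustration of the coding described above (the data and break points are made up for this sketch), each value is mapped to the level of the interval that contains it:

```r
# Illustrative data; the break points define the intervals (0,3], (3,6], (6,9]
x <- c(1, 4, 6, 9)
f <- cut(x, breaks = c(0, 3, 6, 9))

levels(f)      # "(0,3]" "(3,6]" "(6,9]"
as.integer(f)  # 1 2 2 3: the leftmost interval is level one
```

Note that 6 is coded as level two because intervals are closed on the right by default.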
x: a numeric vector which is to be converted to a factor by cutting.
breaks: either a numeric vector of two or more unique cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.
labels: labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.
include.lowest: logical, indicating if an ‘x[i]’ equal to the lowest (or highest, for right = FALSE) ‘breaks’ value should be included.
right: logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.
dig.lab: integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers.
ordered_result: logical: should the result be an ordered factor?
...: further arguments passed to or from other methods.
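The interaction of right and include.lowest is easiest to see on values that coincide with the break points. The following sketch uses made-up data in which every value is a break point:

```r
x  <- c(0, 5, 10)
br <- c(0, 5, 10)

r1 <- cut(x, br)                         # (0,5] (5,10]: the value 0 is not covered
r2 <- cut(x, br, include.lowest = TRUE)  # [0,5] (5,10]: the value 0 is now included
r3 <- cut(x, br, right = FALSE)          # [0,5) [5,10): the value 10 is not covered
r4 <- cut(x, br, right = FALSE,
          include.lowest = TRUE)         # [0,5) [5,10]: the value 10 is now included

# Compare the level codes; NA marks a value outside every interval
rbind(r1 = as.integer(r1), r2 = as.integer(r2),
      r3 = as.integer(r3), r4 = as.integer(r4))
```

With right = FALSE, include.lowest acts on the highest break rather than the lowest, as stated in the argument description.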
When breaks is specified as a single number, the range of the
data is divided into breaks pieces of equal length, and then
the outer limits are moved away by 0.1% of the range to ensure that
the extreme values both fall within the break intervals. (If x
is a constant vector, equal-length intervals are created, one of
which includes the single value.)
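The rule for a single-number breaks can be sketched by hand. The code below follows the description above rather than the actual source of cut, so the internal computation may differ in detail; the data and the name b are made up for illustration, and the values are chosen to lie away from the interior cut points.

```r
x <- c(0, 1, 4, 6, 9, 10)
n <- 4                                   # requested number of intervals

# Hand-built breaks: equal-length pieces over range(x), then the two
# outer limits moved away by 0.1% of the range
rx <- range(x)
dx <- diff(rx)
b  <- seq(rx[1], rx[2], length.out = n + 1)
b[c(1, n + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)

auto   <- cut(x, breaks = n, labels = FALSE)
manual <- cut(x, breaks = b, labels = FALSE)
identical(auto, manual)                  # same codes for these data
anyNA(auto)                              # FALSE: both extremes are covered
```

Because the outer limits are widened, min(x) is assigned to the first interval even though include.lowest is FALSE.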
If a labels parameter is specified, its values are used to name
the factor levels. If none is specified, the factor level labels are
constructed as "(b1, b2]", "(b2, b3]" etc. for right = TRUE and as
"[b1, b2)", ... if right = FALSE.
In this case, dig.lab indicates the minimum number of digits that
should be used in formatting the numbers b1, b2, ....
A larger value (up to 12) will be used if needed to distinguish
between any pair of endpoints: if this fails, labels such as
"Range3" will be used. Formatting is done by formatC.
The default method will sort a numeric vector of breaks, but
other methods are not required to, and labels will correspond to
the intervals after sorting.
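A short sketch of the sorting behaviour of the default method, with made-up data: unsorted breaks give the same result as sorted ones, and user-supplied labels are matched to the sorted intervals.

```r
x <- c(1, 4, 8)

unsorted <- cut(x, breaks = c(10, 0, 5))
sorted   <- cut(x, breaks = c(0, 5, 10))
identical(unsorted, sorted)   # TRUE: breaks are sorted before use

# Labels refer to the intervals after sorting: "low" is (0,5], "high" is (5,10]
lab <- cut(x, breaks = c(10, 0, 5), labels = c("low", "high"))
as.character(lab)             # "low" "low" "high"
```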
As from R 3.2.0,
getOption("OutDec") is consulted when labels
are constructed for
labels = NULL.
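The effect of dig.lab and of explicit labels on the level names can be explored as follows. The break points are made up so that two of them are close to whole numbers; exact label strings are not shown here because they depend on the formatting done by formatC.

```r
x  <- c(1.2345, 2.3456, 3.4567)
br <- c(1, 2.0004, 3.0008, 4)

# Default dig.lab = 3: endpoints are shown with few significant digits
lev3 <- levels(cut(x, br))

# More digits in the labels; the intervals themselves do not change
lev5 <- levels(cut(x, br, dig.lab = 5))

# Explicit labels, one per interval, replace the constructed ones
named <- cut(x, br, labels = c("low", "mid", "high"))

lev3
lev5
as.character(named)   # "low" "mid" "high"
```

Only the labels differ between the first two calls: the level codes are the same because the breaks are identical.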
A factor is returned, unless labels = FALSE, which
results in an integer vector of level codes.
Values which fall outside the range of breaks are coded as
NA, as are NaN and NA values.
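The two kinds of return value, and the treatment of out-of-range and missing values, can be checked with a small made-up vector:

```r
x  <- c(-1, 2, 7, 12, NA, NaN)
br <- c(0, 5, 10)

f <- cut(x, br)
is.factor(f)                          # TRUE

codes <- cut(x, br, labels = FALSE)
is.integer(codes)                     # TRUE
codes                                 # NA 1 2 NA NA NA

# -1 and 12 lie outside the breaks; NA and NaN are always coded as NA
```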
Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is
more efficient and less memory hungry. Instead of
cut(*, labels = FALSE),
findInterval() is more efficient.
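The alternatives named in this note can be compared on simulated data. This is a sketch under the assumption that all values lie strictly inside the range of the breaks; at the boundaries the functions differ (hist includes the lowest break by default, and findInterval returns 0 or length(br) rather than NA for values outside the breaks).

```r
set.seed(1)
x  <- stats::runif(100, 0, 10)   # values strictly between 0 and 10
br <- 0:10

# Bin counts: table(cut()) versus hist()
counts_cut  <- as.vector(table(cut(x, br)))
counts_hist <- graphics::hist(x, br, plot = FALSE)$counts
all(counts_cut == counts_hist)

# Integer codes: cut(*, labels = FALSE) versus findInterval().
# findInterval() uses left-closed intervals by default, hence right = FALSE.
codes_cut <- cut(x, br, right = FALSE, labels = FALSE)
codes_fi  <- findInterval(x, br)
all(codes_cut == codes_fi)
```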
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
split for splitting a variable according to a group factor;
factor, tabulate, table, findInterval.
quantile for ways of choosing breaks of roughly equal
content (rather than length).
.bincode for a bare-bones version.
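As a sketch of the quantile approach mentioned above, the following cuts simulated skewed data into four groups of roughly equal content. The choice of quartiles and of rexp data is illustrative only.

```r
set.seed(42)
x <- stats::rexp(1000)

# Quartile break points; include.lowest = TRUE keeps min(x) in the first group
br <- stats::quantile(x, probs = seq(0, 1, by = 0.25))
q  <- cut(x, br, include.lowest = TRUE)

table(q)                     # roughly 250 observations per group

# For comparison, equal-length intervals give very unequal counts here
table(cut(x, 4))
```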
Z <- stats::rnorm(10000)
table(cut(Z, breaks = -6:6))
sum(table(cut(Z, breaks = -6:6, labels = FALSE)))
sum(graphics::hist(Z, breaks = -6:6, plot = FALSE)$counts)

cut(rep(1,5), 4) #-- dummy
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)
stopifnot(table(x) == tx0)

table( cut(x, b = 8))
table( cut(x, breaks = 3*(-2:5)))
table( cut(x, breaks = 3*(-2:5), right = FALSE))

##--- some values OUTSIDE the breaks :
table(cx  <- cut(x, breaks = 2*(0:4)))
table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE))
which(is.na(cx));  x[is.na(cx)]  #-- the first 9 values 0
which(is.na(cxl)); x[is.na(cxl)] #-- the last 5 values 8

## Label construction:
y <- stats::rnorm(100)
table(cut(y, breaks = pi/3*(-3:3)))
table(cut(y, breaks = pi/3*(-3:3), dig.lab = 4))

table(cut(y, breaks = 1*(-3:3), dig.lab = 4))
# extra digits don't "harm" here
table(cut(y, breaks = 1*(-3:3), right = FALSE))
#- the same, since no exact INT!

## sometimes the default dig.lab is not enough to avoid confusion:
aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut(aaa, 3)
cut(aaa, 3, dig.lab = 4, ordered = TRUE)

## one way to extract the breakpoints
labs <- levels(cut(aaa, 3))
cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
      upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))