acut | R Documentation |
cut
A version of cut that easily formats the labels and places breaks by default.
acut( x, n = 5, type = "default", format = NULL, format.low = NULL, format.high = NULL, dig.lab = 3, right = TRUE, breaks, labels = TRUE, ... )
x | a numeric vector which is to be converted to a factor by cutting (passed directly to cut).
n | number of bins to create based on the empirical quantiles of x. This is overruled if breaks is specified.
type | a high-level formatting option. For now, the only option other than the default setting is "age".
format | string used to make labels. %l and %u identify the lower and upper value of the breaks, respectively. See examples.
format.low | string used specifically for the lowest label.
format.high | string used specifically for the highest label.
dig.lab | integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers (passed directly to cut).
right | logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa (passed directly to cut).
breaks | breaks specified manually, as in cut.
labels | logical, indicating whether to make labels or simply use ordered numbers. If TRUE, the labels are constructed as described above.
... | further arguments passed to cut.
The formats are supplied by specifying the text around the lower (%l) and upper (%u) values (see examples).
If user-specified breaks are supplied, the default labels from cut are used.
If automatic breaks are used, the default labels are the defaults from cut, slightly modified at the end points.
All of this can be adjusted manually through the format arguments (see examples).
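To make the %l/%u templates concrete, here is a minimal base-R sketch of how such labels could be built and then attached with cut. The helpers fill_format() and make_labels() are hypothetical illustrations, not functions from Publish, and the break values are printed with as.character() rather than with whatever digit formatting acut applies, so real acut labels may look different.

```r
# Hypothetical helper (not part of Publish): fill a label template by
# replacing %l with the lower and %u with the upper end of each interval.
fill_format <- function(template, lower, upper) {
  mapply(function(l, u) {
    s <- gsub("%l", l, template, fixed = TRUE)
    gsub("%u", u, s, fixed = TRUE)
  }, lower, upper, USE.NAMES = FALSE)
}

# Hypothetical helper: one label per interval, with optional special
# templates for the lowest and highest interval (cf. format.low/format.high).
make_labels <- function(breaks, format = "%l-%u",
                        format.low = NULL, format.high = NULL) {
  lower <- as.character(head(breaks, -1))
  upper <- as.character(tail(breaks, -1))
  labs <- fill_format(format, lower, upper)
  k <- length(labs)
  if (!is.null(format.low)) labs[1] <- fill_format(format.low, lower[1], upper[1])
  if (!is.null(format.high)) labs[k] <- fill_format(format.high, lower[k], upper[k])
  labs
}

breaks <- c(0, 18.5, 25, 30, Inf)
labs <- make_labels(breaks, format = "BMI between %l and %u",
                    format.low = "BMI below %u", format.high = "BMI above %l")
bmi <- c(17, 22.4, 27.9, 35)
groups <- cut(bmi, breaks = breaks, labels = labs)
print(table(groups))
```

Because the lowest and highest labels are filled from their own templates, an open-ended interval such as (30, Inf] can read "BMI above 30" instead of showing Inf.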
By default, x is split into n = 5 bins at the empirical quantiles of x.
The number of bins can be changed with n, or the breaks can be specified manually instead (as in cut).
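The quantile-based default can be approximated in base R. The sketch below only illustrates the general idea and is not Publish's code: the hypothetical quantile_cut() places the breaks at the empirical quantiles of x and keeps cut's own labels, without the end-point adjustment that acut makes.

```r
# Minimal sketch (hypothetical, not Publish's implementation): split x
# into n groups whose breaks sit at the empirical quantiles of x.
quantile_cut <- function(x, n = 5) {
  probs <- seq(0, 1, length.out = n + 1)
  br <- unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
  # include.lowest = TRUE keeps the minimum of x in the first group
  cut(x, breaks = br, include.lowest = TRUE)
}

set.seed(1)
x <- rnorm(1000)
g <- quantile_cut(x, n = 5)
print(table(g))
```

Calling unique() on the quantiles guards against duplicate breaks when x has many ties, which would otherwise make cut() stop with an error; the price is that fewer than n groups may be returned.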
If type is changed from "default" to another option, a different formatting template is used.
For now, the only other option is "age", which is designed to make it easy to group age variables.
When type = "age", only the breaks argument is used, and it behaves differently from the default case.
If a single number is supplied, intervals of that length are constructed automatically, starting from 0.
If a vector is supplied, the intervals are used as in cut but formatted differently (see examples).
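The single-number rule for type = "age" (intervals of a fixed length starting from 0) can be sketched as follows. age_groups() is a hypothetical helper, not part of Publish; it assumes integer ages and intervals closed on the left, so that age 20 falls in "20-39" when the length is 20. The intervals and labels that acut actually produces may differ.

```r
# Hypothetical sketch: intervals of length `width` starting at 0,
# closed on the left and labelled for integer ages (e.g. "20-39").
age_groups <- function(age, width = 10) {
  # Extend the last break past the maximum so the oldest age is covered
  top <- (floor(max(age, na.rm = TRUE) / width) + 1) * width
  br <- seq(0, top, by = width)
  lower <- head(br, -1)
  labs <- paste0(lower, "-", lower + width - 1)
  cut(age, breaks = br, labels = labs, right = FALSE)
}

ages <- c(3, 19, 20, 45, 80)
g <- age_groups(ages, width = 20)
print(table(g))
```

Empty intervals (here "60-79") are kept as factor levels, so tables of different samples line up.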
Same as for cut: a factor is returned, unless labels = FALSE.
Anders Munch
library(Publish)
data(Diabetes) # load dataset
## The default uses a format similar to cut
chol.groups <- acut(Diabetes$chol)
table(chol.groups)
## The formatting can easily be changed
chol.groups <- acut(Diabetes$chol, format="%l-%u", n=5)
table(chol.groups)
## By default the breaks are placed automatically, so their number can easily be changed
chol.groups <- acut(Diabetes$chol, n=7)
table(chol.groups)
## Manually setting format and breaks
age.groups <- acut(Diabetes$age, format="%l-%u", breaks=seq(0,100,by=10))
table(age.groups)
## Other variations
age.groups <- acut(Diabetes$age, format="%l-%u", format.low="below %u", format.high="above %l",
                   breaks=c(0, seq(20,80,by=10), Inf))
table(age.groups)
BMI.groups <- acut(Diabetes$BMI, format="BMI between %l and %u",
                   format.low="BMI below %u", format.high="BMI above %l")
table(BMI.groups)
org(as.data.frame(table(BMI=BMI.groups)))
## Instead of using the quantiles, we can specify equally spaced breaks,
## but still get the same formatting
BMI.grouping <- seq(min(Diabetes$BMI,na.rm=TRUE), max(Diabetes$BMI,na.rm=TRUE), length.out=6)
BMI.grouping[1] <- -Inf # To include all values
BMI.groups <- acut(Diabetes$BMI, breaks=BMI.grouping, format="BMI between %l and %u",
                   format.low="BMI below %u", format.high="BMI above %l")
table(BMI.groups)
org(as.data.frame(table(BMI=BMI.groups)))
## Using type="age"
## When using type="age", categories of 10 years are constructed by default.
## They are formatted to be easier to read when the values are ages.
table(acut(Diabetes$age, type="age"))
## This can be changed with the breaks argument.
## Note that this is different from cut when breaks is a single number.
table(acut(Diabetes$age, type="age", breaks=20))
## Of course, we can also supply the breaks manually.
## The formatting depends on whether or not all the values fall within the breaks:
## All values within the breaks
table(acut(Diabetes$age, type="age", breaks=c(0, 30, 50, 80, 100)))
## Some values below and above the breaks
table(acut(Diabetes$age, type="age", breaks=c(30, 50, 80)))