cut.data.frame: Change numeric variables into factors

cut.data.frame    R Documentation

Change numeric variables into factors

Description

This function changes the numeric columns of a data frame x into factors. The range of each of these columns is divided into intervals, and the values of the column are recoded according to the interval into which they fall.

For that, cut is applied to each column of x.
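
The idea of applying cut column by column can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper name cut_numeric_columns and the toy data frame are hypothetical.

```r
# Hypothetical helper (not part of the package): apply base::cut()
# to every numeric column and leave the other columns untouched.
cut_numeric_columns <- function(df, breaks, ...) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], cut, breaks = breaks, ...)
  df
}

# Toy data: two numeric columns and one character column.
df <- data.frame(a = c(1, 5, 9), b = c(2, 4, 8), g = c("u", "v", "w"))
res <- cut_numeric_columns(df, breaks = c(0, 3, 6, 10))
str(res)
```

Columns a and b become factors whose levels are the intervals defined by the cut points; column g is returned unchanged.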

Usage

## S3 method for class 'data.frame'
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3L,
    ordered_result = FALSE, cutcol = NULL, ...)

Arguments

x

data frame (can also be a tibble).

breaks

list or numeric.

  • If breaks is a list, its length is equal to the number of columns in the data frame. It can be:

    • a list of numeric vectors. The j^{th} element corresponds to the column x[, j], and is a vector of two or more unique cut points.

    • or a list of single numbers (each greater than or equal to 2). The element breaks[[j]] gives the number of intervals into which the j^{th} variable of the data frame is to be cut. The elements breaks[[j]] corresponding to non-numeric columns must be NULL; otherwise, a warning is issued.

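  • If breaks is numeric, it is applied to every numeric column x[, j]: a single number gives the number of intervals into which each column is to be cut, and a vector of two or more unique values gives the cut points (see cut).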
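
The list form of breaks described above can be illustrated with a small base R sketch. The helper cut_by_column and the toy data are hypothetical and only mimic the documented rule (one element per column, NULL for non-numeric columns); they are not taken from the package source.

```r
# Hypothetical helper: breaks_list[[j]] is used for column j;
# a NULL element means "leave this column as it is".
cut_by_column <- function(df, breaks_list) {
  # Mirrors the stated rule: one list element per column.
  stopifnot(length(breaks_list) == ncol(df))
  for (j in seq_along(df)) {
    b <- breaks_list[[j]]
    if (!is.null(b) && is.numeric(df[[j]])) {
      df[[j]] <- cut(df[[j]], breaks = b)
    }
  }
  df
}

df <- data.frame(a = c(1, 5, 9), b = c(2, 4, 8), g = c("u", "v", "w"))

# Column a: explicit cut points; column b: two intervals; column g: untouched.
res <- cut_by_column(df, list(c(0, 3, 6, 10), 2, NULL))
sapply(res, nlevels)
```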

labels

list of character vectors. If given, its length is equal to the number of columns of x. labels[[j]] gives the labels for the intervals of the j^{th} column of the data frame. By default, the labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.

See cut.
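
As a reminder of how labels behaves in base cut, to which each column is passed, here is a minimal sketch on a single numeric vector (the vector and label names are made up for illustration):

```r
v <- c(1, 5, 9)
b <- c(0, 3, 6, 10)

default_lab <- cut(v, breaks = b)                                   # "(a,b]" labels
named_lab   <- cut(v, breaks = b, labels = c("low", "mid", "high")) # custom labels
codes       <- cut(v, breaks = b, labels = FALSE)                   # integer codes, no factor

levels(default_lab)
levels(named_lab)
codes
```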

include.lowest

logical, indicating if, for each column x[, j], an x[i, j] equal to the lowest (or highest, for right = FALSE) 'breaks' value should be included (see cut).

right

logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa (see cut).
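
The interaction between include.lowest and right is easiest to see on a vector whose values coincide with the break points. The following sketch uses base cut only; the data are made up for illustration.

```r
v <- c(0, 3, 10)
b <- c(0, 3, 10)

r1 <- cut(v, breaks = b)                         # (0,3] (3,10]: 0 falls outside, so NA
r2 <- cut(v, breaks = b, include.lowest = TRUE)  # [0,3] (3,10]: 0 is now included
r3 <- cut(v, breaks = b, right = FALSE)          # [0,3) [3,10): 10 falls outside, so NA
r4 <- cut(v, breaks = b, right = FALSE,
          include.lowest = TRUE)                 # [0,3) [3,10]: 10 is now included

data.frame(v, r1, r2, r3, r4)
```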

dig.lab

integer or integer vector, which is used when labels are not given. It determines the number of digits used in formatting the break numbers.

  • If it is a single value, it gives the number of digits for all variables of the data frame (see cut).

  • If it has several elements, its length is equal to the number of variables, and the j^{th} element gives the number of digits for the j^{th} variable of the data frame.
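
The effect of dig.lab on the automatically generated labels can be checked with base cut on a single vector. This is a small sketch with made-up values:

```r
v <- c(1.2345, 2.3456, 3.4567)
b <- c(1, 2.3456, 4)

lab3 <- levels(cut(v, breaks = b))               # default dig.lab = 3
lab5 <- levels(cut(v, breaks = b, dig.lab = 5))  # more digits in the break labels

lab3
lab5
```

With the default, the middle break is rounded to three significant digits in the labels; with dig.lab = 5 it is printed in full. Only the labels change: the values are still assigned using the exact break points.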

ordered_result

logical: should the results be ordered factors? (see cut)

cutcol

numeric vector: indices of the columns to be converted into factors. These columns must all be numeric; otherwise, a warning is issued.

...

further arguments passed to or from other methods.

Value

A data frame with the same column and row names as x.

If cutcol is given, each numeric column x[, j] whose number is contained in cutcol is replaced by a factor. The other columns are unmodified.

If any column x[, j] whose number is in cutcol is not numeric, it is unmodified.

If cutcol is omitted, every numeric column is replaced by a factor.

Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard

Examples

data("roses")
x <- roses[roses$rose %in% c("A", "B"), c("Sha", "Sym", "Den", "rose")]

cut(x, breaks = 3)
cut(x, breaks = 5)
cut(x, breaks = c(0, 4, 6, 10))
cut(x, breaks = list(c(0, 6, 8, 10), c(0, 5, 7, 10), c(0, 6, 7, 10)))
cut(x, breaks = list(c(0, 6, 8, 10), c(0, 5, 7, 10)), cutcol = 1:2)

dad documentation built on Aug. 30, 2023, 5:06 p.m.