discretize: Discretize continuous data.


Description

Converts the continuous (numeric) columns of a dataset to discrete ones.

Usage

    discretize(dataset, input = NULL, ndigs = 2, nlevels = 10, presumedFactor = FALSE)

Arguments

dataset

Dataset to discretize, data frame/table.

input

Optional specification for partitioning. A list of lists, one inner list per column to be converted. Each inner list holds the name of the column, the number of partitions, and a vector of labels, one per partition (elements "name", "partitions" and "labels" in the examples below). Columns not named in input are left unchanged.

ndigs

Number of digits to retain in forming labels/values for the discretized data, if input is not supplied. E.g. if ndigs is 2 and the original datum is 38.12, it becomes 38.
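
The 38.12 example is consistent with rounding to significant digits. The following is a minimal sketch using base R's signif(), assuming that ndigs counts significant digits; this page does not confirm how discretize() rounds internally.

```r
# Assumption: ndigs is treated as a count of significant digits.
x <- c(38.12, 1234.5, 0.04567)
rounded <- signif(x, digits = 2)
print(rounded)
```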

nlevels

Number of partitions to form for each variable, if input is NULL.
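
To give a feel for what forming nlevels partitions means, here is an illustration with base R's cut(), which splits a numeric vector into equal-width intervals. This is only an analogy: the page does not state how discretize() chooses its break points, and the variable names below are made up for the example.

```r
# Illustration only: equal-width binning with base R's cut().
# discretize() may choose its break points differently.
set.seed(1)
age <- runif(100, min = 20, max = 60)   # synthetic data
nlev <- 10                              # plays the role of nlevels
binned <- cut(age, breaks = nlev)       # factor with nlev interval levels
print(table(binned))
```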

presumedFactor

If TRUE, any numeric variable having fewer than nlevels distinct values will be presumed to be an informal factor, and thus will not be discretized.

Details

If input is not specified, each numeric column in the data will be discretized, with one exception: If a column is numeric but has fewer distinct values than nlevels, and if presumedFactor is TRUE, it is presumed to be an informal R factor and will not be converted. However, it is best to use makeFactor on such variables.
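
The selection rule described above can be sketched in base R. The helper wouldDiscretize() is hypothetical and not part of cdparcoord; it only reports which columns the stated rule would pick, and does no discretizing itself.

```r
# Hypothetical helper, not part of cdparcoord: reports which columns the
# rule described above would select for discretization.
wouldDiscretize <- function(dataset, nlevels = 10, presumedFactor = FALSE) {
  vapply(dataset, function(col) {
    if (!is.numeric(col)) return(FALSE)          # only numeric columns qualify
    fewValues <- length(unique(col)) < nlevels   # looks like an informal factor
    !(presumedFactor && fewValues)
  }, logical(1))
}

# Synthetic data: wage has 11 distinct values, sex has 2, job is non-numeric.
df <- data.frame(
  wage = c(52000, 61000.5, 47250, 80100, 39900, 72000,
           55500, 66000, 43000, 91000, 58000),
  sex  = c(1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1),
  job  = c("a", "b", "a", "c", "b", "a", "c", "b", "a", "c", "b")
)
withPresumed <- wouldDiscretize(df, presumedFactor = TRUE)
withoutPresumed <- wouldDiscretize(df, presumedFactor = FALSE)
print(withPresumed)
print(withoutPresumed)
```

With presumedFactor = TRUE the sex column is skipped because it has fewer distinct values than nlevels; with FALSE it is selected along with wage.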

Author(s)

Norm Matloff <matloff@cs.ucdavis.edu>, Vincent Yang <vinyang@ucdavis.edu>, and Harrison Nguyen <hhnguy@ucdavis.edu>

Examples

    data(prgeng)
    pe <- prgeng[,c(1,3,5,7:9)]  # extract vars of interest
    pe25 <- pe[pe$wageinc < 250000,]  # delete extreme values
    pe25disc <- discretize(pe25)  # age, wageinc and wkswrkd discretized

    data(mlb)
    # extract the height, weight, age, and position of players
    m <- mlb[,4:7]

    inp1 <- list("name" = "Height",
                 "partitions"=4,
                 "labels"=c("short", "shortmid", "tallmid", "tall"))

    inp2 <- list("name" = "Weight",
                 "partitions"=3,
                 "labels"=c("light", "med", "heavy"))

    inp3 <- list("name" = "Age",
                 "partitions"=3,
                 "labels"=c("young", "med", "old"))

    # create one list to pass everything to discretize()
    discreteinput <- list(inp1, inp2, inp3)
    head(discreteinput)

    # discretize Height, Weight and Age according to the specifications above
    discretizedmlb <- discretize(m, discreteinput)
    head(discretizedmlb)

cdparcoord documentation built on Aug. 5, 2019, 1:02 a.m.