closure_cut: Interval Closure Classification

Description

Interval categorization that handles each interval's boundary closures independently. It is designed for cases where base::cut does not fully meet your needs, and it acts as a wrapper for cut when breaks are not named (see breaks). Unlike cut, gaps between consecutive intervals are permitted, but values that fall in a gap become NA.
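To make the motivation concrete, the sketch below uses only base R and does not call closure_cut. It shows that cut applies one closure rule to every interval, and then writes out by hand the mixed closures [0, 200), [200, 240], (240, Inf) that a single cut call cannot express. The values and interval bounds are illustrative assumptions.

```r
# Base R only: cut() applies a single closure rule to every interval.
x <- c(0, 199, 200, 240, 240.1)

# right = TRUE (the default): intervals are (a, b]; include.lowest closes the first.
right_closed <- cut(x, breaks = c(0, 200, 240, Inf), include.lowest = TRUE)

# right = FALSE: every interval is [a, b).
left_closed <- cut(x, breaks = c(0, 200, 240, Inf), right = FALSE)

# Mixed closures written by hand: [0, 200), [200, 240], (240, Inf).
mixed_levels <- c("[0,200)", "[200,240]", "(240,Inf)")
mixed <- ifelse(x >= 0 & x < 200, mixed_levels[1],
         ifelse(x >= 200 & x <= 240, mixed_levels[2],
         ifelse(x > 240, mixed_levels[3], NA_character_)))
mixed <- factor(mixed, levels = mixed_levels)

# Compare where the boundary points 200 and 240 land under each rule.
print(data.frame(x, right_closed, left_closed, mixed))
```

Only the hand-written version places both 200 and 240 in the middle category, which is the kind of per-interval control the description refers to.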

Usage

closure_cut(x, breaks, label_vec = NULL, dig_lab = 3L,
  ordered_result = FALSE, env = parent.frame())

Arguments

x

A numeric or integer vector to be categorized. Factors are coerced to integers.

breaks

Numeric break points, optionally named to set closure: 'i' marks an included boundary and 'e' an excluded one. If an interval has breaks c(i, i), the next interval starts as (e, ?), since any other choice would create a hole at that point in the range. If breaks are not named, the function falls back to base::cut. Breaks that leave gaps result in NA values. If you name breaks with 'i' and 'e', do so consistently, or you will be redirected to base::cut with a warning.
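The closure rules can be illustrated with a small stand-alone sketch. The helper classify_intervals() below is hypothetical and is not the package's implementation; it assumes each interval is given explicit lower and upper bounds plus inclusive/exclusive flags (TRUE for 'i', FALSE for 'e'), and it returns NA for any value that no interval covers.

```r
# Hypothetical helper, not part of the package: each interval carries its own
# bounds and closure flags (TRUE = inclusive 'i', FALSE = exclusive 'e').
classify_intervals <- function(x, lower, upper, lower_incl, upper_incl) {
  out <- rep(NA_integer_, length(x))
  for (k in seq_along(lower)) {
    above <- if (lower_incl[k]) x >= lower[k] else x > lower[k]
    below <- if (upper_incl[k]) x <= upper[k] else x < upper[k]
    # which() drops NA comparisons; only still-unassigned values are filled.
    hit <- which(above & below & is.na(out))
    out[hit] <- k
  }
  out
}

x <- c(-1, 0, 199, 200, 240, 240.1)

# Intervals [0, 200), [200, 240], (240, Inf): no holes inside the range.
idx <- classify_intervals(x,
  lower = c(0, 200, 240), upper = c(200, 240, Inf),
  lower_incl = c(TRUE, TRUE, FALSE), upper_incl = c(FALSE, TRUE, FALSE))
print(idx)

# Intervals [0, 200), (200, 240], (240, Inf): the point 200 falls in a gap.
idx_gap <- classify_intervals(x,
  lower = c(0, 200, 240), upper = c(200, 240, Inf),
  lower_incl = c(TRUE, FALSE, FALSE), upper_incl = c(FALSE, TRUE, FALSE))
print(idx_gap)
```

In the second call the value 200 is excluded by both neighbouring intervals, so it is left as NA, which mirrors the behaviour described for breaks that leave gaps.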

label_vec

The labels for the intervals defined by breaks. The order and length of the labels must be consistent with breaks. The default, NULL, causes labels to be based on the break intervals.

dig_lab

The desired number of digits after the decimal point (format = "f") or significant digits (format = "g", "e", or "fg"). Default: 2 for integers, 4 for real numbers (the signature under Usage sets 3L). If less than 0, the C default of 6 digits is used. If more than 50 is specified, 50 is used with a warning, unless format = "f", where the limit is typically 324. (No more than 15 to 21 digits need be accurate, depending on the OS and compiler used. The limit is only a precaution against segfaults in the underlying C runtime.)
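The wording above matches the digits argument of base R's formatC. Assuming dig_lab is passed through to formatC when labels are built (an assumption, not confirmed by this page), the difference between the two interpretations looks like this:

```r
# Base R only: how a digits value is interpreted under different formats.
x <- 3.14159

# format = "f": digits counts places after the decimal point.
fixed <- formatC(x, digits = 3, format = "f")

# format = "g": digits counts significant digits.
general <- formatC(x, digits = 3, format = "g")

print(c(fixed = fixed, general = general))
```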

ordered_result

A single logical value: should the result be an ordered factor? Default FALSE.

env

The environment used for base::cut, if that fallback is triggered. Default parent.frame().

Value

A factor; x is classified based on user inputs.

Note

Very large and very small numbers (less than 1e-12 or greater than 1e16) may not work; use them at your own risk, or transform your data with a shift parameter into a safe input range.
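One way to read the shift suggestion is sketched below. It subtracts the same constant from both the data and the breaks before classifying, so the comparisons happen in a comfortable numeric range. The shift value, data, and breaks are illustrative assumptions, and base::cut stands in for closure_cut so the sketch runs without the package.

```r
# Illustrative values near the upper limit mentioned in the note.
shift <- 1e16
x <- shift + c(0, 100, 250)
breaks <- shift + c(0, 200, 400)

# Apply the same shift to the data and to the breaks.
x_shifted <- x - shift
breaks_shifted <- breaks - shift

# Classify in the shifted range; intervals here are [0, 200) and [200, 400).
res <- cut(x_shifted, breaks = breaks_shifted, right = FALSE)
print(data.frame(x_shifted, res))
```

Labels produced this way describe the shifted range, so custom labels may be preferable when the original scale matters.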

Examples

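# data.table provides data.table(), between() and := used below
library(data.table)
# Cs() is assumed to be Hmisc::Cs(), where Cs(a, b) is shorthand for c("a", "b")
library(Hmisc)

# Example boundary vector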
boundary_test_vec <- c(-1, 0, 199, 200, 200.1, 239, 240, 240.1, 255, 500)

# Example 1 - ordered vector with custom labels
closure_cut(x = boundary_test_vec,
 breaks = c(i=0, ei = 200, ie = 240, e = Inf),
 ordered_result = TRUE, label_vec = Cs(best, borderline, poor))

# Example 2 - unordered vector with default labels
closure_cut(x = boundary_test_vec,
 breaks = c(i=0, ei = 200, ie = 240, e = Inf),
 ordered_result = FALSE, label_vec = NULL)

# Example boundary data.table (data.frame should be similar)
d <- data.table(
  chol = sample(150:400, size = 1e3, replace = TRUE))
breaks  <-  c(i = 0, e = 200, i = 240, e = Inf)

# Example 3 - data.table
d[, cat := closure_cut(chol, breaks)]
d[between(chol, 0, 200-1e-3), unique(cat)]
d[between(chol, 200, 240), unique(cat)]
d[between(chol, 240+1e-3, 9e10), unique(cat)]

# Example 4 - single point with ordered, custom labels
closure_cut(x = 200, breaks = c(i = 0, ei = 200, ie = 240, e = Inf),
  ordered_result = TRUE, label_vec = Cs(best, borderline, poor))

# Example 5 - single point, unordered, default labels
closure_cut(x = 200, breaks = c(i = 0, ei = 200, ie = 240, e = Inf),
  ordered_result = FALSE, label_vec = NULL)

# Example 6 - single point with ordered, default labels
closure_cut(x = 200, breaks = c(i = 0, ei = 200, ie = 240, e = Inf),
  ordered_result = TRUE, label_vec = NULL)

# Example 7 - data.table, the recommended way to use closure_cut
d <- data.table(
  chol = sample(150:400, size = 1e3, replace = TRUE))
# breaks  <-  c(i = 0, ie = 200, ie = 240, e = Inf)
breaks  <-  c(i = 0, e = 200, i = 240, e = Inf)
d[, cat := closure_cut(chol, breaks)]
print(d)

# gaps will generate NA values, consistent with cut
closure_cut(1, breaks = c(10, 20)) # base::cut applied
closure_cut(x=1, breaks = c(i = 10, i = 20))

# Not recommended, but possible...
d[, chol, by = .(closure_cut(chol, breaks))]

## Not run: 
# BAD - these calls will error
# error: too many labels (three labels for two intervals)
closure_cut(1, breaks = c(i = 0, i = 0, i = 1), label_vec = c("zero", "one", "two"))
# two errors: break 4 is misnamed, break 5 is not named
closure_cut(x = 1, breaks = c(i = 1, e = 2, i = 3, eerie = 4, 5))

# create an intentional gap
# You can force a gap, but this is neither supported nor advised. Choosing the
# boundary points depends on the smallest step between points in your dataset.
# This works but is potentially dangerous - add your own safeguards!
closure_cut(10, breaks = c(e=1, e=10-1e-10, i=10+1e-10, e=20))

# if you really want a gap, this approach is safer:
test_gap <- closure_cut(c(1, 10, 15, 20),
 breaks = c(i=1, e=10-1e-10, i=10+1e-10, e=20),
 label_vec = c("a", "gap", "b"))
print(test_gap)
test_gap[test_gap == "gap"] <- NA
print(test_gap)

## End(Not run)

JamesDalrymple/wccmh documentation built on May 7, 2019, 10:20 a.m.