compute: Modify data.frame/modify subset of the data.frame


Description

The full-featured %to% operator is available inside expressions for addressing a range of variables. The special constant .N equals the number of cases in data and can be used in expressions inside compute/calculate. Inside do_if, .N gives the number of rows that will be affected by the expressions. For parametrization (variable substitution), see .. or the examples.

Sometimes it is useful to create a new empty variable inside compute. Use the .new_var function for this: it creates a variable of length .N filled with NA. See the examples.

modify is an alias for compute, modify_if is an alias for do_if, and calc is an alias for calculate.

Usage

compute(data, ...)

modify(data, ...)

do_if(data, cond, ...)

modify_if(data, cond, ...)

calculate(data, expr, use_labels = FALSE)

use_labels(data, expr)

calc(data, expr, use_labels = FALSE)

data %calc% expr

data %use_labels% expr

data %calculate% expr

Arguments

data

A data.frame or a list of data.frames. If data is a list of data.frames, the expression expr is evaluated inside each data.frame separately.

...

Expressions to be evaluated in the context of the data.frame data. These can be arbitrary code in curly brackets or assignments. See the examples.

cond

A logical vector or an expression. An expression is evaluated in the context of data.

expr

An expression to be evaluated in the context of the data.frame data.

use_labels

Logical. Experimental feature. If TRUE, variable names are replaced with labels where possible, so that many base R functions that display variable names show labels instead.

Value

compute and do_if return the modified data.frame or a list of modified data.frames. calculate returns the value of the evaluated expression or a list of values.
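
The difference between the two kinds of return value can be sketched as follows. This is a minimal illustration, assuming an installed expss version that provides compute and calculate as documented on this page; the data frame df and the names total, res and s are hypothetical and chosen only for this sketch.

```r
library(expss)

# Hypothetical toy data, used only for illustration
df = data.frame(x = 1:3, y = 4:6)

# compute returns the data.frame with the new column attached
res = compute(df, {
    total = x + y
})

# calculate returns only the value of the evaluated expression
s = calculate(df, sum(x + y))
```

Here res is a data.frame that contains the extra column total, while s is a single number; df itself is left unchanged unless the result is assigned back to it.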

Examples

library(expss)

dfs = data.frame(
    test = 1:5,
    a = rep(10, 5),
    b_1 = rep(11, 5),
    b_2 = rep(12, 5),
    b_3 = rep(13, 5),
    b_4 = rep(14, 5),
    b_5 = rep(15, 5) 
)


# compute sum of b* variables and attach it to 'dfs'
compute(dfs, {
    b_total = sum_row(b_1 %to% b_5)
    var_lab(b_total) = "Sum of b"
    random_numbers = runif(.N) # .N usage
})

# calculate sum of b* variables and return it
calculate(dfs, sum_row(b_1 %to% b_5))


# set values to existing/new variables
compute(dfs, {
    (b_1 %to% b_5) %into% text_expand('new_b{1:5}')
})

# .new_var usage
compute(dfs, {
    new_var = .new_var()
    new_var[1] = 1 # not possible unless the variable is created first
})

# conditional modification
do_if(dfs, test %in% 2:4, {
    a = a + 1    
    b_total = sum_row(b_1 %to% b_5)
    random_numbers = runif(.N) # .N usage
})


# variable substitution
name1 = "a"
name2 = "new_var"

compute(dfs, {
     ..$name2 = ..$name1*2    
})

compute(dfs, {
     for(name1 in paste0("b_", 1:5)){
         name2 = paste0("new_", name1) 
         ..$name2 = ..$name1*2 
     }
     rm(name1, name2) # we don't need these variables as columns in 'dfs'
})

# square brackets notation
compute(dfs, {
     ..[(name2)] = ..[(name1)]*2  
})

compute(dfs, {
     for(name1 in paste0("b_", 1:5)){
         ..[paste0("new_", name1)] = ..$name1*2 
     }
     rm(name1) # we don't need this variable as a column in 'dfs'
})

# '..$' doesn't work for the case below, so we need to use the square-bracket form
name1 = paste0("b_", 1:5)
name2 = paste0("new_", name1)
compute(dfs, {
     for(i in 1:5){
         ..[name2[i]] = ..[name1[i]]*3
     }
     rm(i) # we don't need this variable as a column in 'dfs'
})

# 'use_labels' examples. Utilization of labels in base R.
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (lb/1000)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

use_labels(mtcars, table(am, vs))

## Not run: 
use_labels(mtcars, plot(mpg, hp))

## End(Not run)

mtcars %>% 
       use_labels(lm(mpg ~ disp + hp + wt)) %>% 
       summary()
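
The infix forms listed under Usage are not exercised in the examples above. The sketch below compares calculate with %calc% on a small data frame. It assumes an expss version that still exports both forms; the data frame small and the names total_fun and total_infix are hypothetical.

```r
library(expss)

# Hypothetical toy data, used only for illustration
small = data.frame(
    b_1 = rep(11, 5),
    b_2 = rep(12, 5)
)

# function form: the expression is evaluated in the context of 'small'
total_fun = calculate(small, sum_row(b_1, b_2))

# infix form: the left-hand side is the data, the right-hand side the expression
total_infix = small %calc% sum_row(b_1, b_2)
```

Both calls evaluate the same expression against the same data, so the two results should agree. Because %calc% is an ordinary R infix operator, a right-hand side that itself uses infix operators is safest when wrapped in a function call or parentheses.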

expss documentation built on Jan. 8, 2021, 5:38 p.m.