labels.data.frame: Extract labels from and set labels for data frames

Description

Labels can be stored in the attribute "variable.labels" using the assignment function. The extractor function gives access to these labels. Usually, such labels are generated by read.spss in package foreign.

Usage

## S3 method for class 'data.frame'
labels(object, which = NULL, abbreviate = FALSE, ...)

## assign labels
labels(data, which = NULL) <- value

## check if data.frame is a special labeled.data.frame
is.labeled.data.frame(object)
## set as.labeled.data.frame
as.labeled.data.frame(object, ...)

## special functions for labeled.data.frame objects that keep the labels
## S3 method for class 'labeled.data.frame'
x[..., drop = TRUE]
## S3 method for class 'labeled.data.frame'
subset(x, ...)
## S3 method for class 'labeled.data.frame'
cbind(..., deparse.level = 1)
## S3 method for class 'labeled.data.frame'
rbind(..., deparse.level = 1)

## special plotting function for labeled.data.frame objects
## S3 method for class 'labeled.data.frame'
plot(x, variables = names(x),
     labels = TRUE, by = NULL, with = NULL,
     regression.line = TRUE, line.col = "red", ...)

Arguments

object

a data.frame or labeled.data.frame. The former is usually a result from read.spss in package foreign, the latter results from adding labels in R or from a call to as.labeled.data.frame.

data

a data.frame or a labeled.data.frame, where labels should be added or altered.

which

either a number indicating the label to extract or a character string with the variable name for which the label should be extracted. One can also use a numeric vector or a vector of character strings to extract multiple labels. If which is NULL (default), all labels are returned.

value

a vector containing the labels (in the order of the variables). If which is given, only the corresponding subset is labeled. Note that all other labels contain the variable name as label afterwards.

abbreviate

logical (default: FALSE). If TRUE variable labels are abbreviated such that they remain unique. See abbreviate for details. Further arguments to abbreviate can be specified (see below).

...

further options passed to function abbreviate if argument abbreviate = TRUE.

In x[...], ... can be used to specify indices for extraction. See [ for details.

In plot, ... can be used to specify further graphical parameters.

x

a labeled.data.frame.

drop

logical (default: TRUE). If TRUE the result is coerced to the lowest possible dimension (i.e. a vector in case of a single column) and labels might be dropped in this case.

deparse.level

see cbind.

variables

character vector or numeric vector defining the variables that should be included in the plot. By default, all numeric and factor variables of x are used.

labels

labels for the variables. If labels = TRUE (the default), labels(data, which = variables) is used as labels. If labels = NULL variables is used as label. labels can also be specified as character vector.

by

a character or numeric value specifying a variable in the data set. This variable can be either a grouping factor or is used as numeric y-variable (see with for details). Per default no grouping is applied. See also ‘Details’ and ‘Examples’.

with

a character or numeric value specifying a numeric variable with which to “correlate” all variables specified in variables. For numeric variables a scatterplot is plotted, for factor variables one gets a grouped boxplot. Per default no variable is given here. Instead of with one can also specify a numeric variable in by with the same results. See also ‘Details’ and ‘Examples’.

regression.line

a logical argument specifying if a regression line should be added to scatter plots (which are plotted if both variables and by are numeric values).

line.col

the color of the regression line.

Details

One can set or extract labels from data.frame objects. If no labels are specified labels(data) returns the column names of the data frame. If labels are set (attached to a data.frame) the data.frame gets a special class labeled.data.frame with specific subset and combination functions.
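The fallback described above can be pictured with base R alone. The helper get_labels() below is hypothetical and not part of papeR; it only assumes that labels live in the "variable.labels" attribute and that the column names stand in when that attribute is absent.

```r
## Hypothetical helper (not part of papeR): return the stored labels,
## or fall back to the column names if no labels are attached.
get_labels <- function(df) {
    lab <- attr(df, "variable.labels")
    if (is.null(lab)) {
        lab <- names(df)
    }
    ## name the labels by variable so they can be looked up by column
    names(lab) <- names(df)
    lab
}

df <- data.frame(a = 1:3, b = 3:1)
get_labels(df)  ## no labels stored: the column names are returned

attr(df, "variable.labels") <- c("first variable", "second variable")
get_labels(df)  ## now the stored labels are returned
```

This sketch does not set the class labeled.data.frame; in papeR that is done by the assignment function or by as.labeled.data.frame.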

Using abbreviate = TRUE, all labels are abbreviated to (at least) 4 characters such that they are unique. Other minimal lengths can be specified by setting minlength (see examples below).
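Since further arguments are passed on to the base R function abbreviate, its behaviour can be tried without papeR. A small sketch with made-up label strings:

```r
long_labels <- c("This is a long label",
                 "This is another long label",
                 "This also")

## default: abbreviate to a minimum length of 4 characters
abbreviate(long_labels)

## request longer abbreviations via minlength
short <- abbreviate(long_labels, minlength = 10)
short

## distinct input strings yield distinct abbreviations
anyDuplicated(unname(short)) == 0
```

By default, abbreviate returns a vector whose names are the original strings, which makes it easy to see which abbreviation belongs to which label.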

Univariate plots can be easily obtained for all numeric and factor variables in a data set data by using plot(data).

Bivariate plots can be obtained by specifying by. If by is a factor, grouped boxplots or spineplots are drawn, depending on the class of each variable specified in variables. If by is numeric, grouped boxplots or scatter plots are drawn, again depending on the class of each variable specified in variables. Note that by and with cannot be specified at the same time (internally they are identical). Missing values are excluded plot-wise (also for bivariate plots).
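One way to read the rules for bivariate plots is as a lookup on the classes of the two variables involved. The helper plot_type() below is hypothetical and not part of papeR. The exact pairing of classes and plot types is an assumption drawn from the descriptions of by and with, not taken from the package source.

```r
## Hypothetical helper (not part of papeR): name the plot type that the
## text describes for one variable and one 'by' variable.
## Assumed pairing: numeric/factor -> grouped boxplot,
## factor/factor -> spineplot, numeric/numeric -> scatter plot.
plot_type <- function(variable, by) {
    if (is.factor(by)) {
        if (is.numeric(variable)) "grouped boxplot" else "spineplot"
    } else {
        if (is.numeric(variable)) "scatter plot" else "grouped boxplot"
    }
}

num <- c(1.2, 3.4, 2.2, 5.1)
fac <- factor(c("u", "v", "u", "v"))

plot_type(num, by = fac)  ## numeric variable, factor 'by'
plot_type(fac, by = fac)  ## factor variable, factor 'by'
plot_type(num, by = num)  ## both numeric
plot_type(fac, by = num)  ## factor variable, numeric 'by'
```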

Value

labels(data) returns a named vector of variable labels, where the names match the variable names and the values represent the labels.

Note

If you import data using read.spss, labels are set but the data.frame is not coerced to a labeled.data.frame. Use as.labeled.data.frame in this case or do any manipulation of labels using the assignment function (see examples below).
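The situation after an import can be mimicked in base R without an SPSS file. The data set and its labels below are made up for illustration; the papeR calls are left as comments so that the sketch runs without the package.

```r
## Mimic an imported data set: labels are attached as an attribute,
## but the object is still a plain data.frame.
imported <- data.frame(age = c(23, 35, 41),
                       sex = factor(c("f", "m", "f")))
attr(imported, "variable.labels") <- c(age = "Age in years", sex = "Sex")

class(imported)                           ## "data.frame"
inherits(imported, "labeled.data.frame")  ## FALSE

## With papeR attached, either of these would set the class:
## imported <- as.labeled.data.frame(imported)
## labels(imported, which = "age") <- "Age (years)"
```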

Author(s)

Benjamin Hofner

See Also

read.spss in package foreign

Examples

############################################################
### Basic labels manipulations

data <- data.frame(a = 1:10, b = 10:1, c = rep(1:2, 5))
labels(data)  ## only the variable names
is.labeled.data.frame(data) ## not yet

## now set labels
labels(data) <- c("my_a", "my_b", "my_c")
## one gets a named character vector of labels
labels(data)
## data is now a labeled.data.frame:
is.labeled.data.frame(data)

## Alternatively one could use as.labeled.data.frame(data);
## This would keep the default labels but set the class
## correctly.

## set labels for a and b only
## Note that which represents the variable names!
labels(data, which = c("a", "b")) <- c("x", "y")
labels(data)

## reset labels (to variable names):
labels(data) <- NULL
labels(data)

## set label for a only and use default for other labels:
labels(data, which = "a") <- "x"
labels(data)

## attach label for new variable:
data2 <- data
data2$z <- as.factor(rep(2:3, each = 5))
labels(data2)  ## no real label for z, only variable name
labels(data2, which = "z") <- "new_label"
labels(data2)


############################################################
### Abbreviate labels

## attach long labels to data
labels(data) <- c("This is a long label", "This is another long label",
                  "This also")
labels(data)
labels(data, abbreviate = TRUE, minlength = 10)


############################################################
### Data manipulations

## reorder dataset:
tmp <- data2[, c(1, 4, 3, 2)]
labels(tmp)
## labels are kept and order is updated
## (but only if the data set has class "labeled.data.frame")

## subsetting to single variables:
labels(tmp[, 2])  ## label got lost as tmp drops to vector
labels(tmp[, 2, drop = FALSE]) ## prevent dropping labels

## one can also cbind labeled.data.frame objects:
labels(cbind(data, tmp[, 2, drop = TRUE]))
## or better:
labels(cbind(data, tmp[, 2, drop = FALSE]))
## or rbind labeled.data.frame objects:
labels(rbind(data, tmp[, -2]))


############################################################
### Plotting labeled.data.frame objects

## plot the data auto"magically"; numerics as boxplot, factors as barplots
par(mfrow = c(2,2))
plot(data2)

## a single plot
plot(data2, variables = "a")
## grouped plot
plot(data2, variables = "a", by = "z")
## make "c" a factor and plot "c" vs. "z"
data2$c <- as.factor(data2$c)
plot(data2, variables = "c", by = "z")
## the same
plot(data2, variables = 3, by = 4)

## plot everything against "b"
## (grouped boxplots, stacked barplots or scatterplots)
plot(data2, with = "b")

papeR documentation built on May 2, 2019, 4:55 p.m.