extract: Extract aggregated values and/or metadata

Description

Extract selected aggregated and/or discretised values into a common matrix or data frame. The data-frame method of extract conducts normalisation and/or computes normalised point estimates and their confidence intervals for user-defined experimental groups; it is mainly a helper function for ci_plot. extract_columns extracts only selected metadata entries, for use as additional columns in a data frame or (after joining) as a character vector of labels.

Usage

  ## S4 method for signature 'MOPMX'
extract(object, as.labels,
    subset = opm_opt("curve.param"), ci = FALSE, trim = "full",
    dataframe = FALSE, as.groups = NULL, sep = " ", ...) 
  ## S4 method for signature 'OPMS'
extract(object, as.labels,
    subset = opm_opt("curve.param"), ci = FALSE, trim = "full",
    dataframe = FALSE, as.groups = NULL, sep = " ", dups = "warn",
    exact = TRUE, strict = TRUE, full = TRUE, max = 10000L, ...) 
  ## S4 method for signature 'data.frame'
extract(object, as.groups = TRUE,
    norm.per = c("row", "column", "none"), norm.by = TRUE, subtract = TRUE,
    direct = inherits(norm.by, "AsIs"), dups = c("warn", "error", "ignore"),
    split.at = param_names("split.at")) 

  ## S4 method for signature 'WMD'
extract_columns(object, what, join = FALSE,
    sep = " ", dups = c("warn", "error", "ignore"), factors = TRUE,
    exact = TRUE, strict = TRUE) 
  ## S4 method for signature 'WMDS'
extract_columns(object, what, join = FALSE,
    sep = " ", dups = c("warn", "error", "ignore"), factors = TRUE,
    exact = TRUE, strict = TRUE) 
  ## S4 method for signature 'data.frame'
extract_columns(object, what,
    as.labels = NULL, as.groups = NULL, sep = opm_opt("comb.value.join"),
    factors = is.list(what), direct = is.list(what) || inherits(what, "AsIs")) 

Arguments

object

OPMS object, MOPMX object or data frame. For extract, a data frame must contain one column named as indicated by split.at (default given by param_names("split.at")), columns with factor variables before that column, and columns with numeric vectors after that column. For extract_columns, optionally an OPM object.

as.labels

List, character vector or formula indicating the metadata to be joined and used as row names (if dataframe is FALSE) or additional columns (if otherwise). Ignored if NULL.

If as.labels is a formula and dataframe is TRUE, the pseudo-function J within the formula can be used to trigger combination of factors immediately after selecting them as data-frame columns, much like as.groups.

subset

Character vector. The parameter(s) to put in the matrix. One of the values of param_names(). Alternatively, if it is param_names("disc.name"), discretised data are returned, and ci is ignored. Can also be identical to param_names("hours"), which yields the overall running time (see hours), also ignoring ci.

ci

Logical scalar. Also return the confidence intervals?

trim

Character scalar. See aggregated for details.

dataframe

Logical scalar. Return data frame or matrix? In the case of the MOPMX method this can also be NA and then behaves like TRUE but ensures that all rows are kept.

as.groups

For the OPMS method, a list, character vector or formula indicating the metadata to be joined and either used as ‘row.groups’ attribute of the output matrix or as additional columns of the output data frame. See heat_map for its usage. Ignored if empty.

If as.groups is a formula and dataframe is TRUE, the pseudo-function J within the formula can be used to trigger combination of factors immediately after selecting them as data-frame columns, much like as.labels.

If as.groups is a logical scalar, TRUE yields a trivial group that contains all elements, FALSE yields one group per element, and NA yields an error. The column name in which this factor is placed if dataframe is TRUE is determined using opm_opt("group.name").

For the data-frame method, a logical, character or numeric vector indicating according to which columns (before the split.at column) the data should be aggregated by calculating means and confidence intervals. If FALSE, such an aggregation does not take place. If TRUE, all those columns are used for grouping.
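The rule given above for a logical-scalar as.groups (OPMS method) can be pictured with a small base-R sketch. The helper trivial_groups and the factor levels it produces are hypothetical and only illustrate the described behaviour; they are not taken from opm.

```r
# Hypothetical helper, not part of opm: build a grouping factor for 'n'
# plates from a logical scalar, following the rule described above.
trivial_groups <- function(n, as.groups) {
  if (is.na(as.groups))
    stop("'as.groups' must not be NA")
  if (as.groups)
    factor(rep(1L, n))   # TRUE: one trivial group containing all elements
  else
    factor(seq_len(n))   # FALSE: one group per element
}

all_in_one <- trivial_groups(4, TRUE)
one_each <- trivial_groups(4, FALSE)
na_fails <- inherits(try(trivial_groups(4, NA), silent = TRUE), "try-error")
```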

sep

Character scalar. Used as separator between the distinct metadata entries if these are to be pasted together. extract_columns ignores this unless join is TRUE. The data-frame method always joins the data unless what is a list.

dups

Character scalar specifying what to do in the case of duplicate labels: either ‘warn’, ‘error’ or ‘ignore’. Ignored unless join is TRUE, and also ignored if object is an OPM object. For the data-frame method of extract, a character scalar defining the action to take if as.groups contains duplicates.
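The three dups actions can be sketched in base R as follows. The helper check_dups is hypothetical and only mimics the described choices; the messages issued by opm itself will differ.

```r
# Hypothetical helper, not opm's implementation: react to duplicated labels
# according to one of the three actions described above.
check_dups <- function(labels, dups = c("warn", "error", "ignore")) {
  dups <- match.arg(dups)
  if (dups != "ignore" && anyDuplicated(labels) > 0L) {
    # anyDuplicated() returns the index of the first duplicated element
    msg <- paste("duplicated label:", labels[anyDuplicated(labels)])
    if (dups == "error") stop(msg) else warning(msg)
  }
  labels
}

labels <- c("E. coli", "E. coli", "P. aeruginosa")  # made-up labels
ignored <- check_dups(labels, "ignore")
failed <- inherits(try(check_dups(labels, "error"), silent = TRUE),
  "try-error")
warned <- tryCatch({
  check_dups(labels, "warn")
  FALSE
}, warning = function(w) TRUE)
```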

exact

Logical scalar. Passed to metadata.

strict

Logical scalar. Also passed to metadata.

full

Logical scalar indicating whether full substrate names shall be used. This is passed to wells, but in contrast to what flatten is doing the argument here refers to the generation of the column names.

max

Numeric scalar. Passed to wells.

...

Optional other arguments passed to wells.

norm.per

Character scalar indicating the presence and direction of a normalisation step.

none

No normalisation.

row

Normalisation per row. By default, this would subtract the mean of each plate from each of its values (over all wells of that plate).

column

Normalisation per column. By default, this would subtract the mean of each well from each of its values (over all plates in which this well is present).

This step can further be modified by the next three arguments.
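The difference between the ‘row’ and ‘column’ settings can be illustrated with plain base R, assuming a toy matrix in which rows stand for plates and columns for wells. The values are made up and the sketch does not call opm.

```r
# Toy matrix: rows stand for plates, columns for wells (values are made up).
m <- matrix(c(10, 20, 30,
              12, 24, 36), nrow = 2, byrow = TRUE,
  dimnames = list(c("plate1", "plate2"), c("A01", "A02", "A03")))

# "row": subtract each plate's mean (over all of its wells) from its values.
by_row <- sweep(m, 1, rowMeans(m), "-")

# "column": subtract each well's mean (over all plates) from its values.
by_col <- sweep(m, 2, colMeans(m), "-")

# After centring, the respective means are zero.
stopifnot(isTRUE(all.equal(unname(rowMeans(by_row)), c(0, 0))))
stopifnot(isTRUE(all.equal(unname(colMeans(by_col)), c(0, 0, 0))))
```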

norm.by

Vector indicating which wells (columns) or plates (rows) are used to calculate means used for the normalisation. By default, the mean is calculated over all rows or columns if normalisation is requested using norm.per. But if direct is TRUE, norm.by is directly interpreted as numeric vector used for normalisation.

direct

Logical scalar. For extract, indicating how to use norm.by. See there for details. For extract_columns, indicating whether to extract column names directly, or search for columns of one to several given classes.

subtract

Logical scalar indicating whether normalisation (if any) is done by subtracting or dividing.
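How norm.by and subtract interact for row-wise normalisation might be sketched as follows. The helper normalise_rows is hypothetical (not part of opm) and the values are made up; it only restates the description above in base R.

```r
# Toy matrix: rows stand for plates, columns for wells (values are made up).
m <- matrix(c(10, 20, 40,
               5, 10, 30), nrow = 2, byrow = TRUE,
  dimnames = list(c("plate1", "plate2"), c("A01", "A02", "A03")))

# Hypothetical helper, not part of opm: normalise each row by the mean of
# the columns selected in 'by', either subtracting or dividing.
normalise_rows <- function(x, by = TRUE, subtract = TRUE) {
  ref <- rowMeans(x[, by, drop = FALSE])  # by = TRUE selects all columns
  sweep(x, 1, ref, if (subtract) "-" else "/")
}

sub_a01 <- normalise_rows(m, by = "A01", subtract = TRUE)   # A01 becomes 0
div_a01 <- normalise_rows(m, by = "A01", subtract = FALSE)  # A01 becomes 1
```

With by = TRUE the helper falls back to the mean over all wells of each plate, which corresponds to the default described for norm.per = "row".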

split.at

Character vector defining alternative names of the column at which the data frame shall be divided. Exactly one must match.

what

For the OPMS method, a list of metadata keys to consider, or single such key; passed to metadata. A formula is also possible; see there for details. A peculiarity of extract_columns is that including J as a pseudo-function call in the formula triggers the combination of metadata entries to new factors immediately after selecting them, as long as join is FALSE.

For the data-frame method, just the names of the columns to extract, or their indexes, as vector, if direct is TRUE. Alternatively, the name of the class to extract from the data frame to form the matrix values.

In the ‘direct’ mode, what can also be a named list of vectors used for indexing. In that case a data frame is returned that contains the columns from object together with new columns that result from pasting the selected columns together. If what is named, its names are used as the new column names. Otherwise each name is created by joining the respective value within what with the "comb.key.join" entry of opm_opt as separator.

join

Logical scalar. Join each row together to yield a character vector? Otherwise an attempt is made to construct a data frame.
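The effect of join and sep can be pictured with base R alone. The metadata below are made up, and the sketch only mirrors the described outcome rather than the internals of extract_columns.

```r
# Made-up metadata for two plates.
meta <- data.frame(Species = c("E. coli", "P. aeruginosa"),
  Strain = c("X1", "X2"), stringsAsFactors = FALSE)

# join = FALSE: the selected entries stay separate data-frame columns.
as_columns <- meta[, c("Species", "Strain"), drop = FALSE]

# join = TRUE: each row is pasted into one string, with 'sep' in between.
as_labels <- do.call(paste, c(as_columns, sep = " "))
```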

factors

Logical scalar determining whether strings should be converted to factors. Note that this would only affect newly created data-frame columns.

Details

extract_columns is not normally called directly by an opm user because extract, which uses this function, is available. It can, however, be used for testing the applied metadata selections beforehand.

The extract_columns data-frame method is partially trivial (extract the selected columns and join them to form a character vector or new data-frame columns), partially more useful (extract columns with data of a specified class).

Not all MOPMX objects are suitable for extract. The call will be successful if only OPMS objects are contained, i.e. OPM objects are forbidden. But even if successful it might result in NA values within the resulting matrix or data frame. This may cause methods that call extract to fail. NA values will not occur if the set of row names created using as.labels is equal between the distinct elements of object. This also holds if dataframe is TRUE, even though in that case row names are only temporarily created.

Duplicate combinations of row and column names currently cause the MOPMX methods to skip all of them except the last one if dataframe is FALSE. This should mainly affect substrates that occur in plates of distinct plate types.

Similarly, duplicate row names will cause the skipping of all but the last one. This can be circumvented by using an as.labels argument that yields unique row names. If as.labels is empty, the MOPMX method of extract will create potentially unique row names from the names if these are present but from the plate types if the ‘names’ attribute is NULL. This will not be done, and rows will neither be skipped nor reordered, if dataframe is TRUE.

Otherwise row names and names of substrate columns will be reordered (sorted). The created ‘row.groups’ attribute, if any, will be adapted accordingly. If dataframe is TRUE, the placement of the columns created by as.groups will also be as usual, but duplicates, if any, will be removed.
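Why duplicate row names leave only the last entry can be pictured with a name-indexed matrix in base R. This is merely an analogy under the assumption that values are placed by row and column name; it is not the code used by the MOPMX method, and all names and values are made up.

```r
# Made-up row labels in which "s1" occurs twice, and one value per label.
rows <- c("s2", "s1", "s1")
vals <- c(2, 1, 3)

# Result matrix with sorted, unique row names and a single well column.
m <- matrix(NA_real_, nrow = 2, ncol = 1,
  dimnames = list(sort(unique(rows)), "A01"))

# Filling by name: a later entry overwrites an earlier one with the same
# row name, so only the last "s1" value survives.
for (i in seq_along(rows))
  m[rows[i], "A01"] <- vals[i]
```

Choosing as.labels so that the resulting row names are unique avoids this kind of overwriting.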

Value

Numeric matrix or data frame from extract; always a data frame for the data-frame method with the same column structure as object and, if grouping was used, a triplet structure of the rows, as indicated in the new split.at column: (i) group mean, (ii) lower and (iii) upper boundary of the group confidence interval. The data could then be visualised using ci_plot. See the examples.
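The triplet structure of the rows can be sketched in base R. The column name Statistic, the helper mean_ci and the use of a t-based 95% interval are assumptions made for this illustration; the confidence-interval method and the labels used by opm itself may differ.

```r
# Toy input: one grouping column and one numeric well column (made-up values).
d <- data.frame(Species = c("a", "a", "a", "b", "b", "b"),
  A01 = c(10, 12, 14, 20, 22, 27), stringsAsFactors = FALSE)

# Hypothetical helper: mean and t-based CI bounds of a numeric vector.
mean_ci <- function(x, level = 0.95) {
  half <- qt(1 - (1 - level) / 2, df = length(x) - 1) *
    sd(x) / sqrt(length(x))
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# One triplet of rows per group: (i) mean, (ii) lower and (iii) upper bound.
groups <- split(d$A01, d$Species)
triplets <- do.call(rbind, lapply(names(groups), function(g) {
  ci <- mean_ci(groups[[g]])
  data.frame(Species = g, Statistic = names(ci), A01 = unname(ci),
    stringsAsFactors = FALSE)
}))
```

Each group thus contributes three consecutive rows, which is the layout that a plotting function for point estimates with intervals can consume directly.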

For the OPMS method of extract_columns, a data frame or character vector, depending on the join argument. The data-frame method of extract_columns returns a character vector or a data frame, too, but depending on the what argument.

Author(s)

Lea A.I. Vaas, Markus Goeker

See Also

aggregated for the extraction of aggregated values from a single OPMA object.

boot::norm base::data.frame base::as.data.frame base::matrix base::as.matrix base::cbind

Other conversion-functions: as.data.frame, flatten, merge, oapply, opmx, plates, rep, rev, sort, split, to_yaml, unique

Examples

## 'OPMS' method
opm_opt("curve.param") # default parameter

# generate matrix (containing the parameter given above)
(x <- extract(vaas_4, as.labels = list("Species", "Strain")))[, 1:3]
stopifnot(is.matrix(x), dim(x) == c(4, 96), is.numeric(x))
# using a formula also works
(y <- extract(vaas_4, as.labels = ~ Species + Strain))[, 1:3]
stopifnot(identical(x, y))

# generate data frame
(x <- extract(vaas_4, as.labels = list("Species", "Strain"),
  dataframe = TRUE))[, 1:3]
stopifnot(is.data.frame(x), dim(x) == c(4, 99))
# using a formula
(y <- extract(vaas_4, as.labels = ~ Species + Strain,
  dataframe = TRUE))[, 1:3]
stopifnot(identical(x, y))
# using a formula, with joining into new columns
(y <- extract(vaas_4, as.labels = ~ J(Species + Strain),
  dataframe = TRUE))[, 1:3]
stopifnot(identical(x, y[, -3]))

# put all parameters in a single data frame
x <- lapply(param_names(), function(name) extract(vaas_4, subset = name,
  as.labels = list("Species", "Strain"), dataframe = TRUE))
x <- do.call(rbind, x)

# get discretised data
(x <- extract(vaas_4, subset = param_names("disc.name"),
  as.labels = list("Strain")))[, 1:3]
stopifnot(is.matrix(x), identical(dim(x), c(4L, 96L)), is.logical(x))

## data-frame method

# extract data from OPMS-object as primary data frame
# second call to extract() then applied to this one
(x <- extract(vaas_4, as.labels = list("Species", "Strain"),
  dataframe = TRUE))[, 1:3]

# no normalisation, but grouping for 'Species'
y <- extract(x, as.groups = "Species", norm.per = "none")
# plotting using ci_plot()
ci_plot(y[, c(1:6, 12)], legend.field = NULL, x = 350, y = 1)

# normalisation by plate means
y <- extract(x, as.groups = "Species", norm.per = "row")
# plotting using ci_plot()
ci_plot(y[, c(1:6, 12)], legend.field = NULL, x = 130, y = 1)

# normalisation by well means
y <- extract(x, as.groups = "Species", norm.per = "column")
# plotting using ci_plot()
ci_plot(y[, c(1:6, 12)], legend.field = NULL, x = 20, y = 1)

# normalisation by subtracting the values of well A10 only
y <- extract(x, as.groups = "Species", norm.per = "row", norm.by = 10,
  subtract = TRUE)
# plotting using ci_plot()
ci_plot(y[, c(1:6, 12)], legend.field = NULL, x = 0, y = 0)

## extract_columns()

# 'OPMS' method

# Create data frame
(x <- extract_columns(vaas_4, what = list("Species", "Strain")))
stopifnot(is.data.frame(x), dim(x) == c(4, 2))
(y <- extract_columns(vaas_4, what = ~ Species + Strain))
stopifnot(identical(x, y)) # same result using a formula
(y <- extract_columns(vaas_4, what = ~ J(Species + Strain)))
stopifnot(is.data.frame(y), dim(y) == c(4, 3)) # additional column created
stopifnot(identical(x, y[, -3]))
(x <- extract_columns(vaas_4, what = TRUE)) # use logical scalar
stopifnot(is.data.frame(x), dim(x) == c(4, 1))
(y <- extract_columns(vaas_4, what = FALSE))
stopifnot(is.data.frame(y), dim(y) == c(4, 1), !all(y[, 1] == x[, 1]))

# Create a character vector
(x <- extract_columns(vaas_4, what = list("Species", "Strain"), join = TRUE))
stopifnot(is.character(x), length(x) == 4L)
(x <- try(extract_columns(vaas_4, what = list("Species"), join = TRUE,
  dups = "error"), silent = TRUE)) # duplicates yield error
stopifnot(inherits(x, "try-error"))
(x <- try(extract_columns(vaas_4, what = list("Species"), join = TRUE,
  dups = "warn"), silent = TRUE)) # duplicates yield warning only
stopifnot(is.character(x), length(x) == 4L)

# data-frame method, 'direct' running mode
x <- data.frame(a = 1:26, b = letters, c = LETTERS)
(y <- extract_columns(x, I(c("a", "b")), sep = "-"))
stopifnot(grepl("^\\s*\\d+-[a-z]$", y)) # pasted columns 'a' and 'b'

# data-frame method, using class name
(y <- extract_columns(x, as.labels = "b", what = "integer", as.groups = "c"))
stopifnot(is.matrix(y), dim(y) == c(26, 1), rownames(y) == x$b)
stopifnot(identical(attr(y, "row.groups"), x$c))

opm documentation built on May 2, 2019, 6:08 p.m.