    knitr::opts_chunk$set(
      collapse = TRUE,
      comment = "#>"
    )
    library(dunlin)
Reformatting in dunlin consists of replacing predetermined values with other values in particular variables of selected tables of a data set.
This is performed in two steps:
* A Reformatting Map (rule object) is created, which specifies the correspondence between the old and the new values.
* The reformatting itself is performed with the reformat() function.
The Reformatting Map is a rule object inheriting from character. Its names are the new values to be used, and its values are the old values to be replaced.

    rule(A = "a", B = c("c", "d"))

This rule will replace "a" with "A", and "c" or "d" with "B".
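To make the direction of the mapping concrete (names are new values, elements are old values), here is a minimal base R sketch of the same lookup. It does not use dunlin and is not how reformat() is implemented; the names mapping, lookup, x and result are purely illustrative assumptions.

```r
# Illustrative only: mimics the old -> new correspondence of
# rule(A = "a", B = c("c", "d")) with a plain named lookup vector.
mapping <- list(A = "a", B = c("c", "d"))

# Invert the mapping: each old value points to its new value.
lookup <- setNames(
  rep(names(mapping), lengths(mapping)),
  unlist(mapping, use.names = FALSE)
)

x <- c("a", "c", "d", "z", NA)

# Replace values that appear in the lookup; leave everything else
# (including NA and unmapped values such as "z") untouched.
result <- ifelse(x %in% names(lookup), unname(lookup[x]), x)
result
# "A" "B" "B" "z" NA
```

Values not listed in the mapping pass through unchanged in this sketch, which matches the behavior described for NA in the examples that follow.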
reformat
reformat is a generic that supports reformatting of character or factor. Reformatting for other types of variables is meaningless. reformat also preserves the attributes of the original data, e.g. the data type and labels are unchanged.
An example of reformatting a character vector:

    r <- rule(A = "a", B = c("c", "d"))
    reformat(c("a", "c", "d", NA), r)

We can see that the NA values are not changed.
Now we test the factor reformatting:

    r <- rule(A = "a", B = c("c", "d"))
    reformat(factor(c("a", "c", "d", NA)), r)

The NA values are also not changed.
However, if the rule includes a mapping for NA, the result is different:

    r <- rule(A = "a", C = NA, B = c("c", "d"))
    reformat(factor(c("a", "c", "d", NA)), r)

By default, the level replacing NA is placed last. This can be changed by setting .na_last = FALSE.

    r <- rule(A = "a", C = NA, B = c("c", "d"))
    reformat(factor(c("a", "c", "d", NA)), r, .na_last = FALSE)

For a list of data.frames, the format argument is a nested list of rules. The first layer indicates the table names, and the second layer indicates the variables in that table. Reformatting is only available for character or factor columns; reformatting columns of other types results in a warning.

    df1 <- data.frame(
      "char" = c("", "b", NA, "a", "k", "x"),
      "fact" = factor(c("f1", "f2", NA, NA, "f1", "f1"), levels = c("f2", "f1")),
      "logi" = c(NA, FALSE, TRUE, NA, FALSE, NA)
    )
    df2 <- data.frame(
      "char" = c("a", "b", NA, "a", "k", "x"),
      "fact" = factor(c("f1", "f2", NA, NA, "f1", "f1"))
    )

    db <- list(df1 = df1, df2 = df2)
    attr(db$df1$char, "label") <- "my label"

    rule_map <- list(
      df1 = list(
        char = rule("Empty" = "", "B" = "b", "Not Available" = NA),
        fact = rule("F1" = "f1"),
        logi = rule()
      ),
      df2 = list(
        char = rule("Empty" = "", "A" = "a", "Not Available" = NA)
      )
    )

    res <- reformat(db, rule_map, .na_last = TRUE)
    res

The behavior of a rule can be further refined using special mapping values.
* .to_NA converts the specified values to NA at the end of the process.

    r <- rule(A = "a", B = c("c", "d"), .to_NA = c("x"))
    reformat(c("a", "c", "d", NA, "x"), r)

* .drop specifies whether unused levels should be dropped.

    # With .drop = FALSE (the rule does not set .drop)
    obj <- factor(c("a", "c", "d", NA), levels = c("d", "c", "a", "Not used"))
    r <- rule(A = "a", B = c("c", "d"))
    reformat(obj, r)

    # With .drop = TRUE
    obj <- factor(c("a", "c", "d", NA), levels = c("d", "c", "a", "Not used"))
    r <- rule(A = "a", B = c("c", "d"), .drop = TRUE)
    reformat(obj, r)

Note that the behavior of the rule can be overridden using the corresponding arguments in reformat.

    r <- rule(A = "a", B = c("c", "d"), .to_NA = c("x"), .drop = TRUE)
    obj <- factor(c("a", "c", "d", NA, "x", "y"), levels = c("d", "c", "a", "Not used", "x", "y"))
    reformat(obj, r)

    # Override the rule's .to_NA and .drop settings
    reformat(obj, r, .to_NA = "y", .drop = FALSE)