curateVtoDF | R Documentation |
Curate vector into a data.frame
curateVtoDF(
   x,
   curationL = NULL,
   matchWholeString = TRUE,
   trimWhitespace = TRUE,
   whitespace = "_ ",
   expandWhitespace = TRUE,
   previous = NULL,
   verbose = TRUE,
   ...
)
x | character vector as input |
curationL | list containing curation rules, as described above, or a character vector of yaml files, which will be imported into a list format using |
matchWholeString | logical indicating whether to match the whole string for each entry in |
trimWhitespace | logical indicating whether to trim leading and trailing whitespace characters from |
whitespace | character vector containing whitespace characters. |
expandWhitespace | logical indicating whether substitution patterns should be modified so any whitespace characters in the pattern will match the defined |
previous | optional data.frame whose colnames may be present as names in |
verbose | logical indicating whether to print verbose output. |
... | additional arguments are ignored. |
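The whitespace arguments can be illustrated with a short base R sketch. It is an assumption about the general mechanism, not the function's actual internals: the characters in whitespace are treated as a regular expression character class, used once to trim the input and once to widen a pattern. The variable names and sample strings are hypothetical.

```r
# Hypothetical input, using the default whitespace = "_ "
whitespace <- "_ "
x <- c("_Sample1 WT ", " Sample2_knock out_")

# Assumption: whitespace characters are combined into a character class
ws_class <- paste0("[", whitespace, "]")

# Sketch of trimWhitespace: drop leading and trailing whitespace characters
x_trim <- gsub(paste0("^", ws_class, "+|", ws_class, "+$"), "", x)
x_trim

# Sketch of expandWhitespace: any whitespace character in the pattern
# is widened to match one or more of the defined whitespace characters
pattern <- "knock out"
pattern_exp <- gsub(ws_class, paste0(ws_class, "+"), pattern)
pattern_exp

grepl(pattern_exp, c("knock_out", "knock out", "knockout"))
```

Under this assumption, a rule written with a space would also match entries that use an underscore, which helps when sample names are delimited inconsistently.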
This function curates a vector into a data.frame with specifically assigned colnames. It is intended to be a more generic method of curating annotations than splitting a character string by some delimiter, for example where the order of annotations may differ from entry to entry, but where known patterns are sufficient to describe an annotation column.
That said, if annotations can be reliably split using a delimiter, that method is often a better choice. In that case, this function may be useful to make input data fit the expected format.
For example, from c("Sample1_WT_LPS_1hour", "Sample2_KO_LPS_2hours") we can tell whether a sample is KO or WT by looking for that substring.
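The idea can be sketched in base R, without curateVtoDF() itself. This only illustrates substring-based annotation; the column name Genotype is chosen here for illustration.

```r
# Sample names from the text above
x <- c("Sample1_WT_LPS_1hour", "Sample2_KO_LPS_2hours")

# Look for each known substring, regardless of its position in the string
genotype <- ifelse(grepl("KO", x), "KO",
   ifelse(grepl("WT", x), "WT", NA_character_))

# Assemble the annotation into a data.frame
data.frame(x = x, Genotype = genotype)
```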
The curationL is a list with the following properties:
- names(curationL) represent colnames to create in the output data.frame.
- each list element contains a list of two-element vectors
- each two-element vector contains a substitution pattern and substitution replacement
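A list with these properties could be written by hand as in the sketch below. The patterns and replacements are illustrative only, and the check at the end is a hypothetical sanity test, not part of the function.

```r
# Names become output colnames; each element is a list of
# c(pattern, replacement) vectors
curationL <- list(
   Genotype = list(
      c("WT|wildtype", "WT"),
      c("KO|knockout", "KO")),
   Treatment = list(
      c("LPS", "LPS"),
      c("Control|ctrl", "Control")))

# Colnames that would be created
names(curationL)

# Hypothetical check: every rule has exactly two elements
rules_ok <- all(unlist(lapply(curationL, function(rules) lengths(rules) == 2)))
rules_ok
```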
When matchWholeString=TRUE, the substitution patterns are extended to match the whole string, using parentheses around the main pattern. For example, if the pattern is "KO" and the replacement is "KO", then the pattern is extended to "^.*(KO).*$", so the entire string will be replaced with "KO".
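The effect of whole-string matching can be reproduced with base R sub(). This sketch assumes the extension takes the form "^.*(pattern).*$"; the variable names are hypothetical and the code does not call curateVtoDF().

```r
x <- c("Sample1_WT_LPS_1hour", "Sample2_KO_LPS_2hours")
pattern <- "KO"
replacement <- "KO"

# Assumed extension: wrap the pattern in parentheses and pad with ".*"
whole_pattern <- paste0("^.*(", pattern, ").*$")

# Entries that match are replaced in full; others are left as NA here
matched <- grepl(whole_pattern, x)
result <- ifelse(matched, sub(whole_pattern, replacement, x), NA)
result

# Because the main pattern is parenthesized, "\\1" refers to the matched text
captured <- sub(whole_pattern, "\\1", x[matched])
captured
```

With the pattern wrapped this way, a replacement of "\\1" returns whatever the main pattern matched, which is convenient when the pattern contains alternatives.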
Typically, curationL is derived from YAML-formatted files and loaded into a list with this type of setup:
   curationL <- yaml::yaml.load_file("curation.yaml")
The generic YAML format is as follows:
NewColname_1:
  - - patternA
    - replacementA
  - - patternB
    - replacementB
NewColname_2:
  - - patternC
    - replacementC
A specific example:
Treatment:
  - - LPS
    - LPS
  - - Control|cntrl|ctrl
    - Control
Genotype:
  - - WT|wildtype
    - WT
  - - KO|knockout|knock
    - KO
Other jam design functions: curateDFtoDF(), groups2contrasts()
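set.seed(123);
# create example file names carrying genotype and treatment annotations
x <- paste(
   paste0("file",
      sapply(1:5, function(i) {
         paste(sample(LETTERS, 5), collapse="")
      })),
   rep(c("WT", "Mut"), each=3),
   rep(c("Veh","EtOH"), 3),
   sep="_");
x;

# curation rules as a YAML string; nested list items must be indented
# with matchWholeString=TRUE the pattern is wrapped in parentheses,
# so \\1 is the full matched pattern and \\2 is the inner group
curationYaml <- c(
"Genotype:
  - - WT|wildtype
    - WT
  - - Mut|mutant
    - Mut
Treatment:
  - - Veh|EtOH
    - \\1
File:
  - - file([A-Z]+)
    - \\1
FileStem:
  - - file([A-Z]+)
    - \\2");

# print the curation.yaml to show its structure
cat(curationYaml)

# import the YAML string into a list, then curate the vector
curationL <- yaml::yaml.load(curationYaml);
curateVtoDF(x, curationL);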