dataUpdate    R Documentation
Update the local data records by recursively reading the YAML (.yml) files in the specified directory.
dataUpdate(
  dir,
  cachePath = "ReUseData",
  outMeta = FALSE,
  keepTags = TRUE,
  cleanup = FALSE,
  cloud = FALSE,
  remote = FALSE,
  checkData = TRUE,
  duplicate = FALSE
)
dir: A character string for the directory where all data are saved. Data information will be collected recursively within this directory.
cachePath: A character string specifying the name of the cache in which the data records are stored. Default is "ReUseData".
outMeta: Logical. If TRUE, a "meta_data.csv" file will be generated in `dir`, containing information about all available datasets in that directory. Default is FALSE.
keepTags: Whether to keep the previously assigned data tags. Default is TRUE.
cleanup: Whether to remove any invalid intermediate files. Default is FALSE. In cases where one data recipe (with the same parameter values) was evaluated multiple times, the same data file(s) will match multiple intermediate files (e.g., .yml).
cloud: Whether to return the pregenerated data from the Google Cloud bucket of ReUseData. Default is FALSE.
remote: Whether to use the csv file (containing information about pregenerated data on Google Cloud) from GitHub, which is the most up-to-date. Only works when `cloud = TRUE`. Default is FALSE.
checkData: Whether to check that the data (listed as "# output: " in the yml file) exists. If it does not, it is excluded from the output csv file. This argument is added for internal testing purposes. Default is TRUE.
duplicate: Whether to remove duplicates. If TRUE, older versions of duplicated data records will be removed. Default is FALSE.
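As a minimal sketch of how these arguments combine (the `outdir` path below is illustrative, not part of the package), a local data directory can be updated like this:

## illustrative output directory holding data generated by getData()
outdir <- file.path(tempdir(), "SharedData")

## update the local cache and write a "meta_data.csv" into `outdir`
dataUpdate(dir = outdir, outMeta = TRUE)

## additionally remove invalid intermediate files and drop older duplicates
dataUpdate(dir = outdir, cleanup = TRUE, duplicate = TRUE)

## include pregenerated data from the ReUseData Google Cloud bucket,
## using the most up-to-date csv file from GitHub
dataUpdate(dir = outdir, cloud = TRUE, remote = TRUE)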
Users can directly retrieve information for all available datasets by using meta_data(dir=), which generates a data frame in R with the same information as described above and can be saved out. dataUpdate does an extra check for all datasets (checking the file path in the "output" column), removes invalid ones, e.g., those with an empty or non-existing file path, and creates a data cache for all valid datasets.
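For example, assuming `outdir` is the data directory used above, the metadata can be pulled into a data frame and saved out (a sketch; `meta_data()` is the function referenced above, the output file name is illustrative):

mt <- meta_data(dir = outdir)
head(mt)
## save the metadata table out, e.g., for sharing
write.csv(mt, file = file.path(outdir, "meta_data.csv"), row.names = FALSE)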
A dataHub object containing the information about the local data cache, e.g., data name, data path, etc.
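A brief sketch of inspecting the returned object, assuming the dataHub accessors dataNames() and dataPaths() are available in your version of ReUseData (treat these accessors as assumptions, not part of this help page):

dh <- dataUpdate(dir = outdir)
dh                 ## prints a summary of the cached data records
dataNames(dh)      ## assumed accessor: names of the cached datasets
dataPaths(dh)      ## assumed accessor: file paths of the cached datasets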
## Generate data
## Not run:
library(ReUseData)
library(Rcwl)
outdir <- file.path(tempdir(), "SharedData")
echo_out <- recipeLoad("echo_out")
Rcwl::inputs(echo_out)
echo_out$input <- "Hello World!"
echo_out$outfile <- "outfile"
res <- getData(echo_out,
               outdir = outdir,
               notes = c("echo", "hello", "world", "txt"),
               showLog = TRUE)
ensembl_liftover <- recipeLoad("ensembl_liftover")
Rcwl::inputs(ensembl_liftover)
ensembl_liftover$species <- "human"
ensembl_liftover$from <- "GRCh37"
ensembl_liftover$to <- "GRCh38"
res <- getData(ensembl_liftover,
               outdir = outdir,
               notes = c("ensembl", "liftover", "human", "GRCh37", "GRCh38"),
               showLog = TRUE)
## Update data cache (with or without prebuilt data sets from ReUseData cloud bucket)
dataUpdate(dir = outdir)
dataUpdate(dir = outdir, cloud = TRUE)
## newly generated data are now cached and searchable
dataSearch(c("hello", "world"))
dataSearch(c("ensembl", "liftover")) ## both locally generated data and google cloud data!
## End(Not run)