library(knitr)

### See https://bookdown.org/yihui/rmarkdown-cookbook/hook-truncate.html
# save the built-in output hook
hook_output <- knitr::knit_hooks$get("output")

# set a new output hook to truncate text output
knitr::knit_hooks$set(output = function(x, options) {
  n <- options$out.lines
  if (!is.null(n)) {
    x <- xfun::split_lines(x)
    if (length(x) > n) {
      # truncate the output
      x <- c(head(x, n), "....\n")
    }
    x <- paste(x, collapse = "\n")
  }
  hook_output(x, options)
})
###

knitr::opts_chunk$set(
  echo = TRUE,
  eval = TRUE,
  collapse = TRUE,
  message = TRUE,
  comment = "#>",
  # tidy.opts = list(width.cutoff=60),
  # tidy = TRUE,
  linewidth = 60,
  out.lines = 4
)

library(dmdScheme)
library(here)
# library(kableExtra)

## This is from here https://groups.google.com/forum/#!topic/knitr/WnxJfORwCTk
# hook_output = knitr::knit_hooks$get('output')
# knitr::knit_hooks$set(output = function(x, options) {
#   # this hook is used only when the linewidth option is not NULL
#   if (!is.null(n <- options$linewidth)) {
#     x = knitr:::split_lines(x)
#     # any lines wider than n should be wrapped
#     if (any(nchar(x) > n)) x = strwrap(x, width = n)
#     x = paste(x, collapse = '\n')
#   }
#   hook_output(x, options)
# })
dmdScheme
The functionality for using the dmdScheme is available via the online app (which can also be run on an in-house shiny server), via a locally running app, or via the R command line. The simplest method is the online app, as no additional software needs to be installed locally. To use the dmdScheme functionality locally, either via the app or via the R command line, it is necessary to install R and the dmdScheme package.
In the following section, I will go through the different stages of using the dmdScheme via the web app, a local app or the R prompt. The detailed commands which have to be used can be found in Figure 1.
knitr::include_graphics("./figs/workflow_enterMetaData_app_online.png", auto_pdf = TRUE)
knitr::include_graphics("./figs/workflow_enterMetaData_app_local.png", auto_pdf = TRUE)
knitr::include_graphics("./figs/workflow_enterMetaData_R.png", auto_pdf = TRUE)
The app (Figure 2) can be accessed either via the internet as a web app at https://rmkrug.shinyapps.io/dmd_app/, or locally. To run it locally, you need R and the dmdScheme package installed. After loading the dmdScheme package, you can start the app locally by running run_app() at the R prompt. After these steps, the usage of the two apps is identical.
When using the dmdScheme from the command line, the initial setup is the same as for running the app locally, except that the app does not need to be started.
knitr::include_graphics("./figs/app.jpg", auto_pdf = TRUE)
The package dmdScheme does not come with a specific scheme. Upon loading, it installs a generic dmdScheme from the dmdScheme scheme repository at https://github.com/Exp-Micro-Ecol-Hub/dmdSchemeRepository. In nearly all circumstances, a specific scheme needs to be installed, together with the accompanying R package. In the app, this is done by selecting a scheme in the section "Available dmdSchemes". This list is populated automatically when the app starts, from the schemes available in the scheme repository. Selecting a scheme will download the scheme definition package, install any accompanying R package as specified in the scheme definition package, load the accompanying R package, and activate the scheme definition.
To do this from the R prompt, one has to do these steps manually: install the scheme, install the accompanying R package, load the accompanying R package, and activate the scheme itself (see Figure 1 C for the commands).
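The four manual steps can be collected in a small helper. This is only a sketch: setup_scheme() is a hypothetical wrapper and not part of dmdScheme, the scheme name and version in the usage comment are the example values used further down in this document, and the code assumes that the dmdScheme package is installed and that the scheme repository can be reached.

```r
# Hypothetical helper (not part of dmdScheme): runs the four manual steps.
setup_scheme <- function(name, version) {
  # 1. download and install the scheme definition from the scheme repository
  dmdScheme::scheme_install(name = name, version = version)
  # 2. install the accompanying R package of the same name
  dmdScheme::scheme_install_r_package(name = name, version = version)
  # 3. load the accompanying R package
  library(name, character.only = TRUE)
  # 4. activate the scheme definition
  dmdScheme::scheme_use(name = name, version = version)
  # return the now active scheme without printing it
  invisible(dmdScheme::scheme_active())
}

# Usage (needs network access, so it is not run here):
# setup_scheme(name = "emeScheme", version = "0.9.5")
```

Keeping the steps in one function makes the order explicit; whether a repeated call succeeds when the scheme is already installed is not something this sketch checks.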
The spreadsheet to enter the metadata can be obtained from the app via the "Empty scheme spreadsheet" button. This will download an .xlsx spreadsheet containing the definition of the scheme and the cells for the metadata which need to be filled in.
In R, the spreadsheet can be obtained by using the open_new_spreadsheet() command.
Figure 3 shows two screenshots of the spreadsheet as opened in Excel.
knitr::include_graphics("./figs/Spreadsheet_01_MdBibliometric.jpg", auto_pdf = TRUE)
knitr::include_graphics("./figs/Spreadsheet_02_MdAuthor.jpg", auto_pdf = TRUE)
# knitr::include_graphics("./figs/Spreadsheet_03_Experiment.jpg", auto_pdf = TRUE)
knitr::include_graphics("./figs/Spreadsheet_04_Genus.jpg", auto_pdf = TRUE)
# knitr::include_graphics("./figs/Spreadsheet_08_DataFileMetadata.jpg", auto_pdf = TRUE)
The metadata is entered in the spreadsheet downloaded before. The spreadsheet contains the definition of the scheme as well as all information needed to fill in the metadata, including the "type" of the metadata (e.g. numeric, character, or boolean), "allowed values" or "suggested values", "unit" as well as a "Description" field.
Metadata can be entered in all green cells, while all red cells are (should be) locked and are part of the scheme definition.
Here the familiarity of researchers with spreadsheets plays an important role: because it is a standard spreadsheet, the technical details of entering the values do not need to be explained.
The metadata in the .xlsx file can be entered and edited in most likely all spreadsheet programs which can read and write .xlsx files. We successfully tested Excel, Pages, Libre Office and Google Docs.
A spreadsheet as frontend for entering the metadata offers the additional possibility to specify validation rules within the spreadsheet itself, so that they are checked while the metadata is entered and not only afterwards. As we wanted to maintain the flexibility to edit the spreadsheet in different spreadsheet editors in a consistent way, this is not implemented in the dmdScheme or emeScheme, but could easily be done in other schemes.
To be able to validate the metadata, the validation needs access to the metadata as well as (in most cases) the data files themselves, as the validation can include checking that the column names in tables are completely defined, as implemented in @Krug2019b. Consequently, the metadata as well as the data need to be uploaded to the app, which is done via the "Upload spreadsheet containing metadata" and the "Upload datafiles" buttons. In the case of the web app, these are uploaded to the server running the app (in this case https://www.shinyapps.io, which is run by RStudio Inc.).
If an upload to a server is not an option, e.g. for reasons of confidentiality or size, one could host the app on an in-house shiny server. The other option is to use the local app, in which case the metadata and data remain on the local computer. Nevertheless, the data is copied into a different directory. If, for example, size constraints prohibit that approach, one can use the R prompt. In this case, the validation function does not need to copy the data, and only needs the path to the directory in which the data resides.
After completion of the validation, the app (web or local) downloads a report, which in the default setting is an html report. Optionally, the report can be a Word docx document or a pdf.
The report (see Figure 4) shows errors (which normally should be fixed), warnings (which are not as critical as errors and should be assessed one by one), and notes. The level of detail and the aspects validated in this report depend on the validation function, which can be changed in the accompanying R package.
The editing and validation cycle should be repeated until the report is satisfactory.
On the R prompt, the validation involves the two commands validate() and report(), which validate the spreadsheet and create the report from the object resulting from the validation.
knitr::include_graphics("./figs/ValidationReport.jpg", auto_pdf = TRUE)
In the app, the xml can be obtained by using the "Export to xml" button. Depending on the export functionality implemented for the selected scheme (as defined in the accompanying R package), the file returned is an .xml file (dmdScheme) or a compressed archive (.tar.gz) containing multiple .xml files (e.g. for the emeScheme, one per data file). On the R prompt, the command write_xml() will export to a single .xml file (dmdScheme) or to multiple .xml files (emeScheme) and return the name(s) of the file(s) exported.
The package dmdScheme is a base package for the usage and development of domain specific metadata schemes. It provides functionality to enter the metadata, validate the entered metadata, and export it to xml format for further processing by e.g. archival repositories. This document gives an overview of the dmdScheme package and what it contains.
The recommended way is to install from the R-Universe, which always contains the latest stable version:
# Enable universe by uzh-peg
options(repos = c(
  uzhpeg = 'https://uzh-peg.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'))

# Install dmdScheme
install.packages('dmdScheme')
To install the master branch (the stable branch which will become the new CRAN release) from the dmdScheme repository on github, run
## install the remotes package if not installed yet
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("UZH-PEG/dmdScheme", ref = "master", build_opts = NULL)
If you are feeling adventurous, want to live at the bleeding edge and can live with possibly non-working features, you can install the dev branch. This branch is not stable and features and functionality can appear or be removed without prior notice. This is, unless there is a really good reason, not recommended for production use:
## install the remotes package if not installed yet
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("UZH-PEG/dmdScheme", ref = "dev", build_opts = NULL)
Other branches are not generally recommended for installation unless you are involved in dmdScheme package development.
When you load the package, the definition of the scheme is downloaded from the dmdScheme repository and installed to a temporary scheme library in a temporary directory for usage in this R session. As this scheme library is stored in a temporary directory, it will be deleted when you quit your R session, and it will be re-downloaded each time you start a new session and load the package. To create a permanent scheme library, you have to create a cache in the user directory. To do this, run
cache(createPermanent = TRUE)
and restart your R session. Now the definitions of the installed dmdSchemes will be installed in this user cache and be available permanently. For further info, see the documentation of the command cache().
Once the package is loaded, the default scheme definition from the package is installed and used.
There are several commands to manage installed schemes. All these commands start with scheme_
.
These are in detail:
scheme_default()
Returns the default scheme and version, i.e. the one on which the loaded package is based. This can be the dmdScheme package itself, or a package created with make_new_package().
scheme_default()
scheme_repo()
Gets or sets the scheme repository. If repo is specified, the scheme repository to be used is set. Otherwise, the scheme repository currently in use is returned.
scheme_repo()
scheme_list_in_repo()
Shows all schemes and versions available in a repo. The default repo is the dmdScheme scheme repository on github. The function reads and simply returns the file SCHEME_DEFINITIONS.yaml in the folder schemes in the repository.
scheme_list_in_repo()
scheme_download()
Downloads a scheme definition from the currently set repo and stores it in destfile. The function invisibly returns the fully qualified file name of the downloaded file.
scheme_download(name = "emeScheme", version = "0.9.5", destfile = tempfile())
scheme_install()
Installs a new scheme definition. In the normal use case, this function uses scheme_download() to download the scheme definition from the default github repository and installs it. The usage is
scheme_install(name = "emeScheme", version = "0.9.5")
scheme_install_r_package()
Installs the accompanying R package of the same name as the scheme. This only installs the package; it still needs to be loaded before it can be used!
scheme_install_r_package(name = "emeScheme", version = "0.9.5")
scheme_list()
Shows the installed schemes.
scheme_list()
scheme_use()
Activates the scheme NAME with version VERSION.
scheme_use(name = "emeScheme", version = "0.9.5")
scheme_active()
Shows the currently active scheme.
scheme_active()
scheme_use(name = "dmdScheme", version = "0.9.9")
scheme_active()
scheme_path_index_template()
Returns the path to the index template. The search order is:
dmdScheme package (not the accompanying package!).
scheme_path_index_template()
scheme_path_xlsx()
Returns the path to the xlsx file included in the scheme package.
scheme_path_xlsx()
scheme_path_xml()
Returns the path to the xml file included in the scheme package.
scheme_path_xml()
scheme_uninstall()
Uninstalls an installed scheme. The scheme definition is deleted from the scheme library and moved to a temporary folder which will be deleted at the end of the R session.
scheme_uninstall(name = "emeScheme", version = "0.9.5")
scheme_installed()
Returns TRUE if the scheme is installed.
scheme_installed(name = "emeScheme", version = "0.9.5")
scheme_make()
This function is used to package a new scheme.
To enter new data into the dmdScheme, you have to run the command
open_new_spreadsheet()
This will open Excel, and the second tab of the file should look similar to the one shown in Figure 3.
The following points are important to remember:
1) The file is saved in a temporary directory. It needs to be saved at a different location if you want to keep the changes.
2) Data can only be entered in the green cells. All other cells are write protected.
After entering the data, save it to a location for further processing.
Next, you have to import the data entered in the Excel sheet into R. For simplicity, we use a file included in the package here. If you want to load your own file, replace scheme_path_xlsx() with the file name and path to that file.
x <- read_excel( file = scheme_path_xlsx(), verbose = TRUE )
The verbose = TRUE argument will produce messages which show you what is happening and help to identify problems.
dmdScheme
Data can be printed by using the print() function. The function has three arguments which control the printout:
- printAttr: if TRUE (default), print the basic attributes, prefixed with A
- printExtAttr: if TRUE, print all attributes, prefixed with X
- printData: if TRUE (default), print the data, prefixed with D
print( x, printAttr = FALSE, printExtAttr = FALSE, printData = FALSE )
print( x, printAttr = TRUE, printExtAttr = FALSE, printData = FALSE )
print( x, printAttr = TRUE, printExtAttr = TRUE, printData = FALSE )
print( x, printAttr = TRUE, printExtAttr = TRUE, printData = TRUE )
The metadata in the spreadsheet can be validated by using the following command, which results in an object of class dmdScheme_validation.
validate( scheme_path_xlsx() )
To create a report (html, pdf or docx) you can use the report() function:
report( scheme_path_xlsx() )
which will open a html report in your browser.
You can export the data to an xml_document object as defined in the xml2 package to do further processing:
x <- read_excel( scheme_path_xlsx() )
# xmlFile is not defined yet at this point, so no file argument is passed
xml <- as_xml( x )
xml
You can save the xml to a file by using
xmlFile <- tempfile(fileext = ".xml")
xml2::write_xml(x = xml, file = xmlFile)
or directly in one command
xmlFile <- tempfile(fileext = ".xml")
write_xml( x = x, file = xmlFile )
x <- read_xml( xmlFile )
x
This package contains all the functionality to easily create a new metadata scheme. The resulting scheme package will contain all files needed so that it can easily be distributed and used. It can either be distributed directly or uploaded to an online repository, e.g. the default dmdScheme repository.
A new scheme definition only has to include the scheme definition and example data. It can also contain additional examples and an R script to install the accompanying R package, but these are not required.
To create the scheme itself (as shown in Figure 5), it is recommended to begin with an existing scheme. It is possible to edit most aspects in the scheme definition xlsx file, including adding or deleting tabs (only the Experiment and DataFileProperties tabs are needed), adding or removing rows (in "vertical" tabs like the Experiment tab) or columns (in the other, "horizontal" tabs), and changing the types, suggested values, allowed values and units. If the .xlsx file is locked, you can unlock it by using the password test.
It is important that the final scheme contains example data as
- internal processing of the .xlsx file requires example data and will fail if a scheme without example data is provided, and
- it is much more user friendly if the user of a package not only sees the empty scheme, but also a filled in scheme with example data.
This .xlsx file can be edited within the limitations specified below. After the modifications have been made and the spreadsheet has been saved, the scheme package can be created by using
scheme_make( "TheFile.xlsx" )
which will create a file named NAME_VERSION.tar.gz in the directory path based on the scheme definition in TheFile.xlsx. NAME and VERSION are the scheme name and scheme version as specified in the cell (H:1) in sheet Experiment.
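As a small illustration of this naming rule, the expected archive name can be composed from the scheme name and version. scheme_archive_name() is a hypothetical helper written for this sketch and is not part of dmdScheme; the name and version are the example values used elsewhere in this document.

```r
# Hypothetical helper: expected file name of a packaged scheme,
# following the NAME_VERSION.tar.gz pattern described above.
scheme_archive_name <- function(name, version) {
  sprintf("%s_%s.tar.gz", name, version)
}

scheme_archive_name("emeScheme", "0.9.5")
# returns "emeScheme_0.9.5.tar.gz"
```

Such a helper can be handy in scripts which first call scheme_make() and then need to locate the resulting archive.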
The following two commands will install the new scheme and use it:
scheme_install(file = "NAME_VERSION.tar.gz")
scheme_use("NAME", "VERSION")
To upload the scheme to the main repository on github, please either clone the repo and send a pull request or file an issue with the scheme package as an attachment.
dmdScheme
There are a few minimum requirements for dmdScheme derived metadata schemes, so that all functions in the dmdScheme package will work.
These are:
- The tab name has to be identical to the propertySet value (cell A:2 in the tab Experiment, B:1 in other tabs)
- The tab named Experiment is required. This tab is different to the others as it has
- The tabs MdBibliometric and MdAuthors are required as they contain all the bibliometric and author metadata
- The tab named DataFileMetaData is required
  dataFileName
- Each tab except Experiment, DataFileMetaData and MdBibliometric needs an ID field
  Applies to all tabs except Experiment and DataFileMetaData
  ...ID.
- The spreadsheet can contain a tab DOCUMENTATION. This tab can contain information for the user and will not be imported.
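The required tabs named in the list above can be checked before packaging a scheme. The following is a sketch only: missing_tabs() and required_tabs are made up for this example and are not part of dmdScheme, and the sheet names in the usage line are invented. With a real file, the sheet names could be obtained with readxl::excel_sheets().

```r
# Tabs named as required in the list above.
required_tabs <- c("Experiment", "MdBibliometric", "MdAuthors", "DataFileMetaData")

# Hypothetical helper: which required tabs are missing from a set of sheet names?
missing_tabs <- function(sheet_names, required = required_tabs) {
  setdiff(required, sheet_names)
}

# Invented sheet names of a scheme under development:
missing_tabs(c("Experiment", "MdBibliometric", "Genus", "DOCUMENTATION"))
# returns c("MdAuthors", "DataFileMetaData")
```

This only checks that the tabs are present; the other requirements (propertySet values, ID fields) are not covered by the sketch.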
knitr::include_graphics("./figs/workflow_newScheme.png", auto_pdf = TRUE)
The dmdScheme package only provides the base functionality for working with dmdSchemes. In many cases, the functionality needs to be extended to be able to work with other schemes. This can be done by creating an accompanying R package. It is a standard R package which has to
dmdScheme package (as it extends its capabilities), and
It is not necessary to build an accompanying R package, but doing so will make it possible to adapt many aspects of the validation and export process to the needs of a specific domain.
The easiest method to develop an accompanying R package is to
- use scheme_use() to activate a new scheme for which the accompanying package should be developed
- use the convenience function make_new_package() to create a package skeleton which fulfils these requirements
Now, functions can be added. For an example see https://github.com/Exp-Micro-Ecol-Hub/emeScheme, in which the methods as_xml_list.emeSchemeSet.R and validate.emeSchemeSet_raw.R extend the generic functions as_xml_list() and validate().
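The mechanism behind such extensions is ordinary S3 dispatch: the accompanying package adds a method for its own class to a generic. The sketch below uses a stand-in generic validate_sketch() and a made-up class mySchemeSet_raw, so that it runs without dmdScheme; it does not reproduce the actual signatures or return values of validate() or as_xml_list().

```r
# Stand-in generic; in practice the generic comes from the dmdScheme package.
validate_sketch <- function(x, ...) UseMethod("validate_sketch")

# Default method: only generic checks.
validate_sketch.default <- function(x, ...) {
  list(class = class(x)[1], checks = "generic checks only")
}

# Method an accompanying package might add for its own class
# ("mySchemeSet_raw" is a made-up class name).
validate_sketch.mySchemeSet_raw <- function(x, ...) {
  result <- NextMethod()  # run the generic checks first
  result$checks <- c(result$checks, "domain-specific checks")
  result
}

x <- structure(list(), class = c("mySchemeSet_raw", "list"))
validate_sketch(x)$checks
# returns c("generic checks only", "domain-specific checks")
```

Calling NextMethod() first keeps the base checks and only adds the domain-specific ones on top; a method could also replace the base behaviour entirely.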
The aim of an index file is to give a short and concise overview of the deposited data package. It should contain certain metadata points, e.g. the DOI, authors, license, description, file name(s) of data file(s) and whatever else is considered necessary to get a basic understanding of the data set.
The index file is created based on a text template, in which tokens stand for metadata values from a dmdScheme. These are replaced with the actual metadata properties when calling the function make_index( x = A_dmdScheme, template = "the/index/template/file.txt"). If asked for, the resulting index file is translated to pdf and html by using pandoc.
Tokens are enclosed by %%...%% and must not contain any white space (tab, space, ...).
The following tokens are allowed at the moment:
dmdScheme
- %%propertySet%% or %%propertySet.*%% or %%propertySet.*.*%%: replaced with the propertySet as a table
- %%propertySet.valueProperty%% or %%propertySet.valueProperty.*%%: replaced with the valueProperty (column) from a propertySet as a comma separated list
- %%propertySet.valueProperty.N%%: replaced with the Nth valueProperty from a propertySet
- %%propertySet.*.N%%: replaced with the Nth row from a propertySet

If the token has the form %%!fun!xxx%%, the R function fun is executed and the result is inserted in the index file. For example, %%!unique!DataFileMetaData.dataFileName%% would return a vector containing only the unique dataFileName values.
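How such a function token could be resolved is sketched below in base R. This is not the implementation used by make_index(): resolve_fun_token() is a hypothetical helper, the metadata is represented as a plain named list of data frames with invented values, and only tokens of the form %%!fun!propertySet.valueProperty%% are handled.

```r
# Mock metadata: a named list of propertySets (invented values).
metadata <- list(
  DataFileMetaData = data.frame(
    dataFileName = c("a.csv", "a.csv", "b.csv"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical resolver for tokens of the form %%!fun!propertySet.valueProperty%%
resolve_fun_token <- function(token, metadata) {
  pattern <- "^%%!([^!]+)!([^.%]+)\\.([^.%]+)%%$"
  parts <- regmatches(token, regexec(pattern, token))[[1]]
  if (length(parts) != 4) stop("not a function token: ", token)
  fun <- match.fun(parts[2])                  # e.g. unique
  values <- metadata[[parts[3]]][[parts[4]]]  # propertySet, then valueProperty
  fun(values)
}

resolve_fun_token("%%!unique!DataFileMetaData.dataFileName%%", metadata)
# returns c("a.csv", "b.csv")
```

The regular expression rejects white space implicitly only in so far as it requires the exact token layout; a fuller version would validate the token against the rules stated above.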
Special tokens are tokens which do not extract data from the metadata scheme. There are at the moment only two:
- %%DATE%%: will be replaced with the current date
- %%AUTHOR%%: will be replaced with the author as supplied to the function
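The replacement of the two special tokens can be illustrated with plain string substitution. fill_special_tokens() is a hypothetical helper for this sketch and not the code used by make_index(); the template text, author and date are invented.

```r
# Hypothetical helper: fill the two special tokens in a template string.
fill_special_tokens <- function(template, author, date = Sys.Date()) {
  # fixed = TRUE treats the token as literal text, not as a regular expression
  out <- gsub("%%DATE%%", format(date), template, fixed = TRUE)
  gsub("%%AUTHOR%%", author, out, fixed = TRUE)
}

template <- "Index created on %%DATE%% by %%AUTHOR%%."
fill_special_tokens(template, author = "A. Author", date = as.Date("2020-01-31"))
# returns "Index created on 2020-01-31 by A. Author."
```

The date is passed as an argument with the current date as default, which keeps the example reproducible.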