writeBioticXML: Write biotic and acoustic XML files from data frames given a...


Description

writeBioticXML Writes a data frame to a biotic XML file.

writeAcousticXML Writes a data frame to an acoustic XML file.

readHIXSD Function for reading an xsd (XML schema file) and returning the structure of the corresponding XML. The output is designed to store the xsd info as column names of a data frame for use in the function data.frame2nestedList(), but may be useful for reading an arbitrary xsd file.

validateHIXSD Function for validating the input x against an xsd.

writeXMLusingXSD Utility function of data.frame2xml for writing any data frame to an XML file given an XML root and an xsd.

data.frame2nestedList Utility function of data.frame2xml for converting a data frame to a nested list. The data frame must have column names with prefixes such as Level4.Var or Level2.Attr or Level3.AttrReq followed by "." and the variable name (e.g., Level3.AttrReq.serialno).

list2XML Utility function for converting a list to an xml object. This function is a generalization of the function as_xml_document() in the package xml2, which turns a list into an xml object, but not for lists that are too deeply nested.

data.frame2xml Utility function of writeXMLusingXSD for converting a data frame to an xml object.
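The column-name convention described for data.frame2nestedList (prefixes such as Level3.AttrReq followed by "." and the variable name) can be illustrated with a small base R sketch. This is not the package's own parsing code; the regular expression and the example columns other than Level3.AttrReq.serialno are assumptions made for illustration.

```r
# Hypothetical column names following the LevelN.Type.name pattern.
# Only "Level3.AttrReq.serialno" is taken from the documentation above.
cols <- c("Level3.AttrReq.serialno", "Level2.Attr.platform", "Level4.Var.length")

# Split each name into level number, type (Var, Attr or AttrReq) and variable name.
# "AttrReq" is listed before "Attr" so the longer prefix is tried first.
pattern <- "^Level([0-9]+)\\.(Var|AttrReq|Attr)\\.(.+)$"
parts <- regmatches(cols, regexec(pattern, cols))

parsed <- data.frame(
  level = as.integer(sapply(parts, `[`, 2)),
  type = sapply(parts, `[`, 3),
  name = sapply(parts, `[`, 4),
  stringsAsFactors = FALSE
)
print(parsed)
```

A table like this makes it easy to check, before conversion, that every column carries a recognizable level and type prefix.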

Usage

writeBioticXML(x, file, xsd = "1.4", blocksize = 100,
  addVersion = TRUE, na.rm = TRUE,
  declaration = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
  strict = TRUE, discardSimple = FALSE, maxlines = 10, cores = 1)

writeAcousticXML(x, file, xsd = "1", blocksize = 100,
  addVersion = TRUE, na.rm = TRUE,
  declaration = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
  strict = TRUE, discardSimple = FALSE, maxlines = 10, cores = 1)

writeXMLusingXSD(x, file, root, blockvar = NULL, blocksize = 100,
  addVersion = TRUE, xsd = NULL, na.rm = TRUE,
  declaration = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
  strict = TRUE, discardSimple = FALSE, maxlines = 10, cores = 1)

data.frame2xml(x, root, xsd, na.rm = FALSE, strict = TRUE,
  discardSimple = FALSE, msg = FALSE)

getHIXSDfile(xsd = "1.4", xsdtype = c("biotic", "acoustic"))

readHIXSD(xsd = "1.4", xsdtype = c("biotic", "acoustic"),
  discardSimple = FALSE)

validateHIXSD(x, xsd, strict = TRUE, discardSimple = FALSE)

Arguments

x

The data frame to write to an XML file, validated against the xsd. The data frame has one column per combination of variable and attribute, where the attributes are coded into the column names in the following manner: variableName..attributeName.attributeValue. If there are variables with identical names at different levels in the XML hierarchy, the level (i.e., the name of the parent node) can be given in the column name, separated by a dot: variableName.level.

file

The path to the XML file to write.

xsd

The path to an xsd (xml schema) file, or the output from readHIXSD, or the version of XSDs used in StoX (attached to the Rstox package).

blocksize, blockvar

The variable (blockvar) and the number of list elements (blocksize) by which to divide the XML into blocks, which are written to separate files and then merged at the end.

addVersion

Logical: If TRUE, add the version interpreted from the xsd.

na.rm

Logical: If TRUE, remove missing values in the XML file (otherwise save these values as NA in the file).

declaration

The declaration string heading the XML file.

strict

Logical: If TRUE, remove columns with names that are not recognized in the xsd.

discardSimple

Logical: If TRUE, discard simplecontent from the xsd.

maxlines

The number of lines to read from the individual XML files written (in blocks) by writeXMLusingXSD, to determine which lines to merge between the files.

cores

The number of cores to use to parallelize writing of the individual XML files, which are then merged into one file. Set this > 1 to speed up the writing.

root

The root of the XML to write. Used in writeXMLusingXSD, which requires a root to append the XML to.

msg

Logical: If TRUE, print messages to the console.

xsdtype

The type of XSD, currently one of "biotic" and "acoustic", used when reading an XSD (as used by the Institute of Marine Research).
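The idea behind blockvar and blocksize, dividing the data into blocks that can be written separately and merged afterwards, can be sketched in base R. The helper splitIntoBlocks and the example data are hypothetical and only illustrate the grouping step; how writeXMLusingXSD actually forms and merges its blocks is not shown here.

```r
# Hypothetical helper: group rows so that each block holds at most
# 'blocksize' unique values of the block variable.
splitIntoBlocks <- function(x, blockvar, blocksize = 100) {
  ids <- unique(x[[blockvar]])
  # Assign each unique id to a block number: 1, 1, ..., 2, 2, ...
  blockOfId <- ceiling(seq_along(ids) / blocksize)
  # Map every row to the block of its id.
  blockIndex <- blockOfId[match(x[[blockvar]], ids)]
  split(x, blockIndex)
}

# Example data: five stations with two rows each.
d <- data.frame(serialno = rep(1:5, each = 2), length = seq_len(10))
blocks <- splitIntoBlocks(d, "serialno", blocksize = 2)
sapply(blocks, nrow)
```

Keeping all rows of one blockvar value in the same block means that no element of that variable is split across two intermediate files.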

Examples

## Not run: 
# Read a biotic xsd in the Rstox package:
ver <- "1.4"
filename <- paste0("biotic_v", ver, ".xsd")
xsd <- system.file("extdata", "xsd", "biotic", filename, package="Rstox")
xsd <- readHIXSD(xsd)
head(xsd$x)

# Get the biotic data from the test project:
g <- getBaseline("Test_Rstox", input=FALSE, endProcess="FilterBiotic", proc="FilterBiotic")

# The variable 'weight' has different meaning for g$FilterBiotic_BioticData_CatchSample.txt 
# and g$FilterBiotic_BioticData_Individual.txt, so we rename for the merging below to work:
library(data.table)
setnames(g$FilterBiotic_BioticData_CatchSample.txt, "weight", "weight.catchsample")
setnames(g$FilterBiotic_BioticData_Individual.txt, "weight", "weight.individual")

# Merge the biotic data to one data frame, but exclude the comment column first:
by12 <- c("cruise", "serialno", "platform")
by23 <- c(by12, "SpecCat", "species", "noname", "aphia", "samplenumber")
d <- merge(merge(g[[1]], g[[2]], all=TRUE, by=by12), g[[3]], all=TRUE, by=by23)
# Revert the names of the weights back to the original 
# for comparison with a new project using the new biotic.xml file:
setnames(g$FilterBiotic_BioticData_CatchSample.txt, "weight.catchsample", "weight")
setnames(g$FilterBiotic_BioticData_Individual.txt, "weight.individual", "weight")

# Lengths are given in cm in StoX and in m in the xml-files:
d$length <- d$length / 100
# Weights are given in g in StoX and in kg in the xml-files:
d$weight.individual <- d$weight.individual / 1000

# The variables 'startdate' and 'stopdate' are given at 'fishstation' level in StoX, 
# and the corresponding values at 'mission' level are ignored:
setnames(d, "startdate", "startdate.fishstation")
setnames(d, "stopdate", "stopdate.fishstation")

# The variable 'startlog' is stored as 'logstart' in StoX:
names(d)[names(d) == "logstart"] <- "startlog"

# The variables 'producttype', 'sampleproducttype' and 'specimensamplecount' are stored as 
# 'measurement', 'samplemeasurement' and 'individualsamplecount', respectively, in StoX:
setnames(d, "measurement", "producttype")
setnames(d, "samplemeasurement", "sampleproducttype")
setnames(d, "individualsamplecount", "specimensamplecount")

# Validate the data against the xsd:
v <- validateHIXSD(d, xsd)
# The column names in StoX differ somewhat from the xsd:
d$specimenno <- d$no
d$year <- format(as.Date(d$startdate, "%d/%m/%Y"), "%Y")
d$missionnumber <- d$cruise
v <- validateHIXSD(d, xsd)

# Try to write the biotic data to an XML file:
tempXML <- file.path(tempdir(), "biotic.xml")
# Writing XML files takes time (one minute):
system.time(writeBioticXML(d, tempXML))

# Recreate the Test_Rstox project using the new biotic.xml file:
createProject("Test_Rstox", ow=TRUE, 
    ReadBioticXML=list(FileName1=tempXML, FileName2="", FileName3="", FileName4="", FileName5=""))
# Reopen and save the project to format the project.xml file properly:
reopenProject("Test_Rstox")
saveProject("Test_Rstox")

# Get the biotic data from the project with the new file:
g2 <- getBaseline("Test_Rstox", input=FALSE, endProcess="FilterBiotic", proc="FilterBiotic")

# There might be some minor differences due to precision:
all.equal(g, g2)

# Finally reset to the original test project:
createProject("Test_Rstox", ow=TRUE)

## End(Not run)

Sea2Data/Rstox documentation built on May 14, 2019, 8:58 a.m.