dectechXmlToDataframe: QuestBack (GlobalPark) XML Output to data frame


dectechXmlToDataframe    R Documentation

QuestBack (GlobalPark) XML Output to data frame

Description

Converts QuestBack (GlobalPark) XML output to an R data frame, converting categorical data to 'factor' variables.

Usage

dectechXmlToDataframe(filePath, removeIncompletes = TRUE,
                saveLabels = TRUE, dropTimeStamps = TRUE)

Arguments

filePath

location of the XML file

removeIncompletes

(optional) parameter to remove incomplete responses, i.e. anything that isn't coded 31 or 32. Default is to remove; set to FALSE to keep.

saveLabels

(optional) parameter to save extra variable label details as an extra attribute. Default is to save.

dropTimeStamps

(optional) parameter to drop the "rts" time stamp variables at the end of a data file. Default is to drop.

Details

QuestBack (GlobalPark) offers a number of output formats for survey data:

csv is the simplest format, and the easiest to understand, but it does not store information on the scales used in the survey. A scale can be output numerically, so rather than "No children", "1 Child", "2 children", it will be stored as "1", "2", "3", which can be confusing (in this example "2" = "1 Child"). Alternatively, there is an option to output the text of the labels, but then the ordering is not preserved: "low", "medium", "high" is output, but if you make a table it will be in alphabetical order ("high", "low", "medium").
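The ordering problem with text labels can be reproduced in base R. The following is a minimal sketch using made-up responses (not real QuestBack output); it only illustrates why ordered factor levels matter.

```r
# Hypothetical responses, as they might look when labels are exported as text
responses <- c("low", "high", "medium", "low", "high", "low")

# Tabulating the raw text sorts the categories alphabetically
alphabetical <- table(responses)
names(alphabetical)

# Declaring the levels explicitly restores the intended scale order
ordered_responses <- factor(responses, levels = c("low", "medium", "high"))
in_scale_order <- table(ordered_responses)
names(in_scale_order)
```

A factor with explicitly declared levels carries the scale order with the data, so tables and plots follow the survey's ordering rather than the alphabet.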

Outputting in SPSS format, and then converting using the "foreign" package, gets around the problems with raw csv. However, there are a few issues with this method: certain data types/character strings can cause errors; longer strings will get cut short; and if one level of a variable was never selected, that level will not be retained (this becomes an issue if you are merging data from 2 or more surveys, as the levels may no longer match).
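The missing-level issue can be illustrated with two small, made-up vectors. This is a sketch under assumed data, not the behaviour of any particular import: the variable names and the "low"/"medium"/"high" scale are hypothetical.

```r
# Hypothetical data: in survey A nobody chose "high", so that level was lost
survey_a <- factor(c("low", "medium", "low"))
survey_b <- factor(c("low", "high", "medium"),
                   levels = c("low", "medium", "high"))

# The two variables no longer share the same set of levels
levels(survey_a)
levels(survey_b)

# Re-declaring the full scale before combining keeps the levels aligned
scale_levels <- c("low", "medium", "high")
combined <- factor(c(as.character(survey_a), as.character(survey_b)),
                   levels = scale_levels)
table(combined)
```

When every level of the scale is retained regardless of whether it was selected, this manual realignment step is not needed when merging surveys.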

XML is an open format that, like SPSS format, retains the data variable labels, but which allows us to avoid the issues introduced by converting via the "foreign" package.

The main drawbacks of the XML output are that the file is larger than the other formats (but can be zipped down to a much smaller file), and that the actual layout of the data within the file is unintuitive, hence the need for this function to convert it into a sensible format.

Currently this conversion process can take a while (a minute or two for very large files), but we will look into making it more efficient.

Value

Returns a data frame

Author(s)

Keith Simpson

See Also

See also read.csv, for reading csv files, and the "foreign" package for reading SPSS and Stata data files.

Examples


## Not run: 

# in most cases can just run as:
df = dectechXmlToDataframe("C:/.../data_project_1234_2016_01_01.xml")

# but if you wish to retain incomplete respondents and keep time data you would run
df = dectechXmlToDataframe("C:/.../data_project_1234_2016_01_01.xml",
                        removeIncompletes = FALSE, dropTimeStamps = FALSE)

    
## End(Not run)

Dectech/DectechR documentation built on Jan. 30, 2025, 10:34 a.m.