read.stataXml: Read Stata XML files


Description

Reads a file in Stata's XML format into a data frame.

Usage

read.stataXml(file, convert.dates = TRUE, convert.factors = TRUE,
convert.underscore = FALSE, missing.type = TRUE)

Arguments

file

character. The name of the file to read.

convert.dates

logical. Convert Stata dates to 'Date' or 'POSIXct' class? (see Details)

convert.factors

logical. Use Stata value labels to create factors?

convert.underscore

logical. Convert "_" in Stata variable names to "." in R names?

missing.type

logical. Store information about different types of missing data?

Details

The Stata XML format was introduced in Stata 9 as an alternative to the binary .dta format for storing datasets. It is written by the Stata command xmlsave, doctype(dta), and read by the Stata command xmluse. A file in Stata XML format contains the same information as a Stata binary .dta file.

The variables in the Stata data set become the columns of the data frame. Stata missing values are read as NA. The data label, variable labels, timestamps, and variable characteristics are stored as attributes of the data frame.

By default, Stata dates and times are converted to R's Date or POSIXct classes using fromStataTime. Variables with Stata value labels are converted to factors.
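As a rough illustration of what this conversion involves, the sketch below uses base R only. It is not the package's fromStataTime implementation; it assumes the standard Stata encodings, in which %td dates count days since 1960-01-01 and %tc datetimes count milliseconds since 1960-01-01 00:00:00. The variable names are hypothetical.

```r
# Hypothetical raw values as they might be stored in a Stata file.
# Assumption: %td = days since 1960-01-01,
#             %tc = milliseconds since 1960-01-01 00:00:00.
stata_td <- c(0, 366)
stata_tc <- c(0, 86400000)

# Days since the Stata epoch become R Date objects
r_dates <- as.Date(stata_td, origin = "1960-01-01")

# Milliseconds are scaled to seconds before converting to POSIXct;
# Stata datetimes carry no time zone, so UTC is used here
r_times <- as.POSIXct(stata_tc / 1000, origin = "1960-01-01", tz = "UTC")

print(r_dates)
print(r_times)
```

Other Stata time formats (weekly, monthly, quarterly, and so on) use different units and are not covered by this sketch.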

Stata 8.0 introduced a system of 27 different missing data values. If missing.type is TRUE, a separate list is created with the same names as the variables in the loaded data. For string variables the list element is NULL. For other variables the list element is a vector whose length equals the number of NA's in that variable. The first element of this vector is the type of missing value corresponding to the first NA in the variable, and so on. The vector is a factor with 27 levels: ".", ".a", ..., ".z". The list is attached as the "missing" attribute of the returned value.
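The structure described above can be mocked up in base R. This is only a sketch of the layout for a single variable, with made-up data; the names x, x_missing, and full are hypothetical and nothing here calls the package itself.

```r
# The 27 Stata missing-value codes: ".", ".a", ..., ".z"
stata_missing_levels <- c(".", paste0(".", letters))

# Hypothetical numeric variable containing three NA's
x <- c(1.5, NA, 2.0, NA, NA)

# One entry per NA, in order of appearance (made-up missing types)
x_missing <- factor(c(".", ".a", ".z"), levels = stata_missing_levels)

# Expand the compact factor into a full-length vector aligned with x
full <- rep(NA_character_, length(x))
full[is.na(x)] <- as.character(x_missing)

print(full)
```

Per the description above, for a data frame df returned by read.stataXml the corresponding list would be retrieved with attr(df, "missing").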

Value

A data frame with the following attributes:

The data frame returned by read.stataXml differs slightly from the one returned by read.dta. The major differences are: missing value types are returned as factors, in vectors of length equal to the number of NA's in each variable; variable characteristics and the sort list are returned; and all of Stata's date and time variables are converted to R date/time classes, not just "

Author(s)

Jeffrey Arnold

References

Stata help for xml. Online at http://www.stata.com/help.cgi?xmlsave. The XML format used by Stata has all the components of the binary Stata data format described in http://www.stata.com/help.cgi?dta.

See Also

write.stataXml, fromStataDate, write.dta, read.dta

Examples

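data(swiss)
# Write the data set to a temporary file in Stata XML format
write.stataXml(swiss, swissfile <- tempfile())
# Read the file back into a data frame
read.stataXml(swissfile)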

jrnold/stataXml documentation built on May 20, 2019, 2:06 a.m.