Reads a file in Stata's XML format into a data frame.
file: character. A filename.
convert.dates: logical. Convert Stata dates to 'Date' or 'POSIXct' class? (see description)
convert.factors: logical. Use Stata value labels to create factors?
convert.underscore: logical. Convert "_" in Stata variable names to "." in R names?
missing.type: logical. Store information about different types of missing data?
The Stata XML format was introduced in Stata 9 as an alternative to the binary .dta format for storing datasets. It can be written with the Stata command xmlsave, doctype(dta) and read with the Stata command xmluse. A file in Stata XML format contains the same information as a Stata binary .dta file.
The variables in the Stata dataset become the columns of the data frame. Stata missing values are read as NA. The data label, variable labels, timestamps, and variable characteristics are stored as attributes of the data frame.
By default, Stata dates and times are converted to R's Date or POSIXct classes using fromStataTime.
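The conversion can be sketched in base R. This is only an illustration of the arithmetic and not necessarily how fromStataTime is implemented: it assumes daily (%td) values counted in days and date-time (%tc) values counted in milliseconds, both from Stata's origin of 1 January 1960. The raw values are made up for the example.

```r
# Stata counts dates and date-times from 1 January 1960.
stata_origin <- as.Date("1960-01-01")

# Hypothetical raw values from a %td variable (days since the origin).
raw_td <- c(0, 366, 18628)
dates <- as.Date(raw_td, origin = stata_origin)

# Hypothetical raw values from a %tc variable (milliseconds since the origin).
raw_tc <- c(0, 86400000)
datetimes <- as.POSIXct(raw_tc / 1000, origin = "1960-01-01", tz = "UTC")

print(dates)      # 1960-01-01, 1961-01-01, 2011-01-01
print(datetimes)
```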
Variables with Stata value labels are converted to factors.
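As a rough base-R sketch of what this conversion amounts to, the integer codes of a labelled variable can be mapped to a factor through its label table. The codes and the label table below are invented for illustration; the function's internal handling may differ.

```r
# Hypothetical integer codes from a labelled Stata variable.
codes <- c(1L, 2L, 2L, NA, 1L)

# Hypothetical value-label table: names are the labels, values are the codes.
label_table <- c(male = 1L, female = 2L)

# Map codes to labels; NA codes stay NA in the factor.
sex <- factor(codes, levels = label_table, labels = names(label_table))
print(sex)
```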
Stata 8.0 introduced a system of 27 different missing data values. If missing.type is TRUE, a separate list is created with the same variable names as the loaded data. For string variables the list value is NULL. For other variables the list value is a vector whose length equals the number of NAs in that variable. The first element of this vector is the type of missing value corresponding to the first NA in the variable, and so on. The vector is a factor with 27 levels: ".", ".a", ..., ".z". This list is attached as the "missing" attribute of the returned value.
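The structure described above can be mimicked in base R to show how the missing types line up with the NA positions. The variable income and its missing codes are hypothetical and built by hand here rather than read from a file.

```r
# The 27 Stata missing-value codes: ".", ".a", ..., ".z".
missing_levels <- c(".", paste0(".", letters))

# Hypothetical numeric variable after import; Stata missings are now NA.
income <- c(1200, NA, 850, NA, NA)

# Hypothetical entry of the "missing" list for this variable:
# one element per NA, in order of appearance.
income_missing <- factor(c(".a", ".", ".b"), levels = missing_levels)

# Pair each NA position with its Stata missing type.
na_rows <- which(is.na(income))
print(data.frame(row = na_rows, type = income_missing))
```

Because the vector only has one entry per NA, it has to be matched against which(is.na(x)) to recover the row each missing type belongs to.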
A data frame with the following attributes:
version: character. With XML files, always '113'.
time.stamp: POSIXct.
datalabel: character.
formats: character.
types: character. Stata data types.
val.labels: character. Value labels associated with each variable.
var.labels: character. Variable labels.
sort: character. Variables by which the dataset was sorted.
char: list. Data and variable characteristics, including notes. See http://www.stata.com/help.cgi?char.
label.table: list. Stata value labels.
dta_type: Always equal to "xml". This is used to distinguish data frames returned by read.stataXml from those returned by read.dta.
missing: list with the types of missing values for each variable.
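These attributes are read with attr(). The sketch below uses a stand-in data frame with a few of the attributes set by hand, so it runs without a Stata file; the helper is_stata_xml is a hypothetical name introduced for this example, not part of the package.

```r
# Stand-in for a data frame returned by read.stataXml;
# the attributes are set by hand purely for illustration.
df <- data.frame(x = 1:3)
attr(df, "dta_type") <- "xml"
attr(df, "datalabel") <- "Example data"
attr(df, "var.labels") <- "An example variable"

# Hypothetical helper: TRUE if the data frame carries dta_type "xml".
is_stata_xml <- function(d) identical(attr(d, "dta_type"), "xml")

print(is_stata_xml(df))                  # TRUE
print(is_stata_xml(data.frame(x = 1)))   # FALSE
print(attr(df, "var.labels"))
```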
The data frame returned from read.stataXml is slightly different from the data frame returned by read.dta. The major differences are: missing values are returned as factors, and in vectors of length equal to the number of NA's in each variable; variable characteristics and the sort list are returned; and all of Stata's date and time variables are converted to R date/time classes, not just "
Jeffrey Arnold
Stata help for xml. Online at http://www.stata.com/help.cgi?xmlsave. The XML format used by Stata has all the components of the binary Stata data format described at http://www.stata.com/help.cgi?dta.
write.stataXml, fromStataDate, write.dta, read.dta
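data(swiss)
# Write the swiss data set to a temporary file in Stata XML format
swissfile <- tempfile()
write.stataXml(swiss, swissfile)
# Read it back into a data frame
read.stataXml(swissfile)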