mxData: Create MxData Object

View source: R/MxData.R

mxDataR Documentation

Create MxData Object

Description

This function creates a new MxData object. This can be used all forms of analysis (including WLS: see mxFitFunctionWLS). It packages observed data (e.g. a dataframe, matrix, or cov or cor matrix) into an object with additional information allowing it to be processed in an mxModel.

Usage

   mxData(observed=NULL, type="none", means = NA, numObs = NA, acov=NA, fullWeight=NA,
          thresholds=NA, ..., observedStats=NA, sort=NA, primaryKey = as.character(NA),
          weight = as.character(NA), frequency = as.character(NA),
          verbose = 0L, .parallel=TRUE, .noExoOptimize=TRUE,
     minVariance=sqrt(.Machine$double.eps), algebra=c(),
   warnNPDacov=TRUE, warnNPDuseWeight=TRUE, exoFree=NULL,
   naAction=c("pass","fail","omit","exclude"),
   fitTolerance=sqrt(as.numeric(mxOption(key="Optimality tolerance"))),
   gradientTolerance=1e-2)

Arguments

observed

A matrix or data.frame which provides data to the MxData object. Can be NULL when summary data are provided via ‘observedStats’.

type

A character string defining the type of data in the ‘observed’ argument. Must be one of “raw”, “cov”, “cor”, or “acov”. If no observed data are provided then use “none”.

means

An optional vector of means for use when ‘type’ is “cov”, or “cor”.

numObs

The number of observations in the data supplied in the ‘observed’ argument. Required unless ‘type’ equals “raw”.

...

Not used. Forces remaining arguments to be specified by name.

observedStats

A list containing observed statistics for weighted least squares estimation. See details for contents

sort

Whether to sort raw data prior to use (default NA).

primaryKey

The column name of the primary key used to uniquely identify rows (default NA)

weight

The column name containing row weights.

frequency

The column name containing row frequencies.

verbose

level of diagnostic output.

.parallel

logical. Whether to compute observed summary statistics in parallel.

.noExoOptimize

logical. Whether to use math short-cuts for the case of no exogenous predictors.

minVariance

numeric. The minimum acceptable variance for ‘observedStats$cov’.

acov

Deprecated in favor of the acov element of observedStats.

fullWeight

Deprecated in favor of the fullWeight element of observedStats.

thresholds

Deprecated in favor of the thresholds element of observedStats.

algebra

character vector. Names of algebras used to fill in calculated columns of raw data. \lifecycleexperimental

warnNPDacov \lifecycle

deprecated

warnNPDuseWeight

logical. Whether to warn when the asymptotic covariance matrix is non-positive definite.

exoFree

logical matrix of observed manifests by exogenous predictors. Defaults to all TRUE, but you can fix some regression coefficients in the observedStats slope matrix to zero by setting entries to FALSE. \lifecycleexperimental

naAction

Specify treatment of missing data. See details. \lifecyclematuring

fitTolerance

fit tolerance used for WLS summary statistics \lifecycleexperimental

gradientTolerance

gradient tolerance used for WLS summary statistics \lifecycleexperimental

Details

The mxData function creates MxData objects used in mxModels. The ‘observed’ argument may take either a data frame or a matrix, which is then described with the ‘type’ argument. Data types describe compatibility and usage with expectation functions in MxModel objects. Three data types are supported (acov is deprecated).

raw

The contents of the ‘observed’ argument are treated as raw data. Missing values are permitted and must be designated as the system missing value. The ‘means’ and ‘numObs’ arguments cannot be specified, as the ‘means’ argument is not relevant and the ‘numObs’ argument is automatically populated with the number of rows in the data. Data of this type may use fit functions such as mxFitFunctionML or mxFitFunctionWLS. mxFitFunctionML will automatically use use full-information maximum likelihood for raw data.

cov

The contents of the ‘observed’ argument are treated as a covariance matrix. The ‘means’ argument is not required, but may be included for estimations involving means. The ‘numObs’ argument is required, which should reflect the number of observations or rows in the data described by the covariance matrix. Cov data typically use the mxFitFunctionML fit function, depending on the specified model.

acov

This type was used for WLS data as created by mxDataWLS. Unless you are using summary data, its use is deprecated. Instead, use type =‘raw’ and an mxFitFunctionWLS. If type ‘acov’ is set, the ‘observed’ argument will (usually) contain raw data and the ‘observedStats’ slot contain a list of observed statistics.

cor

The contents of the ‘observed’ argument are treated as a correlation matrix. The ‘means’ argument is not required, but may be included for estimations involving means. The ‘numObs’ argument is required, which should reflect the number of observations or rows in the data described by the covariance matrix. Models with cor data typically use the mxFitFunctionML fit function.

Note on data handling: OpenMx uses the names of variables to map them onto other elements of your model, such as expectation functions. Thus for data provided as a data.frame, ensure the columns have appropriate names. Covariance and correlation matrices need to have both the row and column names set and these must be identical, for instance by using dimnames = list(varNames, varNames).

Correlation data

To obtain accurate parameter estimates and standard errors, it is necessary to constrain the model implied covariance matrix to have unit variances. This constraint is added automatically if you use an mxModel with type='RAM' or type='LISREL'. Otherwise, you will need to add this constraint yourself.

WLS data

The observedStats contains the following named objects: cov, slope, means, asymCov, useWeight, and thresholds.

‘cov’ The (polychoric) covariance matrix of raw data variables. An error is raised if any variance is smaller minVariance.

‘slope’ The regression coefficients from all exogenous predictors to all observed variables. Required for exogenous predictors.

‘means’ The means of the data variables. Required for estimations involving means.

‘thresholds’ Thresholds of ordinal variables. Required for models including ordinal variables.

‘asymCov’ The asymptotic covariance matrix (all entries non-zero). This matrix is sample size independent. Lavaan's NACOV is comparable to asymCov multiplied by N^2.

‘useWeight’ (optional) The weight matrix used in the mxFitFunctionWLS. Can be dense or diagonal for diagonally weighted least squares. This matrix is scaled by the sample size. Lavaan's WLS.V is comparable to useWeight.

A simple Newton Raphson optimizer is used to obtain the summary statistics from the raw data. There are two parameters that control the accuracy of the optimization. In a first pass, the fit function is optimized to ‘fitTolerance’. However, fit function becomes imprecise as the amount of data increases due to catastrophic cancellation. To fine-tune the fit, the gradient is optimized to ‘gradientTolerance’.

note: WLS data typically use the mxFitFunctionWLS function.

IMPORTANT: The WLS interface is under heavy development to support both very fast backend processing of raw data while continuing to support modeling applications which require direct access to the object in the front end. Some user-interface changes should be expected as we optimize both these workflows.

Missing values

For raw data, the ‘naAction’ option controls the treatment of missing values. When set to ‘pass’, the data is passed as-is. When set to ‘fail’, the presence of any missing value will trigger an error. When set to ‘omit’, missing data will be discarded row-wise. For example, a single missing value in a row will cause the whole row to be discarded. When set to ‘exclude’, rows with missing data are retained but their ‘frequency’ is set to zero.

Weights

In the case of raw data, the optional ‘weight’ argument names a column in the data that contains per-row weights. Similarly, the optional ‘frequency’ argument names a column in the ‘observed’ data that contains per-row frequencies. Frequencies must be integers but weights can be arbitrary real numbers. For data with many repeated response patterns, organizing the data into unique patterns and frequencies can reduce model evaluation time.

In some cases, the fit function can be evaluated more efficiently when data are sorted. When a primary key is provided, sorting is disabled. Otherwise, sort defaults to TRUE.

The mxData function does not currently place restrictions on the size, shape, or symmetry of matrices input into the ‘observed’ argument. While it is possible to specify MxData objects as covariance or correlation matrices that do not have the properties commonly associated with these matrices, failure to correctly specify these matrices will likely lead to problems in model estimation.

note: MxData objects may not be included in mxAlgebras nor in the mxFitFunctionAlgebra function. To reference data in these functions, use a mxMatrix or a definition variable (data.var) label.

Also, while column names are stored in the ‘observed’ slot of MxData objects, these names are not automatically recognized as variable names in mxPaths in RAM models. These models use the ‘manifestVars’ of the mxModel function to explicitly identify used variables used in the model.

Value

Returns a new MxData object.

References

The OpenMx User's guide can be found at https://openmx.ssri.psu.edu/documentation/.

See Also

To generate data, see mxGenerateData; For objects which may be entered as arguments in the ‘observed’ slot, see matrix and data.frame. See MxData for the S4 class created by mxData. For WLS data, see mxDataWLS (deprecated). More information about the OpenMx package may be found here.

Examples


library(OpenMx)

# Simple covariance model. See other mxFitFunctions for examples with different data types

# 1. Create a covariance matrix x and y
covMatrix <- matrix(nrow = 2, ncol = 2, byrow = TRUE,
	c(0.77642931, 0.39590663,
      0.39590663, 0.49115615)
)
covNames <- c("x", "y")
dimList <- list(covNames, covNames)
dimnames(covMatrix) <- dimList

# 2. Create an MxData object from covMatrix
testData <- mxData(observed=covMatrix, type="cov", numObs = 100)

testModel <- mxModel(model="testModel2",
	mxMatrix(name="expCov", type="Symm", nrow=2, ncol=2,
                 values=c(.2,.1,.2), free=TRUE, dimnames=dimList),
    mxExpectationNormal("expCov", dimnames=covNames),
    mxFitFunctionML(),
	testData
)

outModel <- mxRun(testModel)

summary(outModel)


OpenMx documentation built on Oct. 19, 2024, 9:06 a.m.