cor_gerbil: Correlation Analysis for 'gerbil' Objects
In gerbil: Generalized Efficient Regression-Based Imputation with Latent Processes

Description

This function assesses the bivariate properties of imputed data using a correlation analysis. Specifically, it calculates pairwise correlations for observed cases and for imputed cases. The function also calculates the Fisher z-transformation for each correlation and performs a hypothesis test using the transformed correlations in order to compare correlations calculated using imputed cases to those calculated using observed cases.

Usage

cor_gerbil(x, y = NULL, imp = 1, log = NULL, partial = "imputed")

Arguments

`x`	A `gerbil` object containing the imputed data.
`y`	A vector listing the column names of the imputed data that will be included in the correlation analysis. By default (`y = NULL`), all columns of the data that required imputation are included. If `TRUE`, all variables with missing values eligible for imputation are used.
`imp`	A scalar indicating which of the multiply imputed datasets should be used for the analysis. Defaults to `imp = 1`.
`log`	A character vector naming the variables to which a log transformation is applied before correlations are calculated.
`partial`	Indicates how partially imputed pairs are handled when calculating correlations. If `partial = 'imputed'`, cases with at least one missing variable in a pair are considered imputed. Otherwise (`partial = 'observed'`), only cases with both variables in the pair missing are considered imputed.

Details

Cases are assigned a status of being observed or imputed in a pairwise fashion. That is, a specific data unit may be considered observed when calculating a correlation for one pair of variables and be imputed when calculating a correlation for another pair. For a given pair of variables, cases that have both variables observed are always treated as observed, and cases that have both variables missing are always treated as imputed. Cases that have only one variable in the pair observed (i.e., those that are partially imputed) are treated as imputed when the input partial = 'imputed' (the default) and are otherwise treated as observed.
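The pairwise labeling rule above can be sketched in base R. This is only an illustration, not the package's internal code: `pair_status` is a hypothetical helper, and `miss_a` and `miss_b` are assumed to be logical vectors flagging which cases were missing in each variable before imputation.

```r
# Hypothetical helper (not part of gerbil): label each case as "observed" or
# "imputed" for ONE pair of variables, given logical missingness flags.
pair_status <- function(miss_a, miss_b, partial = c("imputed", "observed")) {
  partial <- match.arg(partial)
  imputed <- if (partial == "imputed") {
    miss_a | miss_b   # at least one variable in the pair was missing
  } else {
    miss_a & miss_b   # both variables in the pair were missing
  }
  ifelse(imputed, "imputed", "observed")
}

# Four cases: both observed, only the first missing, only the second missing,
# both missing.
miss_a <- c(FALSE, TRUE, FALSE, TRUE)
miss_b <- c(FALSE, FALSE, TRUE, TRUE)

status_default <- pair_status(miss_a, miss_b)                       # partial = "imputed"
status_strict  <- pair_status(miss_a, miss_b, partial = "observed")
```

Only the two middle (partially imputed) cases change label between the two settings; the fully observed and fully missing cases are labeled the same way under both.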

Correlations are calculated across an expanded dataset that creates binary indicators for categorical variables and for semicontinuous variables. Unlike the algorithm used to calculate the imputations, missingness is not artificially imposed in any binary indicator. Missingness is imposed, however, in the variable corresponding to the continuous portion of a semicontinuous variable.
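A rough base R sketch of such an expanded dataset follows. It rests on assumptions that this page does not confirm: the toy column names are invented, one indicator is created per factor level, and the semicontinuous variable is assumed to have its point mass at zero. The package's actual expansion may differ in these details.

```r
# Toy data; the column names are hypothetical.
toy <- data.frame(
  job  = factor(c("farm", "trade", "farm", "service", "trade", "farm")),
  days = c(0, 12, 25, 30, 0, 8)  # semicontinuous: zeros plus positive values
)

# One 0/1 indicator per level of the categorical variable (assumed coding).
job_ind <- sapply(levels(toy$job), function(l) as.integer(toy$job == l))
colnames(job_ind) <- paste0("job_", levels(toy$job))

# Semicontinuous variable: a binary indicator with no missingness imposed,
# plus the continuous portion with NA imposed at the assumed point mass (zero).
days_pos  <- as.integer(toy$days > 0)
days_cont <- ifelse(toy$days > 0, toy$days, NA_real_)

expanded <- data.frame(job_ind, days_pos = days_pos, days_cont = days_cont)

# A correlation involving the continuous portion uses only the cases where
# that portion is defined.
r_farm_days <- cor(expanded$job_farm, expanded$days_cont, use = "complete.obs")
```

In this sketch the indicator columns are complete, while `days_cont` is `NA` for the two cases at zero, mirroring the distinction drawn in the paragraph above.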

Note that the hypothesis test based on the Fisher z-transformation relies on the assumption of bivariate normality. As such, p-values may be misleading for data in which this assumption does not hold.
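For orientation, the textbook two-sample test built on the Fisher z-transformation can be written in a few lines of base R. This is a sketch of the standard procedure, not necessarily the exact computation performed by `cor_gerbil()`: `fisher_z_test` is a hypothetical helper, the input values are invented, and the sign convention (observed minus imputed) is an assumption.

```r
# Hypothetical helper (not gerbil's internal code): standard two-sample test
# comparing two independent correlations via the Fisher z-transformation.
fisher_z_test <- function(r_obs, n_obs, r_imp, n_imp) {
  z_obs <- atanh(r_obs)                          # Fisher z: 0.5 * log((1 + r) / (1 - r))
  z_imp <- atanh(r_imp)
  se <- sqrt(1 / (n_obs - 3) + 1 / (n_imp - 3))  # standard error of z_obs - z_imp
  statistic <- (z_obs - z_imp) / se              # approximately N(0, 1) under the null
  p_value <- 2 * pnorm(-abs(statistic))          # two-sided p-value
  list(z_obs = z_obs, z_imp = z_imp, statistic = statistic, p.value = p_value)
}

# Invented inputs: correlation and number of cases for each group.
res <- fisher_z_test(r_obs = 0.45, n_obs = 200, r_imp = 0.30, n_imp = 80)
```

The standard error depends only on the two case counts, so pairs of variables with few imputed cases yield tests with little power regardless of how different the correlations look.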

Value

cor_gerbil() returns an object of the class cor_gerbil that has the following slots:

Correlations: A list containing three elements, named Observed, Imputed, and All. The first is a matrix giving the sample correlations calculated across cases labeled as observed. The second and third are analogous correlation matrices calculated across only cases labeled as imputed and across all cases, respectively.
n: A list containing three elements, named Observed, Imputed, and All. The first is a matrix giving the number of cases labeled as observed for each pair of variables. The second and third are analogous matrices giving the number of cases labeled as imputed for each pair and the total number of cases for each pair, respectively.
Fisher.Z: A list containing three elements, named Observed, Imputed, and All. These matrices give the Fisher z-transformation of the correlations in the corresponding matrices of the slot Correlations.
Statistic: A matrix that gives the value of the test statistic based on the Fisher z-transformation for each pair of variables. This statistic may be used to assess whether the correlations calculated across cases labeled as observed are statistically different from the correlations calculated across cases labeled as imputed.
p.value: A matrix that lists the p-value for each test statistic provided in the matrix in the slot labeled Statistic.

Examples


# Load the package and the India Human Development Survey-II dataset
library(gerbil)
data(ihd_mcar)

imps.gerbil <- gerbil(ihd_mcar, m = 1, mcmciter = 100, ords = "education_level", 
       semi = "farm_labour_days", bincat = c("sex", "marital_status", "job_field", "own_livestock"))

#Run the correlation analysis
cors.gerbil <- cor_gerbil(imps.gerbil, imp = 1)

#Print a summary
cors.gerbil

gerbil documentation built on Jan. 12, 2023, 5:10 p.m.