rosetta
allows an analyst to combine datasets that measure the same latent traits when there is only partial overlap of measurements across the constituent datasets.
Consider the case where we have three independent datasets, each with measurements on the same three latent factors. In total, there are three variables per latent factor; however, each dataset measures only two of the three variables for each factor.
library(rosetta)

d_sim <- sim(seed = 100)
d_missing <- d_sim$missing
d_complete <- d_sim$complete
The simulated 'complete' data looks like
lapply(d_complete, head)
while the simulated 'missing' data (representative of our real life use case) looks like
lapply(d_missing, head)
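The partial-overlap pattern can also be reproduced by hand. The following is a minimal base R sketch, not the package's `sim()` function: the loadings, the sample size, and the choice of which indicator each dataset omits are all assumptions made for illustration. It shows that, although no single dataset measures every variable, every pair of variables is observed together in at least one dataset, so a full covariance matrix can still be estimated from the stacked data. Whether rosetta estimates its covariance matrix this way internally is not stated here.

```r
set.seed(1)

# Hypothetical toy setup: 3 latent factors with 3 indicators each
# (variable names mirror the example above; all values are assumptions)
n <- 200
factors <- c("a", "b", "c")
vars <- paste(rep(factors, each = 3), 1:3, sep = "_")

# Simulate one complete dataset: indicator = 0.8 * its factor + noise
sim_one <- function(n) {
  latent <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, factors))
  obs <- sapply(vars, function(v) {
    0.8 * latent[, substr(v, 1, 1)] + rnorm(n, sd = 0.6)
  })
  as.data.frame(obs)
}

# Dataset k omits indicator k of every factor (an assumed missingness pattern)
toy_missing <- lapply(1:3, function(k) {
  d <- sim_one(n)
  d[paste(factors, k, sep = "_")] <- NA_real_
  d
})

# Stack the datasets and estimate covariances from pairwise-complete rows
stacked <- do.call("rbind", toy_missing)
cov_pairwise <- cov(stacked, use = "pairwise.complete.obs")

# Check whether any covariance entry could not be estimated
anyNA(cov_pairwise)
```

With this pattern each pair of variables is jointly observed in at least one of the three datasets, which is what makes a joint model of all nine variables feasible.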
'rosetta' can now be run so that the latent factors measured across the three independent datasets are summarized into a single dataset of factor scores. This provides the simplicity, statistical power, and modeling flexibility of a single joint analysis of the information contained in the original data.
d_rosetta <- rosetta(
  d = d_missing,
  factor_structure = list(
    a = c("a_1", "a_2", "a_3"),
    b = c("b_1", "b_2", "b_3"),
    c = c("c_1", "c_2", "c_3")
  )
)

# combine rosetta results into a single dataset
d_rosetta <- as.data.frame(do.call("rbind", d_rosetta))

# check the factor score output
head(d_rosetta)
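Stacking with `do.call("rbind", ...)` discards which dataset each row came from. If that information is needed later, a source label can be added before stacking. The sketch below uses a hypothetical toy list in place of the rosetta output, assuming only that the output is a list of tables with columns `a`, `b`, and `c`; the names `toy_scores`, `toy_tagged`, and `toy_stacked` are invented for illustration.

```r
# Hypothetical stand-in for the rosetta output: one table of factor scores per dataset
toy_scores <- list(
  data.frame(a = c(0.1, -0.3), b = c(0.5, 0.2), c = c(-0.4, 0.0)),
  data.frame(a = c(1.2, 0.7), b = c(-0.6, 0.3), c = c(0.9, -1.1)),
  data.frame(a = c(-0.8, 0.4), b = c(0.0, -0.2), c = c(0.3, 0.6))
)

# Tag each table with its position in the list, then stack the tagged tables
toy_tagged <- Map(
  function(d, i) cbind(dataset = i, as.data.frame(d)),
  toy_scores,
  seq_along(toy_scores)
)
toy_stacked <- do.call("rbind", toy_tagged)

# Number of rows contributed by each source dataset
table(toy_stacked$dataset)
```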
To check the result, we compare the factor scores estimated from the complete data with those rosetta estimated from the missing data.
library(dplyr)
library(tidyr)
library(ggplot2)

# get factor scores from complete data
## bind the complete data
d_complete <- do.call("rbind", d_complete)

## create RAM model
factor_structure <- attributes(d_sim)$factor_structure
sem_model <- rosetta:::sem_model(factor_structure)

## observed covariance matrix
cov_mat <- rosetta:::obs_cov(d_complete)

## complete sem
## N is the number of observations (rows), not the number of variables
sem_fit <- sem::sem(
  model = sem_model,
  S = cov_mat,
  N = nrow(d_complete)
)

## model results
complete_fscores <- as.data.frame(sem::fscores(model = sem_fit, data = d_complete))

# Visualize comparison
## combine data (both tables are reshaped in the same row order, so cbind aligns them)
d_rosetta <- tidyr::gather(d_rosetta, key = "key", value = "rosetta", a, b, c)
complete_fscores <- tidyr::gather(complete_fscores, key = "key", value = "complete", a, b, c)
d_plot <- cbind(d_rosetta, complete_fscores["complete"])

## plot
ggplot(d_plot, aes(x = complete, y = rosetta, color = key)) +
  geom_point() +
  labs(
    x = "Complete data factor scores",
    y = "Rosetta factor scores",
    color = "Factor"
  )

## correlation table
cor_table <- d_plot %>%
  dplyr::group_by(key) %>%
  dplyr::summarize(cor = cor(rosetta, complete, method = "pearson"))

knitr::kable(cor_table, caption = "correlation table")
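The comparison above relies on the two sets of factor scores having their rows in the same order, since they are joined with `cbind()`. The same per-factor agreement check can be sketched in base R on wide tables, without reshaping. The data below are hypothetical stand-ins generated for illustration (`toy_complete`, `toy_rosetta`), not output from rosetta.

```r
set.seed(42)

# Hypothetical wide score tables with matching row order
toy_complete <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))

# Stand-in for scores recovered from incomplete data: the same scores plus noise
toy_rosetta <- as.data.frame(
  lapply(toy_complete, function(x) x + rnorm(length(x), sd = 0.3))
)

# Pearson correlation per factor, matching columns by name
toy_cor <- sapply(names(toy_complete), function(f) {
  cor(toy_rosetta[[f]], toy_complete[[f]], method = "pearson")
})
toy_cor
```

Correlations near 1 for every factor would indicate that the scores from the incomplete data closely track those from the complete data.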