knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This vignette illustrates an example on how to use the sup.r.jive package to run Joint and Variation Explained (JIVE) predict. JIVE.predict was developed by Kaplan and Lock (2017) and is a two-step, multi-source supervised method that can identify joint and individual components among the data sources, and use those components to build a prediction model. The joint and individual components are calculated by JIVE, which was proposed by Lock et al. (2013).

Here, we avoid methodological details and focus on the functionality of the sup.r.jive package.

Loading the data

If you haven't used this package before, you'll have to install it first. You can install the development version of sup.r.jive from GitHub with:

# install.packages("devtools")
#devtools::install_github("enorthrop/sup.r.jive")
library(sup.r.jive)

For this example, we will use both the SimData.norm and SimData.bin data files contained in this package. After loading the data, we can see that SimData.norm contains 2 data matrices saved as a list (X) and contains a continuous outcome vector (Y). Note that both data matrices have 40 rows or predictors, and have 30 columns or observations. The number of rows need not match across sources, but each source and the outcome must be collected on the same group of individuals.

data("SimData.norm")
data("SimData.bin")
str(SimData.norm)

Running JIVE.predict with a Continuous Outcome

Let's say that we want to uncover the joint structure between the data sources and each source's individual structure. Using those structures, we also want to predict the outcome Y. We can calculate this by running the JIVE.pred function. By default, the ranks of the joint and individual components will be calculated by the permutation approach, and a linear prediction model will be constructed. When finding the joint and individual components, JIVE requires a centered and scaled dataset. The centering and scaling can be done internally in the function (center=T and scale=T) if selected.

fit <- JIVE.pred(X=SimData.norm$X, Y=SimData.norm$Y, rankJ=1, rankA = c(1,1),
                 family="gaussian")

The output shows the final ranks for the joint and individual components as well as the number of iterations it took to reach convergence. By printing the model output, we can find similar information.

fit

Or if we want a more detailed summary, we can do the following:

summary(fit)

If we want to make predictions, we can do so using the predict function. Note that the Y predictions are for the centered and scaled outcome. fit$data contains the centered and scaled data, which should be used when assessing prediction accuracy of the method.

fit.pred <- predict(fit, newdata = SimData.norm$X)
#MSE
sum((fit$data$Y-fit.pred$Ypred)^2)/length(fit$data$Y)

Visualization and interpretation

The sup.r.jive package contains 3 visualization tools that help display the results. Each of the 3 functions will work for an object of class JIVE.pred, sJIVE, or sesJIVE as its input, but in this vignette, we will briefly discuss each function using the fitted JIVE.predict model from above.

plotHeatmap()

This function displays a heatmap of the contribution of the joint and individual components. Note that there is not a heatmap for the residual error of the fitted model.

plotHeatmap(fit) 

The order_by option in this function allows the user to order the column by the original data (default), the joint component (order_by=1), or the i'th individual component (order_by=i).

plotHeatmap(fit, order_by = 0, ylab="Outcome", xlab=c("Data1", "Data2"))

plotVarExplained()

This function creates a series of barplots to display the percent of variance explained by the joint and individual components. The col option allows you to choose the color palette in the graphs. The values graphed in the barplots can be found by the \$variance output from the summary() function.

plotVarExplained(fit, col=c("grey20", "grey43", "grey65"))

plotFittedValues()

This function displays two diagnostic plots for the fitted $Y$ values. The first plot compares the residuals to the fitted values, and the second plot is a Q-Q plot to look at the quantiles.

plotFittedValues(fit)

Running JIVE.predict with a Binary Outcome

Now we are going to use the SimData.bin data file. Similar to the other data fiel, there are 2 data matrices saved as a list (X) and an outcome vector (Y). However, Y is a binary outcome in case. The number of rows need not match across sources, but each source and the outcome must be collected on the same group of individuals.

str(SimData.bin)

Let's say that we want to uncover the joint structure between the data sources and each source's individual structure. Using those structures, we also want to predict the outcome Y. We can calculate this by running the JIVE.pred function. By default, the ranks of the joint and individual components will be calculated by the permutation approach, and a linear prediction model will be constructed. Since our outcome is binary, we will set the family option to binomial. JIVE requires a centered and scaled dataset. The centering and scaling can be done internally in the function (center=T and scale=T) if selected.

fit <- JIVE.pred(X=SimData.bin$X, Y=SimData.bin$Y, rankJ=1, rankA = c(1,1),
                 family="binomial")

The output shows the final ranks for the joint and individual components as well as the number of iterations it took to reach convergence. By printing the model output, we can find similar information.

fit

Or if we want a more detailed summary, we can do the following:

summary(fit)

Notice that calculating the sum of squares does not make sense when Y is binary, so the variance explained table will not include values for Y, and an analysis of deviance table is displayed instead of an ANOVA summary table.

If we want to make predictions, we can do so using the predict function. Make sure the new X dataset also contains centered and scaled values for accurate predictions.

fit.pred <- predict(fit, newdata = SimData.bin$X, center=F, scale=F)
#MSE
sum((fit$data$Y-fit.pred$Ypred)^2)/length(fit$data$Y)

Visualization and interpretation

The sup.r.jive package contains 3 visualization tools that help display the results. Each of the 3 functions will work for an object of class JIVE.pred, sJIVE, or sesJIVE as its input, but in this vignette, we will briefly discuss each function using the fitted JIVE.predict model from above. Some of the visualizations will look different for a binary outcome than they did for a continuous outcome.

plotHeatmap()

This function displays a heatmap of the contribution of the joint and individual components. Note that there is not a heatmap for the residual error of the fitted model.

plotHeatmap(fit) 

The order_by option in this function allows the user to order the column by the original data (default), the joint component (order_by=1), or the i'th individual component (order_by=i).

plotHeatmap(fit, order_by = 0, ylab="Outcome", xlab=c("Data1", "Data2"))

plotVarExplained()

This function creates a series of barplots to display the percent of variance explained by the joint and individual components. The col option allows you to choose the color palette in the graphs. The values graphed in the barplots can be found by the \$variance output from the summary() function. Since Y is not continuous, calculating the variation explained for Y does not make sense and is not calculated.

plotVarExplained(fit, col=c("grey20", "grey43", "grey65"))

plotFittedValues()

This function will create density plots to show the separation (or lack thereof) between the predictor value and the binary outcome. Note that this is a different plot than when we had a continuous outcome.

plotFittedValues(fit)

References

Kaplan, A, and EF Lock. 2017. "Prediction With Dimension Reduction of Multiple Molecular Data Sources for Patient Survival." \textit{Cancer Informatics} 16:1-11.

Lock, EF, KA Hoadley, JS Marron, and AB Nobel. 2013. “Joint and Individual Variation Explained (JIVE) for Integrated Analysis of Multiple Data Types.” \textit{The Annals of Applied Statistics} 7 (1): 523–42.

Sandri, BJ, A Kaplan, SW Hodgson, M Peterson, S Avdulov, L Higgins, T Markowski, P Yang, AH Limper, TJ Griffin, P Bitterman, EF Lock, and CH Wendt. 2018. "Multi-omic molecular profiling of lung cancer in COPD." \textit{The European Respiratory Journal} 52(1).

Yihong Z, A Klein, FX Castellanos, and MP Milham. 2019."Brain age prediction: Cortical and subcortical shape covariation in the developing human brain." \textit{NeuroImage} 202:116149.



enorthrop/sup.r.jive documentation built on Nov. 18, 2022, 6:01 p.m.