In danieladamspencer/bayestensorreg: Bayesian Tensor Regression

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Bayesian Tensor Response Regression with a Gaussian Graphical Model

This is a vignette to cover the basic usage of the model developed by Spencer et al. (2020), with the goal of making the implementation of the model accessible to as many people as possible. The basic structure of the input data is a list containing the following :

Y is a list of G different tensor responses, each with the same dimensions. The second-to-last index should correspond to time, and the last index should correspond to subject.
x is a matrix where the first index corresponds to time and the second index corresponds to subject.

Simulated data in this format can be created using the TRR_GGM_simulated_data function.

library(bayestensorreg)
set.seed(95064) # This is the zip code for the University of California, Santa Cruz
input <- TRR_GGM_simulated_data(
  subjects = 20, 
  regions = 5, # Corresponds to G
  max_time = 100,
  margin_sizes = c(30, 30), # In this example, D = 2
  SNR = 1,
  CNR = 1,
  conn_regions = 2,
  conn_level = 0.95
)
str(input)

Note that the simulated data have the class TRR_GGM_data and a few extra elements (true_d,true_B, and true_d_covar) that allow for the verification of the model results. If using your own data, it is not necessary to give the input list the class TRR_GGM_data, as the BTRR_GGM function will check your data and assign the class if the data are in the correct format. Next, the BTRR_GGM function on the input will (by default) run the rank 1 tensor decomposition model for 100 iterations, which should take about 5 minutes.

btrr_ggm_result <- BTRR_GGM(input)
##### 2021-07-22 16:21:01 - Rank = 1 Iteration # 10 of 100 #####
##### Time elapsed: 38.855  seconds #####
##### Estimated time remaining: 349.695 seconds #####
##### 2021-07-22 16:21:31 - Rank = 1 Iteration # 20 of 100 #####
##### Time elapsed: 68.805  seconds #####
##### Estimated time remaining: 275.22 seconds #####
##### 2021-07-22 16:22:01 - Rank = 1 Iteration # 30 of 100 #####
##### Time elapsed: 98.963  seconds #####
##### Estimated time remaining: 230.913666666667 seconds #####
##### 2021-07-22 16:22:31 - Rank = 1 Iteration # 40 of 100 #####
##### Time elapsed: 128.29  seconds #####
##### Estimated time remaining: 192.435 seconds #####
##### 2021-07-22 16:23:01 - Rank = 1 Iteration # 50 of 100 #####
##### Time elapsed: 158.635  seconds #####
##### Estimated time remaining: 158.635 seconds #####
##### 2021-07-22 16:23:30 - Rank = 1 Iteration # 60 of 100 #####
##### Time elapsed: 187.61  seconds #####
##### Estimated time remaining: 125.073333333333 seconds #####
##### 2021-07-22 16:24:01 - Rank = 1 Iteration # 70 of 100 #####
##### Time elapsed: 218.62  seconds #####
##### Estimated time remaining: 93.6942857142857 seconds #####
##### 2021-07-22 16:24:34 - Rank = 1 Iteration # 80 of 100 #####
##### Time elapsed: 251.87  seconds #####
##### Estimated time remaining: 62.9675 seconds #####
##### 2021-07-22 16:25:04 - Rank = 1 Iteration # 90 of 100 #####
##### Time elapsed: 281.734  seconds #####
##### Estimated time remaining: 31.3037777777778 seconds #####
##### 2021-07-22 16:25:37 - Rank = 1 Iteration # 100 of 100 #####
##### Time elapsed: 314.342  seconds #####
##### Estimated time remaining: 0 seconds #####

data(btrr_ggm_result, package = "bayestensorreg")

Next, extracting the tensor samples from the posterior distribution can be done using the BTRR_GGM_extract_B function. The mean and the sequential 2-means estimates for the tensor coefficients are found and plotted below to compare to the true values.

all_B <- BTRR_GGM_extract_B(btrr_ggm_result)
mean_B <- sapply(all_B, function(b) apply(b,1:2,mean), simplify = F)
names(mean_B) <- paste("Region", 1:5)
mean_df <- melt_list(mean_B)
mean_df$Quantity <- "Posterior Mean"
s2m_B <- sapply(all_B, s2m_B, simplify = F)
names(s2m_B) <- paste("Region",1:5)
s2m_df <- melt_list(s2m_B)
s2m_df$Quantity <- "Seq. 2-means"
names(input$true_B) <- paste("Region", 1:5)
true_df <- melt_list(input$true_B)
true_df$Quantity <- "True Value"
compare_df <- rbind(mean_df,s2m_df,true_df)
library(ggplot2)
ggplot(compare_df) +
  geom_raster(aes(x = Var1, y = Var2, fill = value)) + 
  facet_grid(L1~Quantity) + 
  labs(x="",y="", title = "Coefficient Estimates - Rank 1 Model") +
  scale_fill_gradient("")

The connectivity results, shown through partial correlations can be found using the median posterior precision below. Though the sequential 2-means estimate could also be used to estimate the partial correlations, that is suggested for longer MCMC chains, as 100 samples from the posterior distribution is not enough for robust estimation of the precision.

Sig_median <- apply(btrr_ggm_result$Sig,1:2,median)
median_parcor <- DensParcorr::prec2part(Sig_median)
median_df <- melt_array(median_parcor)
median_df <- median_df[median_df$Var1 > median_df$Var2,]
median_df$Quantity <- "Posterior Median"
true_parcor <- DensParcorr::prec2part(solve(input$true_d_covar))
true_parcor <- melt_array(true_parcor)
true_parcor <- true_parcor[true_parcor$Var1 > true_parcor$Var2,]
true_parcor$Quantity <- "True Value"
parcor_df <- rbind(median_df,true_parcor)
ggplot(parcor_df) +
  geom_point(aes(x = value, y = paste(Var1, Var2, sep = "&"), color = Quantity)) +
  labs(x = "Partial Correlation", y = "Region Pair", title = "Connectivity")

These are the basic quantities of interest in the model, but all parameters are saved. Additional functionality is planned for plotting and model comparison metrics.

danieladamspencer/bayestensorreg documentation built on July 23, 2024, 10:14 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com