knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(devtools)
install_github('biocore/bayestime')
library(BayesTime)

This vignette introduces the BayesTime package by a workflow with real data example. This helps those who are unfamiliar with BayesTime package and want to use its functions to analyze longitudinal data. ***

Overview

(Whate BayesTime does)***

We take ECAM data as an example to go through each function in BayesTime. Before analysis, we recommend to first define the names of unique subject id, time vairable, response variable in dataset.

data("ECAM")
head(ECAM)
colnames(ECAM)
unique_subject_id <- 'studyid'
time <- 'month_of_life'
response <- 'shannon'

Workflow

plot_qqplot(): Check if data is normally distributed

Since model fitting requires a normally distributed variable, we first need to check if the response variable needs any transformation. BayesTime::plot_qqplot() takes a dataframe and a column name of response variable to generate qqPlot of the original, log transformation, and square root of the response variable.

#tiff('~/Dropbox/lab/SFPCA/qqplot.tiff')
plot_qqplot(ECAM, 'shannon')
#dev.off()

We can see that the original response vairable is more normally distributed than other transformation. So we keep the data as it is, otherwise the variable needs to be transformed.

plot_group(): Generate spaghetti plots by group

BayesTime::plot_group() generates the spaghetti plot of one interested variable by group. This takes the logitudinal data frame, the column name of time variable, column name of response variable, column name of unique subject id and one column name from interested variables.

d <- plot_group(data = ECAM, time_name = time, response_name = response,
           unique_subject_id = unique_subject_id, variable_name = 'delivery')$figure

prepare_data(): Preprocess data before modeling

BayesTime::prepare_data() transforms a dataframe object to a list for stan modeling. This dataframe needs the column names of unique subject id for each subject, time variable, and a column of response vairable. There are three options to set the response variable: standardize, center or NULL, default is standardize. This function also can set whether to scale time variable, default is FALSE.

dat <- prepare_data(data = ECAM, unique_subject_id = unique_subject_id,
                    time_name = time, response_name = response, 
                    transform_y = 'standardize', scale_time = T, average = T)

summary(dat)

stan_fit(): Get invariant variables in data

BayesTime::stan_fit() is the core function to fit longitudinal data. This takes the list of data from prepare_data(), number of samples and chains in stan modelling, and the number of PC and knot. stan_fit() will loop through each pc and knot number and return a list of stan models with respective pc and knot. **(more explanation) This process may take a lot of time even if the pc and knot number are small. We recommend to save the stan modeling results to save time.

stan_results <- stan_fit(sfpca_data = dat, Nsamples = 1000, Nchain = 3,
                        Ncores = 2, PC_range = c(1:3), nknot_range = c(1:3))
# save(stan_results, file='~/Dropbox/lab/SFPCA/results_pkg/stan_ECAM.RData')
# load('~/Dropbox/lab/SFPCA/results_pkg/stan_ECAM.RData')

optimal(): Get the optimal model

BayesTime::optimal() takes a list of fitted stan models and use loo_compare() from loo package. This prints out the compare results and return the optimal stan model.

optimal_model_idx <- optimal(model_list = stan_results)
optimal_model <- stan_results[[optimal_model_idx]]
#optimal_model <- stan_results[[4]]

plot_k_diagnostic(): Plot diagnostic figure

BayesTime::plot_k_diagnostic() takes data list from prepare_data() and the optimal stan model. This helps to evaluate the fitting result.***

plot_k_diagnostic(sfpca_data = dat, model = optimal_model)$figure

plot_posterior_diagnostic(): Plot the figure of overlay densities

BayesTime::plot_density_overlay() takes data list from prepare_data() and the optimal stan model. This helps to evaluate the fitting result.***

plot_posterior_diagnostic(sfpca_data = dat, model = optimal_model)$figure

basis_setup_sparse(): ***

# model_basis <- basis_setup_sparse(dat, nknots = optimal_model$knot, orth=TRUE)
# summary(model_basis)

post_hoc_rotation(): ***

# model_rotation <- post_hoc_rotation(dat, model = optimal_model)
# summary(model_rotation)

output_results(): ***

model_output <- output_results(sfpca_data = dat, model = optimal_model)
summary(model_output)

plot_residual(): Residual plot of the model

BayesTime::plot_residual() takes the output from output_results() and generates the residual plot of the fit model.

#plot_residual(model_output)

plot_mean_curve(): ***

#tiff('~/Dropbox/lab/SFPCA/meancurve_plot.tiff')
p <- plot_mean_curve(output = model_output, original = T, x_lab = 'age', y_lab = 'shannon')$figure
#dev.off()
p <- plot_mean_curve(output = model_output, original = F, x_lab = 'age', y_lab = 'shannon')$figure

plot_fpc_curve():

d <- plot_fpc_curve(output = model_output, pc_idx = c(1,2,3), original = T, x_tick = c(2,5,9))$figure
d <- plot_fpc_curve(output = model_output, pc_idx = c(2,3), original = F)$figure

plot_fpc_on_mean_curve():

d <- plot_fpc_on_mean_curve(output = model_output, pc_idx = 1, original = T, sd = F)$figure

d <- plot_fpc_on_mean_curve(output = model_output, pc_idx = 2, original = F, sd = T)$figure

plot_fpc_group_mean():

d <- plot_fpc_group_mean(output = model_output, pc_idx = 1, original = T, group_name = 'delivery')$figure
d <- plot_fpc_group_mean(output = model_output, pc_idx = 1, original = F, group_name = 'diet_3')$figure

plot_fpc_boxplot(): Generate boxplot for fpc score by group

BayesTime::plot_fpc_boxplot() generates the boxplot of fpc scores from sfpca model and compare them by group. This function takes the output list from output_results() and the vector of variables that are invariant with time.

boxplot_data <- plot_fpc_boxplot(output = model_output, pc_idx = 1, group_name = 'diet_3',
                 testing_type = 'parametric', pairwise_testing = T, x_lab = 'group',
                 p_adjust_meth = 'holm', pval_show_all = T)$figure
boxplot_data <- plot_fpc_boxplot(output = model_output, pc_idx = 1, group_name = 'diet_3',
                 testing_type = 'non-parametric', pairwise_testing = T, x_lab = 'group',
                 p_adjust_meth = 'none', pval_show_all = T)$figure
boxplot_data <- plot_fpc_boxplot(output = model_output, pc_idx = 1,
                                 group_name = 'diet_3', testing_type = 'parametric',
                                 pairwise_testing = F, global_testing = F,
                                 group_order = c('eb', 'fd', 'bd'), p_title = 'Boxplot',
                                 p_adjust_meth = 'BH')$figure
boxplot_data <- plot_fpc_boxplot(output = model_output, pc_idx = 2, group_name = 'sex',
                 testing_type = 'non-parametric', pairwise_testing = T, x_lab = 'sex',
                 p_adjust_meth = 'none', pval_show_all = T)$figure

invariants(): Get invariant variables in data

BayesTime::invariants() helps to obtian a set of variables that is not changed through time for each subject and excludes the variables that are same for all subjects. This takes a dataframe, column names of unique subject id , time, and a vector of interested variable names. The default instersted variable names parameter is set to NULL, then the function will look over all vairables in data.

invars <- invariants(dat$data, unique_subject_id, time)
invars

effectSize_rda(): RDA analysis by fpc scores

BayesTime::effectSize_rda() automates the rda analysis using rda() from vegan package. This takes the output list from output_results(), stan model from optimal(), and a set of variables self-input or from invariants(). It returns the table and ggplot of the rda analysis results.

model_rda <- effectSize_rda(output = model_output, model = optimal_model, 
                            variables = invars, trace=F)$figure
print(model_rda)


biocore/bayestime documentation built on Nov. 15, 2020, 5:40 p.m.