knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

Purpose

This vignette demonstrates how to integrate more than one factor into the linear models. Here, we show how to model the data with two factors plus the interaction thereof. The underlying dataset is generated in a course that is held on a yearly base. The context is that yeast is, in one condition, grown on glucose, and in the other condition, yeast is grown on glycerol and ethanol. Here we are looking into the results of two different batches, where different people performed the wet-lab work, and even different LC-MS instruments were involved. It is, therefore, essential to model the batch variable for these two similar datasets. We are also modeling the interaction between the two explanatory variables batch and _condition_for demonstration purposes. In this case, having a significant interaction term would mean the protein is expressed more in the Glucose condition in one batch. In contrast, the same protein is more abundant in the Ethanol condition in the other batch.

An in depth introduction to modelling and testing interactions can be found here.

Model Fitting

For more details how the dataset data_Yeast2Factor was created we refer you to the prolfquabenchmark vignettes. Interesting here is the definition of the model. If interaction shall be included in the model a asterix should be used while if no interaction should be taken into account a plus should be used in the model definition. Also we can directly specify what comparisons we are interested in by specifying the respective contrasts.

conflicted::conflict_prefer("filter", "dplyr")

data_Yeast2Factor <- prolfqua::prolfqua_data("data_Yeast2Factor")
data_Yeast2Factor <- prolfqua::LFQData$new(data_Yeast2Factor$data, prolfqua::old2new(data_Yeast2Factor$config))

pMerged <- data_Yeast2Factor
pMerged$data$Run_ID <- as.numeric(pMerged$data$Run_ID)
pMerged$factors()


formula_Batches <-
  prolfqua::strategy_lm("transformedIntensity ~ condition_ * batch_ ")

# specify model definition
Contrasts <- c("Glucose - Ethanol" = "condition_Glucose - condition_Ethanol",
               "p2370 - p2691" = "batch_p2370 - batch_p2691",
               "Glucose_vs_Ethanol_gv_p2370" = "`condition_Glucose:batch_p2370` - `condition_Ethanol:batch_p2370`",
               "Glucose_vs_Ethanol_gv_p2691" = "`condition_Glucose:batch_p2691` - `condition_Ethanol:batch_p2691`",
               "Interaction" = "`Glucose_vs_Ethanol_gv_p2370` - `Glucose_vs_Ethanol_gv_p2691`")

We are then building our model as we specified it before for each protein.

mod <- prolfqua::build_model(pMerged$data, formula_Batches,
  subject_Id = pMerged$config$table$hierarchy_keys() )

Now, we can visualize the effect of our factors that are specified in the model. This indicates in an elegant way what factors are the most important ones.

mod$anova_histogram()$plot

ANOVA

To examine proteins with a significant interaction between the two factors treatment and batch filtering for the factor condition_:batch_.

ANOVA <- mod$get_anova()
ANOVA |> dplyr::filter(factor == "condition_:batch_") |> dplyr::arrange(FDR) |> head(5)
protIntSig <- ANOVA |> dplyr::filter(factor == "condition_:batch_") |>
  dplyr::filter(FDR < 0.05)
protInt <-  pMerged$get_copy()
protInt$data <- protInt$data[protInt$data$protein_Id %in% protIntSig$protein_Id,]

These proteins can easily be visualized using the boxplot function from the plotter objects in prolfqua

ggpubr::ggarrange(plotlist = protInt$get_Plotter()$boxplots()$boxplot)

Compute and analyse with the specified contrasts

Next, we want to calculate the statistical results for our group comparisons that have been specified in our contrast definition. Here we are using the moderated statistics which implements the concept of pooled variance for all proteins.

contr <- prolfqua::ContrastsModerated$new(prolfqua::Contrasts$new(mod, Contrasts))
contrdf <- contr$get_contrasts()

These results can be visualized with e.g a volcano or a MA plot.

plotter <- contr$get_Plotter()
plotter$volcano()$FDR
plotter$ma_plot()

Analyse contrasts with missing data imputation

Still using the approach above, we can only estimate group averages in case there is at least one measurement for each protein in each group/condition. In the case of missing data for one condition, we can use the ContrastsMissing function where the 10th percentile expression of all proteins is used for the estimate of the missing condition.

contrSimple <- prolfqua::ContrastsMissing$new(pMerged, Contrasts)
contrdfSimple <- contrSimple$get_contrasts()
pl <- contrSimple$get_Plotter()
pl$histogram_diff()
pl$volcano()$FDR

Merge nonimputed and imputed data.

For the moderated model, we can only get results if we have enough valid data points. With the group average model we can get group estimates for all proteins. Therefore, we are merging the statistics for both approaches while we are preferring the values of the moderated model. Also these results can again be visualized in a volcano plot.

dim(contr$get_contrasts())
dim(contrSimple$get_contrasts())

mergedContrasts <- prolfqua::merge_contrasts_results(prefer = contr, add = contrSimple)$merged
cM <- mergedContrasts$get_Plotter()
plot <- cM$volcano()
plot$FDR

Look at Proteins with significant interaction term.

sigInteraction <- mergedContrasts$contrast_result |> 
  dplyr::filter(contrast == "Interaction" & FDR < 0.2)

protInt <-  pMerged$get_copy()
protInt$data <- protInt$data[protInt$data$protein_Id %in% sigInteraction$protein_Id,]

protInt$get_Plotter()$raster()

hm <- protInt$get_Plotter()$heatmap()
hm

The prolfqua package is described in [@Wolski2022.06.07.494524].

Testing interaction computation

url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/spider_wolff_gorb_2013.csv"
filename <- "spider_wolff_gorb_2013.csv"
downloader::download(url, filename)
spider <- read.csv(filename, skip = 1)
XA <- lm(friction ~ type * leg, data = spider)
summary(XA)

head(spider)
spider <- spider |> tidyr::unite(legtype , leg, type, remove = FALSE)
XF <- lm(friction ~ legtype - 1, data = spider)
library(multcomp)
summary(glht(XF, linfct = rbind(i1 = c(1, -1, -1, 1 ,0, 0, 0, 0))))
summary(glht(XF, linfct = rbind(i1 = c(1, -1, 0, 0, -1, 1, 0, 0 ))))
summary(glht(XF, linfct = rbind(i1 = c(1, -1, 0, 0,  0, 0, -1, 1  ))))

Session Info

sessionInfo()

References



wolski/prolfqua documentation built on May 12, 2024, 10:16 p.m.