In llrs/inteRmodel: Find Models of Interaction

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = FALSE
)

Introduction

The goal of inteRmodel is to provide the tools required to search for models in the generalized canonical correlations as provided by the RGCCA package.

You can read more about canonical correlation at [@tenenhaus2011].

library(inteRmodel)

In this use case we want to find the best model to relate the gene expression on a glioma with some repetitions at the DNA level, taking into account that these tumors are at three different locations. Our idea is that the location might influence the tumor.

Data format

We use the data used on the vignette of the RGCCA package. We need the samples in the rows and the variables in the columns:

data("ge_cgh_locIGR")
A <- ge_cgh_locIGR$multiblocks
Loc <- factor(ge_cgh_locIGR$y)
levels(Loc) <- colnames(ge_cgh_locIGR$multiblocks$y)
A[[3]] = A[[3]][, -3]
names(A)[3] <- "Loc"

With these steps we read and prepare the data for the sgcca and rgcca function.

Workflow

The workflow is as follows:

Selecting the interaction-model
Evaluation of the models
Tuning the weight in the different interaction models
Comparison of the different models after bootstrapping

These steps are explained in detail in the next sections.

Null hypothesis

Then we need to encode our hypothesis in a model, which takes the shape of a matrix. To ease the design of this model there are several functions available. The most important one is subSymm, which makes substitutions on symmetric matrices.

C <- matrix(0, nrow = 3, ncol = 3, dimnames = list(names(A), names(A)))
model0 <- subSymm(C, "GE", "CGH", 1)
model0 <- subSymm(model0, "GE", "Loc", 1)
model0

Here we hypothesize that the Agriculture and the industry are linked and that the agriculture is linked with the political block.

Search models

Now we look for the model that best relates these data.

out_model <- search_model(A = A, c1 = c(.071, .2, 1), scheme = "horst",
               scale = FALSE, verbose = FALSE,
               ncomp = rep(1, length(A)),
               bias = TRUE)

This quickly explores over the 20 different models that could be the best ones. We can see that our hypothesis is in the upper middle:

hist(out_model$AVE_inner, xlab = "inner AVE", 
     main = "Histogram of AVE_inner in the models", breaks = 20)
abline(v = out_model$AVE_inner[out_model$var12 == 0 & 
                                 out_model$var13 == 1 &  out_model$var23 == 1])

According to this, a better model would then be:

columns <- grep("var", colnames(out_model))
model <- symm(C, out_model[which.max(out_model$AVE_inner), columns])
model

A model that instead of the expected relation between Agriculture and Industry block, they are both related to the Political block.

Refinement of the model

We further explore the relationships in this model:

out_best <- iterate_model(C = model, A = A, c1 = c(.071,.2, 1), 
                          scheme = "horst",
                          scale = FALSE, verbose = FALSE,
                          ncomp = rep(1, length(A)),
                          bias = TRUE)

We can see that usually the model is around 0.4 inner AVE.

hist(out_best$AVE_inner, xlab = "inner AVE", 
     main = "Histogram of AVE_inner in the models")

The best model would then be:

model2 <- symm(C, out_best[which.max(out_best$AVE_inner), columns])
model2

Evaluate stability

A next step would be to see if some models are better than the others, something along these lines but with more iterations.

index <- boot_index(nrow(A[[1]]), 10)
bs0 <- boot_index_sgcca(index, A = A, C = model0, c1 = c(.071,.2, 1))
bs1 <- boot_index_sgcca(index, A = A, C = model, c1 = c(.071,.2, 1))
bs2 <- boot_index_sgcca(index, A = A, C = model2, c1 = c(.071,.2, 1))

Stability

We can see that they have different distributions:

plot(c(0.6, 0.9), c(0.05, 0.15), type = "n", xlab = "inner AVE", ylab = "outer AVE",
     main = "Bootstrap of models")
points(bs0$AVE, pch = 16, col = "red")
points(bs1$AVE, pch = 16, col = "green")
points(bs2$AVE, pch = 16)
legend("topright", legend = c("model2", "model", "model0"), 
       fill = c("black", "green", "red"))

df <- rbind(
  cbind.data.frame(bs0$AVE, Model = "model 0"),
  cbind.data.frame(bs1$AVE, Model = "model 1"),
  cbind.data.frame(bs2$AVE, Model = "model 2"))
agg <- aggregate(df[, 1:2], list(Model = df$Model), mean)
points(agg[, 2:3], pch = 17, col = c("red", "green", "black"), cex = 1.5)

This suggests that the model2 has a higher inner AVE than the model0. With more iterations we could see if there is a tendency of a model to be more centered than the others. So, indicating that the model can better find the relationship between the blocks regardless of the input data.

Features

We can now analyze the canonical correlation for the weight of each variable.

m2 <- RGCCA::sgcca(A, model2, c1 = c(.071,.2, 1), scheme = "horst",
               scale = TRUE, verbose = FALSE, ncomp = rep(1, length(A)),
               bias = TRUE)
m0 <- RGCCA::sgcca(A, model0, c1 = c(.071,.2, 1), scheme = "horst",
               scale = TRUE, verbose = FALSE, ncomp = rep(1, length(A)),
               bias = TRUE)

We can now explore the position of each sample:

pars <- par(mfrow = c(1, 2))
plot(m0$Y[[1]][, 1], m0$Y[[2]][, 1], type = "n", xlab = "Block1 Comp1", 
     ylab = "Block2 Comp1", main = "Initial model")
abline(h = 0, v = 0)
text(m0$Y[[1]][, 1], m0$Y[[2]][, 1], labels = Loc, col = as.numeric(Loc) + 1)

plot(m2$Y[[1]][, 1], m2$Y[[2]][, 1], type = "n", xlab = "Block1 Comp1", 
     ylab = "Block2 Comp1", main = "Final model")
abline(h = 0, v = 0)
text(m2$Y[[1]][, 1], m2$Y[[2]][, 1], labels = Loc, col = as.numeric(Loc) + 1)

Although there is a relative difference in the general outcome, what has changed more is the importance of each variable. As we can see here:

par(pars)
plot(m0$a$CGH[, 1], m2$a$CGH[, 1], main = "CGH variables importance", 
     xlab = "model0", ylab = "model2", pch = 16)

Where some variables that had a weight of 0 in model 0 become important and some variables that were important that no longer have any weight on the model 2.

Conclusion

Finding a different model that explains your data better challenges the assumptions about the data. By comparing models, we can learn which blocks of variables are connected thus showing new relationships. These new relationships consider the same variables but with different importance, which, with the right model, should be more accurate.

Session Info

sessionInfo()

Bibliography

llrs/inteRmodel documentation built on April 1, 2022, 4:04 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

llrs/inteRmodel
Find Models of Interaction

In llrs/inteRmodel: Find Models of Interaction

Introduction

Data format

Workflow

Null hypothesis

Search models

Refinement of the model

Evaluate stability

Stability

Features

Conclusion

Session Info

Bibliography

R Package Documentation

Browse R Packages

We want your feedback!

llrs/inteRmodel Find Models of Interaction

In llrs/inteRmodel: Find Models of Interaction

Introduction

Data format

Workflow

Null hypothesis

Search models

Refinement of the model

Evaluate stability

Stability

Features

Conclusion

Session Info

Bibliography

R Package Documentation

Browse R Packages

We want your feedback!

llrs/inteRmodel
Find Models of Interaction