knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette shows an example workflow for ensemble biclustering analysis
with the mosbi
package.
Every function of the package has a help page with a detailed documentation.
To access these type help(package=mosbi)
in the R
console.
Import dependencies.
library(mosbi)
Two additional functions are defined, to calculate z-scores of the data and to visualize the biclusters as a histogram.
z_score <- function(x, margin = 2) { z_fun <- function(y) { (y - mean(y, na.rm = TRUE)) / sd(y, na.rm = TRUE) } if (margin == 2) { return(apply(x, margin, z_fun)) } else if (margin == 1) { return(t(apply(x, margin, z_fun))) } } bicluster_histo <- function(biclusters) { cols <- mosbi::colhistogram(biclusters) rows <- mosbi::rowhistogram(biclusters) graphics::par(mfrow = c(1, 2)) hist(cols, main = "Column size ditribution") hist(rows, main = "Row size ditribution") }
Biclustering will be done on a data matrix. As an example,
lipidomics dataset from the metabolights database will be used
https://www.ebi.ac.uk/metabolights/MTBLS562
.
The data consists of 40 samples (columns) and 245 lipids (rows).
# get data data(mouse_data) mouse_data <- mouse_data[c( grep( "metabolite_identification", colnames(mouse_data) ), grep("^X", colnames(mouse_data)) )] # Make data matrix data_matrix <- z_score(log2(as.matrix(mouse_data[2:ncol(mouse_data)])), 1) rownames(data_matrix) <- mouse_data$metabolite_identification stats::heatmap(data_matrix)
The data has a gaussian-like distribution and no missing values, so we can proceed with biclustering.
The mosbi
package is able to work with results of different
biclustering algorithms. The approach unites the results
from different algorithms.
The results of four example algorithms will be computed and
converted to mosbi::bicluster
objects. For a list of all
supported biclustering algorithms/packages type ?mosbi::get_biclusters
in the R
console.
# Fabia fb <- mosbi::run_fabia(data_matrix) # In case the algorithms throws an error, # return an empty list # isa2 BCisa <- mosbi::run_isa(data_matrix) # Plaid BCplaid <- mosbi::run_plaid(data_matrix) # QUBIC BCqubic <- mosbi::run_qubic(data_matrix) # Merge results of all algorithms all_bics <- c(fb, BCisa, BCplaid, BCqubic) bicluster_histo(all_bics)
The histogram visualizes the distribution of bicluster sizes (separately for the number of rows and columns of each bicluster). The total number of found biclusters are given in the title.
The next step of the ensemble approach is the computation of a
similarity network of biclusters. To filter for for similarities due to
random overlaps of biclusters, we apply an error
model (For more details refer to our publication).
Different similarity metrics are available.
For details type mosbi::bicluster_network
in the R
console.
bic_net <- mosbi::bicluster_network(all_bics, # List of biclusters data_matrix, # Data matrix n_randomizations = 5, # Number of randomizations for the # error model MARGIN = "both", # Use datapoints for metric evaluation metric = 4, # Fowlkes–Mallows index # For information about the metrics, # visit the "Similarity metrics # evaluation" vignette n_steps = 1000, # At how many steps should # the cut-of is evaluated plot_edge_dist = TRUE # Plot the evaluation of cut-off estimation )
The two resulting plot visualize the process of cut-off estimation. The right plot show the remaining number of edges for the computed bicluster network (red) and for randomizations of biclusters (black). The vertical red line showed the threshold with the highest signal-to-noise ratio (SNR). All evaluated SNRs are again visualized in the left plot.
The next plot shows the bicluster similarity matrix. It reveals highly similar biclusters.
stats::heatmap(get_adjacency(bic_net))
Before the final step, extraction of bicluster communities (ensemble biclusters), the bicluster network can be layouted as a network.
plot(bic_net)
The networks are plotted using the igraph
package. igraph
specific plotting parameters can be added.
For help type: ?igraph::plot.igraph
To see, which bicluster was generated by which algorithm, the following function can be executed:
mosbi::plot_algo_network(bic_net, all_bics, vertex.label = NA)
The downloaded data contains samples from different weeks of development. This can be visualized on the network, showing from which week the samples within a bicluster come from.
# Prepare groups for plotting weeks <- vapply( strsplit(colnames(data_matrix), "\\."), function(x) { return(x[1]) }, "" ) names(weeks) <- colnames(data_matrix) print(sort(unique(weeks))) # 5 colors required week_cols <- c("yellow", "orange", "red", "green", "brown") # Plot network colored by week mosbi::plot_piechart_bicluster_network(bic_net, all_bics, weeks, week_cols, vertex.label = NA ) graphics::legend("topright", legend = sort(unique(weeks)), fill = week_cols, title = "Week" )
Such a visualization is also possible for the samples:
# Prepare groups for plotting samples <- vapply( strsplit(colnames(data_matrix), "\\."), function(x) { return(x[2]) }, "" ) names(samples) <- colnames(data_matrix) samples_cols <- RColorBrewer::brewer.pal( n = length(sort(unique(samples))), name = "Set3" ) # Plot network colored by week mosbi::plot_piechart_bicluster_network(bic_net, all_bics, samples, samples_cols, vertex.label = NA ) graphics::legend("topright", legend = sort(unique(samples)), fill = samples_cols, title = "Sample" )
Calculate the communities
coms <- mosbi::get_louvain_communities(bic_net, min_size = 3, bics = all_bics ) # Only communities with a minimum size of 3 biclusters are saved.
# Plot all communities for (i in seq(1, length(coms))) { tmp_bics <- mosbi::select_biclusters_from_bicluster_network( coms[[i]], all_bics ) mosbi::plot_piechart_bicluster_network(coms[[i]], tmp_bics, weeks, week_cols, main = paste0("Community ", i) ) graphics::legend("topright", legend = sort(unique(weeks)), fill = week_cols, title = "Week" ) cat("\nCommunity ", i, " conists of results from the following algorithms:\n") cat(get_bic_net_algorithms(coms[[i]])) cat("\n") }
Finally, communities of the network can be extracted as ensemble biclusters.
The are saved as a list of mosbi::bicluster
objects and therefore in the
same format as the imported results of all the algorithms.
With the parameters row_threshold
& col_threshold
,
the minimum occurrence of a row- or column-element in the biclusters
of a community can be defined.
ensemble_bicluster_list <- mosbi::ensemble_biclusters(coms, all_bics, data_matrix, row_threshold = .1, col_threshold = .1 )
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.