```{=html} <!--


add to description URL: https://usethis.r-lib.org, https://github.com/r-lib/usethis BugReports: https://github.com/r-lib/usethis/issues -->

```r
library(BiocStyle)
library(knitr)
knitr::opts_chunk$set(error = FALSE, message = FALSE, warning = FALSE, cache = FALSE)
# knitr::opts_chunk$set(fig.asp = 1)

Workflow version information

R version: r R.version.string

Bioconductor version: r BiocManager::version()

Package: r packageVersion("txGeneNetwork")

Introduction

```{=html} <!--

This workflow was motivated by ...

Functional analysis is widely used to ...

-->

# Getting started

Install and run all needed packages

```r
#txGeneNetwork:::load_dependencies()
# suppressPackageStartupMessages({
#   library(ggraph)
#   library(tidygraph)
#   library(ggforce)
#   library(readr)
#   library(tidyr)
#   library(ggnewscale)
#   library(concaveman)
# })


requireNamespace("ggplot2")
requireNamespace("ggforce")
requireNamespace("dplyr")
requireNamespace("readr")
requireNamespace("tidyr")
requireNamespace("ggnewscale")
requireNamespace("tidygraph")
requireNamespace("ggraph")
requireNamespace("concaveman")
library(ggplot2)
library(dplyr)
library(ggforce)
library(readr)
library(tidyr)
library(ggnewscale)
library(tidygraph)
library(ggraph)
library(concaveman)

library(txGeneNetwork)

Input data

```{=plain}

## Quick Start

Your input data must be organized as a *.csv* with **From** and **To** Columns.
For this specific graph your from-to order would go Process -\> Gene and in another line Gene -\> Transcript

+------------+---------------+
| From       | To            |
+============+===============+
| Process\_1 | Gene\_1       |
+------------+---------------+
| Gene\_1    | Transcript\_1 |
+------------+---------------+
| Gene\_1    | Transcript\_2 |
+------------+---------------+

Other metadata columns that correspond to the **edges** info can be added and imported.
For this type of network, we add a Process column, or to which biological process that edge belongs to and a Direction column, to show if that transcript is up or down-regulated in our example data.

+------------+---------------+------------+------------+
| From       | To            | Process    | Direction  |
+============+===============+============+===========:+
| Process\_1 | Gene\_1       | Process\_1 | NA         |
+------------+---------------+------------+------------+
| Gene\_1    | Transcript\_1 | Process\_1 | Up         |
+------------+---------------+------------+------------+
| Gene\_1    | Transcript\_2 | Process\_1 | Down       |
+------------+---------------+------------+------------+

# Plotting

`tidygraph` uses a two `tibble` format, one for nodes and one for edges, and displays it as a `tbl_graph` object, using a tidy manner to display both tibbles together.

```r
example_dataset_path <- system.file("extdata", "example_dataset.csv", package = "txGeneNetwork")
example_dataset <- read_csv(example_dataset_path)

The tbl_graph() command allows you to directly create a tbl_graph object using our .csv table.

example_tbl_graph <- as_tbl_graph(example_dataset)
example_tbl_graph

Modified ggplot2 syntax

Now that we have the tbl_graph object we can start plotting the data. ggraph uses a syntax very similar to ggplot2 and most of the addons used in ggplot2 can also be used in ggraph, like theme_*() from ggthemes and geom_*_repel() from ggrepel.

To start, we will construct the basic network using the example data and add extra information for nodes and edges.

We will use geom_node_point() and geom_edge_link() for the basic network.

example_tbl_graph %>%
  ggraph() +
  geom_node_point() +
  geom_edge_link()

We got a message, saying that ggraph used sugiyama as the default layout, which can be changed by passing an argument to the ggraph() function call.

example_tbl_graph %>%
  ggraph(layout = "kk") +
  geom_node_point() +
  geom_edge_link()

Now we have a network more similar to the final product. To modify our tbl_graph object and add other variables you can use usual dplyr syntax together with the activate() function. The activate() verb will select which of the tibbles you are modifying the nodes or the edges tibble. Here we add a centrality measure [^1] to the network and size it accordingly using an aes() call inside geom_node_point().

example_tbl_graph %>%
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  geom_node_point(aes(size = centrality)) +
  geom_edge_link()

We can also color the edges according to the process they belong to or the direction of the transcript expression, using a similar syntax, but now adding an aes() call inside geom_edge_link().

example_tbl_graph %>%
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  geom_edge_link(aes(col = Direction)) +
  geom_node_point(aes(size = centrality))
example_tbl_graph %>%
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  geom_edge_link(aes(color = Group)) +
  geom_node_point(aes(size = centrality))

As there are genes which belong to more than one biological process, this is not an adequate process visualization, the best way would be plotting it as individual hulls, but we will get to that down the workflow.

Adding information on the nodes

For now, in our example table, we only added aesthetics to the edges. Now we will add the transcript_type and the hull aesthetic. First, you can extract the nodes table to then modify it using

example_tbl_graph %>% 
  activate(nodes) %>%
  as_tibble()

Saving this on an object allows you to save and modify at will your nodes table. Here we load the modified table version.

modified_nodes_path <- system.file("extdata", "modified_nodes.csv", package = "txGeneNetwork")

modified_nodes <- read_csv(modified_nodes_path)

We now add the transcript modified information in our tbl_graph object and plot it using the aes(col)

example_tbl_graph %>%
  activate(nodes) %>%
  mutate(Type = modified_nodes$Type) %>%
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  geom_edge_link(aes(col = Direction)) +
  geom_node_point(aes(size = centrality, color = Type))

The geom_mark_hull(), the function to add the hull colors indicating biological processes, does not work well with the tbl_graph object due to not being able to add multiple information for the same node in the same color. So the best way to color hulls is to add extra columns representing these overlays and do one geom_mark_hull() call for each.

example_tbl_graph %>%
  activate(nodes) %>%
  mutate(
    Type = modified_nodes$Type,
    Process = modified_nodes$Process_1,
    Process_2 = modified_nodes$Process_2,
    Process_3 = modified_nodes$Process_3
  ) %>%
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  geom_mark_hull(aes(x = x, y = y, fill = Process, color = Process)) +
  geom_mark_hull(aes(x = x, y = y, fill = Process_2, color = Process_2)) +
  geom_mark_hull(aes(x = x, y = y, fill = Process_3, color = Process_3)) +
  geom_edge_link(aes(col = Direction)) +
  new_scale("color") +
  geom_node_point(aes(size = centrality, color = Type)) +
  theme_graph()

Unfortunately, there is no way to plot this without the NA values due to the tbl_graph class and how the geom_mark_hull function works, so the NA hulls have to be removed a posteriori.

Final touches

Now some final touches like legend size and title.

example_tbl_graph %>%
  activate(nodes) %>%
  mutate(
    Type = modified_nodes$Type,
    Process = modified_nodes$Process_1,
    Process_2 = modified_nodes$Process_2,
    Process_3 = modified_nodes$Process_3
  ) %>%
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  geom_mark_hull(aes(x = x, y = y, fill = Process, color = Process)) +
  geom_mark_hull(aes(x = x, y = y, fill = Process_2, color = Process_2)) +
  geom_mark_hull(aes(x = x, y = y, fill = Process_3, color = Process_3)) +
  geom_edge_link(aes(col = Direction)) +
  new_scale("color") +
  geom_node_point(aes(size = centrality, color = Type))

Now you have the final network, and you only need to save it as .pdf or .csv and remove the NA layer of the hull.

Session information

sessionInfo()
# sessioninfo::session_info()
# xfun::session_info()

References

[^1]: add a reference about network statistics



villabioinfo/txGeneNetwork documentation built on Nov. 12, 2020, 11:28 a.m.