knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(width = 300)

Initial Idea

The initial idea was to use figures such as graphs or dendograms to represent some somatotopy data. To help me in my research, Gérard gave me some links like this one showing what it was possibly possible to do with the R language depending on the chosen bookshops.

Keeping in mind that we had to stay in the world of the R language, the two packages that came out after some research were ggraph and igraph.

Moreover as it seemed possible for networkD3 to convert a graph built with the igraph library or for ggraph to use the basic functions provided by igraph, I turned to the latter. Moreover I had at that time a much more consequent documentation on igraph than on the two others (Student subscription to the DataCamp site).

My first objective was therefore to construct a Graph representation of the data I had been provided with.

Raw data frame segmentation

The data given to me was in the following form.

library(somar)

For a better readability of the code, the dataframe included in the package initially under the name somar::experimental_data.df will simply be named df.

df <- somar::experimental_data.df

str(df)

This dataframe above can be divided into 4 main parts.

Metadata columns: describing the conditions of the experiment as well as the mean of the related R-values

df[, 1:3]

Correlation coefficients columns: corresponding to the latter conditions

df[, 4:8]

The p-value columns: corresponding to the latter conditions

df[, 14:18]

Other values columns: that Gérard asked me not to take care of (CI columns)

The aim was therefore that each line could be represented by a graph. The construction of basic graphs requires more precisely a sample as follows.

sample_number <- 1

df[sample_number, 4:8]
df[sample_number, 14:18]

It may be interesting later on that the data separation of a dataframe like the latter can be done automatically (respecting a column naming convention and a precise dataframe structure). This is so that the user can have a quick and general idea of each of his graphs in order to make a more precise selection afterwards.

Automatic segmentation of the data frame

Given the structure of this dataframe,

Naming Convention 1

Metadata columns must not contain more than one underscore.
regex_pattern <- "\\_[A-z]+\\_"

meta_df <- df[-grep(regex_pattern, names(df))]
data_df <- df[grep(regex_pattern, names(df))]
meta_df

Naming Convention 2

The name of the data to be extracted must prefix the relevant zone pair. It must also be separated from an underscore.

This makes it possible to first make a list of the various parameters.

library(stringr)
extract_prefix <- function(column_name) {
  unlist(str_split(column_name, "_"))[1]
}

prefix.vec <- unique(sapply(names(data_df), extract_prefix))
prefix.vec

And in a second step to separate the dataframe into dataframes containing only one parameter.

R_regex_pattern <- paste("^", prefix.vec[1], sep="")
CI_regex_pattern <- paste("^", prefix.vec[2], sep="")
P_regex_pattern <- paste("^", prefix.vec[3], sep="")

R_df <- data_df[grep(R_regex_pattern, names(data_df))]
CI_df <- data_df[grep(CI_regex_pattern, names(data_df))]
P_df <- data_df[grep(P_regex_pattern, names(data_df))]

R_df

It is certain that in a specific function, it will be a question of recovering each of the above dataframes with the help of a loop.

It is then possible to gather data about a certain meta parameter as follows. Ex : Penalty 1 and StimType 1 at row 1

row <- 1
R_df_sample <- R_df[row,]
CI_df_sample <- CI_df[row,]
P_df_sample <- P_df[row,]

row.names(R_df_sample) <- "R"
row.names(CI_df_sample) <- "CI"
row.names(P_df_sample) <- "P"

# uniformity of column names to be able to bind the dataframe samples

substract_prefix <- function(column_name, prefix) {
  substr(column_name, nchar(prefix) + 1, nchar(column_name))
}

R_prefix_to_substract <- paste(prefix.vec[1], "_", sep="")
names(R_df_sample) <- sapply(
  X = names(R_df_sample), 
  F = function(column_name) substract_prefix(column_name, R_prefix_to_substract))

CI_prefix_to_substract <- paste(prefix.vec[2], "_", sep="")
names(CI_df_sample) <- sapply(
  X = names(CI_df_sample), 
  F = function(column_name) substract_prefix(column_name, CI_prefix_to_substract))


p_prefix_to_substract <- paste(prefix.vec[3], "_", sep="")
names(P_df_sample) <- sapply(
  X = names(P_df_sample), 
  F = function(column_name) substract_prefix(column_name, p_prefix_to_substract))

dataframe_unit <- rbind(R_df_sample, P_df_sample, CI_df_sample)
dataframe_unit

Finally for a better manipulation of the data in relation to the graphs to be built, I thought it might be interesting to link the metadata and their respective data frame using the data structure of the list.

Penalty <- 1
StimType <- 1

l <- list(Penalty, StimType, dataframe_unit)
names(l) <- c("Penalty", "StimType", "Dataframe")
l

Naming Convention 3

It is necessary to have a last rule for the correct processing of data in the construction of the graph.

Prefix the zones by NnSel for a non-selected zone and by Sel for a selected zone.

Unique Graph Construction

Graph Construction

At this stage, it might then be interesting to construct the graph simply by splitting the column names and giving them to the graph builder.

library(igraph)

names.vec <- names(dataframe_unit)
node_list.vec <- unlist(lapply(names.vec, function(str) { unlist(str_split(str, "_")) }))

g <- graph(node_list.vec, directed = FALSE) %>%
  set_edge_attr("R", value = as.vector(t(dataframe_unit[1,]))) %>%
  set_edge_attr("P", value = as.vector(t(dataframe_unit[2,]))) %>%
  set_edge_attr("CI", value = as.vector(t(dataframe_unit[3,])))

plot(g)

Graph Layout

Gérard's idea in graph construction was to have a layout of the areas resembling the theoretical layout (somatotopy). In order to do this I had to create a new function creating a semi-circular layout. Traditional graph construction layouts do not implement this layout.

layout_in_semicircle <- function(graph) {
  number_of_vertex <- vcount(graph)
  layout_matrix <- matrix(ncol = 2)
  for (i in 1:number_of_vertex) {
    x <- cos((i * pi / number_of_vertex) - (pi / number_of_vertex) / 2)
    y <- sin((i * pi / number_of_vertex) - (pi / number_of_vertex) / 2)
    if (is.na(layout_matrix[1, 1])) {
      layout_matrix[1,] <- c(x, y)
    } else {
      layout_matrix <- rbind(layout_matrix, c(x, y))
    }
  }
  layout_matrix
}

l <- layout_in_semicircle(g)
plot(g, layout = l, asp = 0.4)

Graph nodes order

The next step was then to make the knots respect the "somatotopy order".

columns_name <- as_ids(V(g))

Sels <- columns_name[sapply(columns_name, grepl, pattern="^Sel")]
NnSels <- columns_name[sapply(columns_name, grepl, pattern="^NnSel")]

order_sels <- function(names) {
  prefix <- ""
  if (grepl(names[1], pattern="^S")) {
    prefix <- "Sel"
  } else {
    prefix <- "NnSel"
  }
  normal_names_cols <- substr(names, str_length(prefix) + 1, str_length(names))
  f_cols <- factor(normal_names_cols, ordered = TRUE, levels = somatotopy.fac)
  sequence <- order(f_cols)
  if (prefix == "NnSel") {
    for (i in 1:length(sequence)) {
      sequence[i] <- ((length(sequence) * 2) + 1) - sequence[i]
    }
  }
  sequence
}

sequence <- c(order_sels(Sels), order_sels(NnSels))

g <- permute(g, sequence)

plot(g, layout = l, asp = 0.4)

Graph Aesthetics

The final step was then to ensure that the visual attributes of the graph reflected the attributes that had been assigned to its construction.

Edge Color

color_scale <- colorRampPalette(c("red4", "red1", "yellow", "lawngreen",  "forestgreen"))
precision_scale <- 1000
color_pallet <- color_scale(precision_scale);

edgeColor <- function(r) {
  color_pallet[abs(signif(r, 3)) * precision_scale]
}

E(g)$color <- edgeColor(E(g)$R)

Edge Width

edge_max_width <- 50
edge_min_width <- 1

edgeWidth <- function(r, minWidth, maxWidth) {
  range_of_values <- maxWidth - minWidth
  unit <- range_of_values / precision_scale
  minWidth + unit * abs(signif(r, 2)) * precision_scale
}

E(g)$width <- edgeWidth(E(g)$R, edge_min_width, edge_max_width)

Edge Continuity according to P value

E(g)$lty <- ifelse(E(g)$P > 0.05, "dotted", "solid")

Edge Curve

E(g)$curved <- seq(0, 0.7, length = ecount(g))

Selected - NnSelected Vertex (-prefix)

V(g)$selected <- grepl(as_ids(V(g)), pattern="^S")
V(g)$label <- ifelse(V(g)$selected, substract_prefix(V(g)$name, "Sel"), substract_prefix(V(g)$name, "NnSel"))
V(g)$color <- ifelse(V(g)$selected, "gray60", "gray90")

Vertex Static Attributes

V(g)$size <- 40
V(g)$frame.color <- ifelse(degree(g) > 1, "forestgreen", "gray")
V(g)$frame.width <- 2
V(g)$label.cex <- 2
V(g)$label.family=""
V(g)$label.cex=1.4
V(g)$label.color="black"

plot(g, layout = l, asp = 0.4)

"Legend"

legend("topleft", legend=c("P ≤ 0.05", "P > 0.05"), lty=1:2, cex=1)
legend("topright", c("Selected","Non Selected"), fill=c("gray60", "gray90"), cex=1)
legend(-1.3, 1.3 , title="|R|", c("0", "0.25", "0.5", "0.75", "1"), fill=c("red4", "red1", "yellow", "lawngreen", "forestgreen"), cex=0.8)

Conclusion - version 1.0.0

After much research and testing, I realised that it would be difficult with igraph to have a graph as aesthetically pleasing as the examples I was given at the beginning. It might then be interesting to see, if desired, how to translate these different structures with ggraph for example. As far as the caption is concerned, I think it will be more interesting for the researcher to add them later on top of the png image generated by igraph. The solutions given by plot and igraph being rather limited.

Finally, I would not have had the time to work on the construction of a dendogram so much the handling of the R language, of the various libraries and the development time of this first version will have taken me some time.



JeremyGillard/somar documentation built on March 19, 2021, 10:50 a.m.