In neurodata/graphstats: Graph Statistics

require(fmriutils)
require(graphstats)
require(mgc)
require(ggplot2)
require(latex2exp)
require(igraph)
require(stringr)
require(gridExtra)
require(scales)
require(data.table)
require(grid)
source('../../R/siem.R')

parse_class <- function(basepath, dsets, subjects) {
  sex = list()
  # disease = list()
  age = list()
  include = c()
  for (dset in dsets) {
    path_to_file = file.path(basepath, paste(dset, "_phenotypic_data.csv", sep=""))
    tryCatch({
      tab = read.csv(path_to_file)
      tab$SEX[tab$SEX == '#' | is.na(tab$SEX) | is.nan(tab$SEX)] = NaN
      if (dset == 'KKI2009') {
        sexm <- 'M'
      } else if (dset == 'Templeton114') {
        sexm <- 1
      } else {
        sexm <- 2
      }
      tab$AGE_AT_SCAN_1 <- factor(tab$AGE_AT_SCAN_1)
      tab$SEX = tab$SEX == sexm
      tab = tab[complete.cases(tab$SEX),]
      tab$AGE_AT_SCAN_1 = as.numeric(levels(tab$AGE_AT_SCAN_1))[tab$AGE_AT_SCAN_1]
      for (idx in 1:dim(tab)[1]) {
        subid = toString(tab[idx,]$SUBID)
        sex[[subid]] = tab[idx,]$SEX
        age[[subid]] = tab[idx,]$AGE_AT_SCAN_1
        # disease[[subid]] = tab[idx,]$DSM_IV_TR
      }
    }, error=function(e) return(NaN))
  }

  sclass = array(NaN, dim=c(length(subjects)))
  ageclass = sclass
  # diseaseclass = sclass
  for (i in 1:length(subjects)) {
    subject = subjects[i]
    subid = sub('^0+(?=[1-9])', '', str_extract(subject, '(?<=sub-).*'), perl=TRUE)
    idx = which(names(sex) == subid)
    if (length(idx) >= 1) {
      sclass[i] <- tryCatch(sex[[subid]], error=NaN)
      ageclass[i] <- tryCatch(age[[subid]], error=NaN)
      # diseaseclass[i] <- disease[[subid]]
    }
  }
  return(list(sex=sclass, age=ageclass))#, disease=diseaseclass))
}

Model

For details on the model, please see SIEM model. For an expanded look at the Ipsilateral vs Contralateral Problem, see Hemispheric Connectivity Notebook.

Real Data Experiments

In this notebook, we determine whether there exists a batch effect in the difference in connectivity $\bar{r}{ipsi}$ homotopically (same region, opposite hemisphere) vs $\bar{r}{contra}$ heterotopically (different region) connectivity within a particular modality. We consider $\delta_{x} = \bar{r}{ipsi} - \bar{r}{contra}$ to be the difference in connectivity for a graph from a graph or collection of graphs from a particular modality $x$. Here, $\bar{r}$ is the average rank within a particular region.

Tests

for each Graph

Our test for this notebook is as follows:

\begin{align} H_0: p_{ips} \leq p_{contra} \ H_A: p_{ips} > p_{contra} \end{align}

We will use a $1$-sample test to determine whether within a single graph we can determine a significant difference in connectivity.

We will perform this experiment for both dMRI and fMRI-derived connectomes.

for Determining a Batch

To determine a batch's test statistic, we will simply attempt to assess the magnitude of the differences in model fit between each pair of batches. To acquire a p-value, we will use a permutation-based approach. That is:

test_statistic(models, grouping):
  compute the average p_{ipsi} and p_{contra} for the graphs within each unique grouping label.
  For each i, j pair of unique grouping labels:
    stat[i, j] = |p_{ipsi, 1} - p_{ipsi, 2} - (p_{contra, 1} - p_{contra, 2})|
  return max(stats)  # test statistic is the pairing with the greatest magnitude

permutation_test(models, batch_grouping):
  compute the test statistic for the graphs given the default batch grouping.
  For i in 1:nrepetitions:
    permute the batch grouping to obtain a permuted grouping.
    compute the test statistic for the models given the permuted grouping.
  Compute the fraction of permuted test statistics > the given test statistic.

Or the fraction of times that the maximum magnitude of a model fit difference exceeds the observed maximum magnitude of a model fit difference.

get_all_results <- function(tstat.out) {
  P <- tstat.out$P
  if (length(P) > 1) {
    return(P[upper.tri(P, diag=FALSE)])
  } else if (length(P) == 1) {
    return(P)
  } else {
    return(NaN)
  }
}

persub_results <- function(tstat.out) {
  P <- tstat.out$P
  return(apply(P, 1, mean))
}

case2.perm <- function(models, Z, i=1, j=2, ...) {
  incl <- which(sapply(models, is.defined))
  Z <- Z[incl]
  Zset <- unique(Z)
  models <- models[incl]
  p <- sapply(models, function(model) model$pr[i])
  q <- sapply(models, function(model) model$pr[j])

  P <- array(NaN, dim=c(length(Zset)))
  D <- array(NaN, dim=c(length(Z), length(Z)))
  for (i in 1:length(models)) {
    for (j in 1:length(models)) {
      D[i, j] <- abs(p[i] - p[j] - (q[i] - q[j]))
    }
  }
  diag(D) <- NaN  # ignore self connections

  for (i in 1:length(Zset)) {
    sis <- which(Z == Zset[i])
    sjs <- which (Z != Zset[i])
    alt <- mean(D[sis, sis], na.rm=TRUE)
    null <- D[sis, sjs]
    P[i] <- mean(c(alt > null))
  }

  return(list(P=P))
}

case.experiment.within <- function(models, datasets.labs, split.labs, modal, nrep=1000,
                                   simplify=get_all_results, stat=gs.siem.batch.perm) {
  # partition the models that do not have the split data
  include_sets <- which(!is.nan(split.labs))
  incl.models <- models[include_sets]
  incl.datasets.labs <- datasets.labs[include_sets]
  incl.split.labs <- split.labs[include_sets]

  dsets <- unique(incl.datasets.labs)
  results <- data.frame(dataset=c(), pval=c(), modal=c(), size=c())

  for (i in 1:length(dsets)) {
    ss <- which(incl.datasets.labs == dsets[i])
    model_ss <- incl.models[ss]  # subset one dataset of models
    split_ss <- incl.split.labs[ss]
    if (length(unique(split_ss)) > 1) {
      perm.result <- do.call(stat, list(model_ss, split_ss, i=1, j=2, nperm=nrep))
      pv <- do.call(simplify, list(perm.result))
      results <- rbind(results, data.frame(dataset=dsets[i], pval=pv, modal=modal,
                                           size=length(ss)))
    }
  }
  return(results)
}

case.experiment.between <- function(models, datasets.labs, modal, nrep=1000, simplify=get_all_results,
                                    stat=gs.siem.batch.perm) {
  # partition the models that do not have the split data
  include_sets <- which(!is.nan(datasets.labs))
  models <- models[include_sets]
  datasets.labs <- datasets.labs[include_sets]

  dsets <- unique(datasets.labs)
  results <- data.frame(dataset=c(), pval=c(), modal=c(), size=c())

  perm.result <- do.call(stat, list(model_ss, split_ss, i=1, j=2, nperm=nrep))
  pv <- do.call(simplify, list(perm.result))
  results <- rbind(results, data.frame(dataset='set', pval=pv, modal=modal, size=length(ss)))
  return(results)
}

case.experiment.pairwise <- function(models, datasets.labs, modal, nrep=1000, simplify=get_all_results,
                                     stat=gs.siem.batch.perm) {
  # partition the models that do not have the split data
  include_sets <- which(!is.nan(datasets.labs))
  models <- models[include_sets]
  datasets.labs <- datasets.labs[include_sets]

  dsets <- unique(datasets.labs)
  results <- data.frame(dset1=c(), dset2=c(), pval=c(), size=c(), modality=c())
  D <- array(0, dim=c(length(dsets), length(dsets)))
  for (i in 1:(length(dsets) - 1)) {
    ss1 <- datasets.labs %in% dsets[i]
    for (j in ((i+1):length(dsets))) {
      ss <- datasets.labs %in% c(dsets[i], dsets[j])
      model_ss <- models[ss]
      split_ss <- datasets.labs[ss]
      perm.result <- do.call(stat, list(model_ss, split_ss, i=1, j=2, nperm=nrep))
      pv <- do.call(simplify, list(perm.result))
      results <- rbind(results, data.frame(dset1=dsets[i], dset2=dsets[j], pval=pv,
                                           modal=modal, size=sum(ss1)))
      results <- rbind(results, data.frame(dset1=dsets[j], dset2=dsets[i], pval=pv,
                                           modal=modal, size=sum(ss1)))
      D[i, j] <- pv
    }
  }
  D <- D + t(D) - diag(diag(D))
  diag(D) <- 1  # no batch between same dataset
  return(list(data=results, D=D, dsets=dsets))
}

g_legend <- function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
}

is.defined = function(x)!is.null(x)

extract_params <- function(models, Z, i=1, j=2, modal=NULL) {
  # aggregate p and qs
  params <- lapply(unique(Z), function(z) {
    ss <- which(Z == z)
    mods <- dwi.models[ss]
    p <- sapply(mods, function(model) model$pr[i])
    q <- sapply(mods, function(model) model$pr[j])
    # pm <- mean(p); qm <- mean(q); dm <- pm - qm
    d <- p - q
    return(data.frame(p=p, q=q, d=d, dataset=z, size=length(ss), modal=modal))
  })
  df <- do.call(rbind, params)
  return(df)
}

Raw Data

The data below can be downloaded and moved to appropriate folders as follows (note that the below section requires sudo access):

sudo mkdir /data/
sudo chmod -R 777 /data

cd /data
wget http://openconnecto.me/mrdata/share/derivatives/dwi_edgelists.tar.gz
wget http://openconnecto.me/mrdata/share/derivatives/fmri_edgelists.tar.gz
wget http://openconnecto.me/mrdata/share/connectome_stats/connectome_stats.zip

mkdir -p /data/connectome_stats /data/all_mr /data/all_mr/dwi/edgelists /data/all_mr/fmri/ranked/edgelists
mv dwi_edgelists.tar.gz /data/dwi/edgelists
cd /data/dwi/edgelists
tar -xvzf dwi_edgelists.tar.gz
mv /data/fmri_edgelists.tar.gz /data/fmri/ranked/edgelists
cd /data/fmri/ranked/edgelists
tar -xvzf fmri_edgelists.tar.gz
mv /data/connectome_stats.zip /data/connectome_stats.zip
cd /data/connectome_stats
unzip connectome_stats.zip

Diffusion

nroi <- 70
dwi.dsets = c('BNU1', 'BNU3', 'HNU1', 'KKI2009', 'NKI1', 'NKIENH', 'MRN1313', 'Templeton114', 'Templeton255', 'SWU4')
dwi.atlas = 'desikan'
dwi.basepath = '/data/all_mr/dwi/edgelists'

graphobj = fmriu.io.collection.open_graphs(basepath = dwi.basepath, atlases = dwi.atlas, datasets = dwi.dsets,
                                           gname = 'graphs', fmt='edgelist', rtype = 'array')
dwi.graphs = graphobj$graphs
dwi.datasets = graphobj$dataset
dwi.subjects = graphobj$subjects
dwi.sessions = graphobj$sessions

sexpath = '/data/all_mr/phenotypic/'
class = parse_class(sexpath, dwi.dsets, dwi.subjects)
dwi.sexs = class$sex

ne = 1225
nroi <- 70
group1 <- c()  # edges in same hemisphere
group2 <- c()  # edges across hemispheres
for (i in 1:nroi) {
  for (j in (i+1):nroi) {
    idx <- (i - 1)*nroi + j
    if ((i <= 35 & j <= 35) | (i > 35 & j > 35)) {
      group1 <- c(group1, idx)
    } else {
      group2 <- c(group2, idx)
    }
  }
}
Es <- list(group1, group2)
dwi.rank.graphs <- gs.xfm.aaply(dwi.graphs, gs.xfm.rank_graph)
dwi.models <- suppressWarnings(gs.xfm.alply(dwi.rank.graphs, gs.siem.fit, Es))
incl <- which(sapply(dwi.models, is.defined))
dwi.datasets <- dwi.datasets[incl]
dwi.sessions <- dwi.sessions[incl]
dwi.sexs <- dwi.sexs[incl]
dwi.subjects <- dwi.subjects[incl]
dwi.models <- dwi.models[incl]

Functional

nroi <- 70
fmri.dsets = c('BNU1', 'BNU2', 'BNU3', 'HNU1', 'IBATRT', 'IPCAS1', 'IPCAS2', 'IPCAS5', 'IPCAS6', 'IPCAS8', 'MRN1', 'NYU1', 'SWU1', 'SWU2', 'SWU3', 'SWU4', 'UWM', 'XHCUMS')
fmri.atlas = 'desikan-2mm'
fmri.basepath = '/data/all_mr/fmri/ranked/edgelists/'

graphobj = fmriu.io.collection.open_graphs(basepath = fmri.basepath, atlases = fmri.atlas, datasets=fmri.dsets, fmt='edgelist', rtype = 'array')
fmri.graphs = graphobj$graphs
fmri.datasets = graphobj$dataset
fmri.subjects = graphobj$subjects
fmri.sessions <- graphobj$sessions

sexpath = '/data/all_mr/phenotypic/'
class = parse_class(sexpath, fmri.dsets, fmri.subjects)
fmri.sexs = class$sex

ne = 1225
nroi <- 70
group1 <- c()  # edges in same hemisphere
group2 <- c()  # edges across hemispheres
for (i in 1:nroi) {
  for (j in i:nroi) {
    idx <- (i - 1)*nroi + j
    if ((i <= 35 & j <= 35) | (i > 35 & j > 35)) {
      group1 <- c(group1, idx)
    } else {
      group2 <- c(group2, idx)
    }
  }
}
Es <- list(group1, group2)
fmri.rank.graphs <- gs.xfm.aaply(fmri.graphs, gs.xfm.rank_graph)
fmri.models <- suppressWarnings(gs.xfm.alply(fmri.rank.graphs, gs.siem.fit, Es))

incl <- which(sapply(fmri.models, is.defined))
fmri.datasets <- fmri.datasets[incl]
fmri.sessions <- fmri.sessions[incl]
fmri.sexs <- fmri.sexs[incl]
fmri.subjects <- fmri.subjects[incl]
fmri.models <- fmri.models[incl]

Map a color vector:

total.dsets <- union(dwi.datasets, fmri.datasets)
total.datasets <- c(dwi.datasets, fmri.datasets)
svals <- c()
for (i in 1:length(total.dsets)) {
  if (total.dsets[i] %in% fmri.datasets) {
    size <- sum(fmri.datasets == total.dsets[i])
  } else {
    size <- sum(dwi.datasets == total.dsets[i])
  }
  svals[i] <- size
}
cols <- c("#d5512b","#5970d8","#99b534","#9e5cd0","#59b648","#c74bae","#4bbe7c","#d93f7f",
          "#387e4a","#d2414f","#4ebdb0","#d98d2f","#5e98d3","#c6ab43","#725a9d","#99b36d",
          "#b98ed8","#677629","#d77fb9","#92692e","#9c4467","#e1966c","#dc7b88","#ac5336")
names(svals) <- total.dsets
names(cols) <- total.dsets

Pre-Experimental: Visualizing P and Q across Studies

dwi.params <- extract_params(dwi.models, dwi.datasets, i=1, j=2, modal='dMRI')
fmri.params <- extract_params(fmri.models, fmri.datasets, i=1, j=2, modal='fMRI')
params <- rbind(dwi.params, fmri.params)
colnames(params)[3] <- "p - q"
saveRDS(params, 'results_hem_params.rds')

params <- readRDS('results_hem_params.rds')
# params <- melt(params, id=c("dataset", "modal", "size"))
# plot_pre <- ggplot(params, aes(x=modal, y=value, shape=variable, size=size, color=dataset)) +
#   geom_jitter() +
#   scale_shape_discrete(breaks = c("p", "q", "p - q")) +
#   xlab("Modality") +
#   ylab("Value") +
#   theme_bw() +
#   guides(size=FALSE) +
#   ggtitle("Exploratory Analysis of Within-Dataset Rank") +
#   labs(shape="Variable", color="Dataset")
plot_pre <- ggplot(params, aes(x=p, y=q, shape=modal, size=size, color=dataset)) +
  geom_point(alpha=1) +
  scale_color_manual(name="Dataset", values=cols) +
  scale_size_continuous(name="Dataset", breaks=svals) +
  xlab(TeX("$\\hat{p}$")) +
  ylab(TeX("$\\hat{q}$")) +
  theme_bw() +
  ggtitle("Exploratory Analysis") +
  labs(shape="Modality", color="Dataset")

Case 1: Session Partitioning

Is there a batch effect between sessions of a particular study?

Procedure

1) Compute test statistic given the default partitioning, and obtain $\tau_{observed}$, as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 sessions. 2) permute the set labels of the combined set $nperm$ times (maintaining sex of each graph) to obtain the distribution of $\tau_{null}$, reporting $\hat{p}$, the estimator of $\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]$

results1 <- rbind(case.experiment.within(dwi.models, dwi.datasets, dwi.sessions, 'dMRI', nrep=1000),
      case.experiment.within(fmri.models, fmri.datasets, fmri.sessions, 'fMRI', nrep=1000))
saveRDS(results1, 'results_hem_case1.rds')

results1 <- readRDS('results_hem_case1.rds')
plot1 <- ggplot(results1, aes(x=modal, y=pval, shape=modal, color=dataset, size=size)) +
  geom_jitter(alpha=0.8, width=0.25, height=0) +
  scale_color_manual(name="Dataset", values=cols) +
  scale_size_continuous(name="Dataset", breaks=svals) +
  ggtitle("Case 1: Sessions") +
  ylab(TeX("$p$-Value")) +
  xlab("Modality") +
  scale_y_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1)) +
  labs(shape="Modality", color="Dataset") +
  theme_bw()

Case 2: Subject Partitioning

Is there a batch effect between subjects in a particular study?

Procedure

1) Compute test statistic given the default partitioning, and obtain $\tau_{observed}$, as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects. 2) permute the set labels of the combined set $nperm$ times (maintaining sex of each graph) to obtain the distribution of $\tau_{null}$, reporting $\hat{p}$, the estimator of $\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]$

results2 <- rbind(case.experiment.within(dwi.models, dwi.datasets, dwi.subjects, 'dMRI', nrep=1000,
                                         simplify=persub_results, stat=case2.perm),
      case.experiment.within(fmri.models, fmri.datasets, fmri.subjects, 'fMRI', nrep=1000,
                             simplify=persub_results, stat=case2.perm))
saveRDS(results2, 'results_hem_case2.rds')

results2 <- readRDS('results_hem_case2.rds')
dwi.plot2 <- ggplot(results2[results2$modal == 'dMRI',], aes(pval, color=dataset)) +
  geom_density(alpha=0.2, size=1.5) +
  scale_color_manual(name="Dataset", values=cols) +
  xlab("") +
  coord_flip() +
  ylab("") +
  theme_bw() +
  scale_x_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1))

fmri.plot2 <- ggplot(results2[results2$modal == 'fMRI',], aes(pval, color=dataset)) +
  geom_density(alpha=0.2, size=1.5) +
  scale_color_manual(name="Dataset", values=cols) +
  xlab("") +
  coord_flip() +
  ylab("") +
  theme_bw() +
  scale_x_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1))

plot2 <- grid.arrange(dwi.plot2 + theme(legend.position=NaN) + ylab("dMRI"),
                      fmri.plot2 + theme(legend.position=NaN) + ylab("fMRI"),
                      nrow=1, top = textGrob("Case 2: Subjects",gp=gpar(fontsize=14)))

Case 3: Same Sex

Is there a batch effect between sexs in a particular study?

Procedure

1) Compute test statistic given the default partitioning, and obtain $\tau_{observed}$, as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects. 2) permute the set labels of the combined set $nperm$ times (maintaining sex of each graph) to obtain the distribution of $\tau_{null}$, reporting $\hat{p}$, the estimator of $\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]$

results3 <- rbind(case.experiment.within(dwi.models, dwi.datasets, dwi.sexs, 'dMRI', nrep=1000),
      case.experiment.within(fmri.models, fmri.datasets, fmri.sexs, 'fMRI', nrep=1000))
saveRDS(results3, 'results_hem_case3.rds')

results3 <- readRDS('results_hem_case3.rds')
plot3 <- ggplot(results3, aes(x=modal, y=pval, shape=modal, group=dataset, color=dataset, size=size)) +
  geom_jitter(alpha=0.8, width=0.25, height=0) +
  scale_color_manual(name="Dataset", values=cols) +
  scale_size_continuous(name="Dataset", breaks=svals) +
  ggtitle("Case 3: Same Sex") +
  xlab("") +
  ylab("") +
  scale_y_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1)) +
  theme_bw()

Case 4: Same Site

Is there a batch effect between studies with site held fixed?

Procedure

1) Compute test statistic given the default partitioning, and obtain $\tau_{observed}$, as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects. 2) permute the set labels of the combined set $nperm$ times (maintaining sex of each graph) to obtain the distribution of $\tau_{null}$, reporting $\hat{p}$, the estimator of $\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]$

dwi.studies <- dwi.datasets
fmri.studies <- fmri.datasets
dwi.studies[dwi.studies == 'BNU1' | dwi.studies == 'BNU3'] <- 'BNU'
fmri.studies[fmri.studies == 'BNU1' | fmri.studies == 'BNU3'] <- 'BNU'
fmri.studies[fmri.studies == 'BNU1'] <- 'BNU'
fmri.studies[grep('SWU', fmri.studies)] <- 'SWU'
fmri.studies[grep('IPCAS', fmri.studies)] <- 'IPCAS'
dwi.studies[dwi.studies == 'Templeton114' | dwi.studies == 'Templeton255'] <- 'Templeton'

results4 <- rbind(case.experiment.within(dwi.models, dwi.studies, dwi.datasets, 'dMRI', nrep=1000),
      case.experiment.within(fmri.models, fmri.studies, fmri.datasets, 'fMRI', nrep=1000))
saveRDS(results4, 'results_hem_case4.rds')

results4 <- readRDS('results_hem_case4.rds')
plot4 <- ggplot(results4, aes(x=modal, y=pval, shape=modal, color=dataset, group=dataset)) +
  geom_jitter(alpha=1, size=4, width=0.5, height=0) +
  ggtitle("Case 4: Same Site") +
  xlab("") +
  ylab("") +
  scale_color_manual(name="Dataset", values=cols) +
  scale_y_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1)) +
  labs(color="Site") +
  guides(shape=FALSE) +
  theme_bw()

Case 5: Same Demographics

Is there a batch effect between studies with demographic held fixed?

Procedure

1) Compute test statistic given the default partitioning, and obtain $\tau_{observed}$, as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects. 2) permute the set labels of the combined set $nperm$ times (maintaining sex of each graph) to obtain the distribution of $\tau_{null}$, reporting $\hat{p}$, the estimator of $\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]$

dwi.dset.sset <- which(dwi.datasets %in% c('BNU1', 'SWU4', 'HNU1', 'BNU3'))
dwi.models.subs  <- dwi.models[dwi.dset.sset]; dwi.datasets.subs <- dwi.datasets[dwi.dset.sset]

fmri.dset.sset <- which(fmri.datasets %in% c('BNU1', 'SWU4', 'HNU1', 'BNU3'))
fmri.models.subs  <- fmri.models[fmri.dset.sset]; fmri.datasets.subs <- fmri.datasets[fmri.dset.sset]

results5 <- rbind(case.experiment.pairwise(dwi.models.subs, dwi.datasets.subs, 'dMRI', nrep=1000)$data,
      case.experiment.pairwise(fmri.models.subs, fmri.datasets.subs, 'fMRI', nrep=1000)$data)
saveRDS(results5, 'results_hem_case5.rds')

results5 <- readRDS('results_hem_case5.rds')
plot5 <- ggplot(results5, aes(x=modal, y=pval, shape=modal, group=dset1, color=dset1, size=size)) +
  geom_jitter(alpha=0.8, width=0.25, height=0) +
  scale_color_manual(name="Dataset", values=cols) +
  scale_size_continuous(name="Dataset", breaks=svals) +
  ggtitle("Case 5: Same Demographics") +
  xlab("") +
  ylab("") +
  scale_y_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1)) +
  theme_bw()

Case 6: Disparate Demographics

Is there a batch effect between studies without demographic held fixed?

Procedure

1) Compute test statistic given the default partitioning, and obtain $\tau_{observed}$, as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects. 2) permute the set labels of the combined set $nperm$ times (maintaining sex of each graph) to obtain the distribution of $\tau_{null}$, reporting $\hat{p}$, the estimator of $\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]$

results6 <- rbind(case.experiment.pairwise(dwi.models, dwi.datasets, 'dMRI', nrep=1000)$data,
      case.experiment.pairwise(fmri.models, fmri.datasets, 'fMRI', nrep=1000)$data)
saveRDS(results6, 'results_hem_case6.rds')

results6 <- readRDS('results_hem_case6.rds')
plot6 <-  ggplot(results6, aes(x=modal, y=pval, shape=modal, group=dset1, color=dset1, size=size)) +
  geom_jitter(alpha=0.8, width=0.25, height=0) +
  scale_color_manual(name="Dataset", values=cols) +
  scale_size_continuous(name="Dataset", breaks=svals) +
  ggtitle("Case 6: Disparate Demographics") +
  xlab("") +
  ylab("") +
  scale_y_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1)) +
  theme_bw()

Pairwise Study Comparison

Is there a batch effect between all pairs of studies?

Procedure

1) Compute test statistic given the default partitioning, and obtain $\tau_{observed}$, as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects. 2) permute the set labels of the combined set $nperm$ times (maintaining sex of each graph) to obtain the distribution of $\tau_{null}$, reporting $\hat{p}$, the estimator of $\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]$

dwi_pair_results <- case.experiment.pairwise(dwi.models, dwi.datasets, 'dMRI', nrep=1000)
fmri_pair_results <- case.experiment.pairwise(fmri.models, fmri.datasets, 'fMRI', nrep=1000)
results_pair <- list(dwi_pair_results, fmri_pair_results)
saveRDS(results_pair, 'results_hem_pair.rds')

Results

Cases

results_pair <- readRDS('results_hem_pair.rds')
dwi_pair_results <- results_pair[[1]]
fmri_pair_results <- results_pair[[2]]
top_leg <- g_legend(plot_pre)
bot_leg <- g_legend(plot4)
bottom <- grid.arrange(arrangeGrob(plot1 + theme(legend.position=NaN),
                         plot2,
                         plot3 + theme(legend.position=NaN),
                         plot4 + theme(legend.position=NaN),
                         plot5 + theme(legend.position=NaN),
                         plot6 + theme(legend.position=NaN), nrow=2),
             arrangeGrob(top_leg, bot_leg, nrow=2), nrow=1, widths=c(0.82, 0.16))

Pairwise Comparisons

dMRI

dwi_pair_results$data$dset1 <- ordered(dwi_pair_results$data$dset1, levels=names(svals))
dwi_pair_results$data$dset2 <- ordered(dwi_pair_results$data$dset2, levels=names(svals))
dwi.pair <- ggplot(dwi_pair_results$data, aes(x=dset1, y=dset2, fill=pval)) +
        geom_tile() +
        ggtitle("dMRI Paired Batch") +
        xlab("Dataset") +
        ylab("Dataset") +
        scale_fill_gradientn(name=TeX("$p$-value"), trans="log", breaks=c(0.001, 0.01, 0.1, 1),
                             colours=c("#f2f0f7", "#cbc9e2", "#9e9ac8", "#6a51a3"), limits=c(0.001, 1)) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1))

fMRI

fmri_pair_results$data$dset1 <- ordered(fmri_pair_results$data$dset1, levels=names(svals))
fmri_pair_results$data$dset2 <- ordered(fmri_pair_results$data$dset2, levels=names(svals))
fmri.pair <- ggplot(fmri_pair_results$data, aes(x=dset1, y=dset2, fill=pval)) +
        geom_tile() +
        ggtitle("fMRI Paired Batch") +
        xlab("Dataset") +
        ylab("Dataset") +
        scale_fill_gradientn(name=TeX("$p$-value"), trans="log", breaks=c(0.001, 0.01, 0.1, 1),
                             colours=c("#f2f0f7", "#cbc9e2", "#9e9ac8", "#6a51a3"), limits=c(0.001, 1)) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1))

Combined plot

pair_leg <- g_legend(dwi.pair)
top <- grid.arrange(plot_pre + theme(legend.position=NaN),
                     grid.arrange(dwi.pair + theme(legend.position=NaN),
                                  fmri.pair + theme(legend.position=NaN),
                                  pair_leg, widths=c(0.4, 0.4, 0.2)),
                     widths=c(0.2, 0.7), nrow=1)

grid.arrange(top, bottom, heights=c(0.4, 0.6))

neurodata/graphstats documentation built on May 14, 2019, 5:19 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

neurodata/graphstats
Graph Statistics

In neurodata/graphstats: Graph Statistics

Model

Real Data Experiments

Tests

for each Graph

for Determining a Batch

Raw Data

Diffusion

Functional

Pre-Experimental: Visualizing P and Q across Studies

Case 1: Session Partitioning

Procedure

Case 2: Subject Partitioning

Procedure

Case 3: Same Sex

Procedure

Case 4: Same Site

Procedure

Case 5: Same Demographics

Procedure

Case 6: Disparate Demographics

Procedure

Pairwise Study Comparison

Procedure

Results

Cases

Pairwise Comparisons

dMRI

fMRI

Combined plot

R Package Documentation

Browse R Packages

We want your feedback!

neurodata/graphstats Graph Statistics

In neurodata/graphstats: Graph Statistics

Model

Real Data Experiments

Tests

for each Graph

for Determining a Batch

Raw Data

Diffusion

Functional

Pre-Experimental: Visualizing P and Q across Studies

Case 1: Session Partitioning

Procedure

Case 2: Subject Partitioning

Procedure

Case 3: Same Sex

Procedure

Case 4: Same Site

Procedure

Case 5: Same Demographics

Procedure

Case 6: Disparate Demographics

Procedure

Pairwise Study Comparison

Procedure

Results

Cases

Pairwise Comparisons

dMRI

fMRI

Combined plot

R Package Documentation

Browse R Packages

We want your feedback!

neurodata/graphstats
Graph Statistics