knitr::opts_chunk$set(cache=TRUE, fig.height=4, fig.width=6)

In this page, we will cover much of the basic functionality of the fmriutils package, including loading, processing, visualizing, and manipulating data.

Loading Data

BIDS Spec

When working with data, it is often advantageous to apply a standard specification, including directory organization, naming convention, required data, etc. For the purposes of simplicity, we will use the BIDs specification, a spec designed for magnetic-resonance derivatives of multiple modalities. The general format of files is: sub-[####]_task-[abcd...]_ses-[####]_detailed_info.ftype

In this section, we will cover the basics of loading data. For this tutorial, we have made some preformatted input data available using the below commands:

# from terminal
cd ~/
# prepare a directory for demo data
mkdir data/
cd data/
wget http://openconnecto.me/mrdata/share/demo_data/R-pkgs/BIDs_ts_graphs/BIDs_outputs_ndmg.tar.gz
tar -xvzf BIDs_outputs_ndmg.tar.gz  # unzips and untars contents

which has the following hierarchy:

├── BNU1
│   ├── connectomes
│   │   ├── aal-2mm
│   │   │   ├── sub-0025864_ses-1_bold_aal-2mm.graphml
│   │   │   ├── sub-0025864_ses-2_bold_aal-2mm.graphml
│   │   │   ├── sub-0025865_ses-1_bold_aal-2mm.graphml
│   │   │   └── sub-0025865_ses-2_bold_aal-2mm.graphml
│   │   └── desikan-2mm
│   │       ├── sub-0025864_ses-1_bold_desikan-2mm.graphml
│   │       ├── sub-0025864_ses-2_bold_desikan-2mm.graphml
│   │       ├── sub-0025865_ses-1_bold_desikan-2mm.graphml
│   │       └── sub-0025865_ses-2_bold_desikan-2mm.graphml
│   └── timeseries
│       └── roi
│           ├── aal-2mm
│           │   ├── sub-0025864_ses-1_bold_aal-2mm.rds
│           │   ├── sub-0025864_ses-2_bold_aal-2mm.rds
│           │   ├── sub-0025865_ses-1_bold_aal-2mm.rds
│           │   └── sub-0025865_ses-2_bold_aal-2mm.rds
│           └── desikan-2mm
│               ├── sub-0025864_ses-1_bold_desikan-2mm.rds
│               ├── sub-0025864_ses-2_bold_desikan-2mm.rds
│               ├── sub-0025865_ses-1_bold_desikan-2mm.rds
│               └── sub-0025865_ses-2_bold_desikan-2mm.rds
└── HNU1
    ├── connectomes
    │   ├── aal-2mm
    │   │   ├── sub-0025438_ses-1_bold_aal-2mm.graphml
    │   │   ├── sub-0025438_ses-2_bold_aal-2mm.graphml
    │   │   ├── sub-0025439_ses-1_bold_aal-2mm.graphml
    │   │   └── sub-0025439_ses-2_bold_aal-2mm.graphml
    │   └── desikan-2mm
    │       ├── sub-0025438_ses-1_bold_desikan-2mm.graphml
    │       ├── sub-0025438_ses-2_bold_desikan-2mm.graphml
    │       ├── sub-0025439_ses-1_bold_desikan-2mm.graphml
    │       └── sub-0025439_ses-2_bold_desikan-2mm.graphml
    └── timeseries
        └── roi
            ├── aal-2mm
            │   ├── sub-0025438_ses-1_bold_aal-2mm.rds
            │   ├── sub-0025438_ses-2_bold_aal-2mm.rds
            │   ├── sub-0025439_ses-1_bold_aal-2mm.rds
            │   └── sub-0025439_ses-2_bold_aal-2mm.rds
            └── desikan-2mm
                ├── sub-0025438_ses-1_bold_desikan-2mm.rds
                ├── sub-0025438_ses-2_bold_desikan-2mm.rds
                ├── sub-0025439_ses-1_bold_desikan-2mm.rds
                └── sub-0025439_ses-2_bold_desikan-2mm.rds

That is, we have our directories structured such that all of our timeseries/graphs for a single dataset, single atlas are in a single folder. This is not a "hard requirement" for using the fmriutils package, but it will make our life much easier down the line, and will allow us to easily aggregate dataset and atlas-level identification information associated with our data. We can load our data as follows for a single dataset, single atlas:

require(fmriutils)
require(testthat)
# the path to our graphs or timeseries
sig_inpath <- '~/data/BNU1/timeseries/roi/desikan-2mm/'
gra_inpath <- '~/data/BNU1/connectomes/desikan-2mm/'

# open up the rds timeseries files
signalobj <- fmriu.io.open_timeseries(sig_inpath, dataset_id = 'BNU1', atlas_id = 'desikan-2mm')

graphobj <- fmriu.io.open_graphs(gra_inpath, dataset_id = 'BNU1', atlas_id = 'desikan-2mm')

Aggregating

If we have multiple datasets, and multiple atlases, we can easily aggregate using this structuring of inputs. For example, in the example above, we may want to aggregate over the BNU1 and DC1 datasets, and aggregate over the desikan-2mm and aal-2mm atlases. Note that for this functionality to work, your folder hieararchy should match the specification above:

datasets <- c('BNU1', 'HNU1')
atlases <- c('desikan-2mm', 'aal-2mm')

signalobj <- fmriu.io.collection.open_timeseries('~/data/', datasets=datasets, atlases=atlases)
timeseries <- signalobj$ts

graphobj <- fmriu.io.collection.open_graphs('~/data/', datasets=datasets, atlases=atlases)
graphs <- graphobj$graphs

and our resulting timeseries list, and vectors for subject/session/task/dataset level parameters, will be organized appropriately. NOTE: when attempting to aggregate outputs as was just described, make sure your filenames are descriptive enough such that no two files that you add have the same filename character for character. For example, in the examples above, we have multiple versions of subject 0025864 session 1 timeseries data (1 for desikan-2mm named sub-0025864_ses-1_desikan-2mm.rds and 1 for aal-2mm sub-0025864_ses-1_aal-2mm.rds), but since each timeseries is uniquely identified with the atlas name in the filename, we will not have any overlap.

Processing Data

Temporal Domain

Moreover, as a user it is often valuable to be able to process your timeseries data into other usable formats. For example, let's assume we have loaded in the timeseries data above into timeseries, and we wish to convert each timeseries to a correlation matrix. we can do this easily using our temporal functions:

timeseries <- signalobj$ts
correlations <- fmriu.time.obs2corr(timeseries, include_diag=FALSE)  # converts timeseries to correlation matrices

Frequency Domain

On the other hand, let's say we want to obtain the frequency-domain components for the power or amplitude spectrum. We provide options for also adding bandpassing as desired, but these are not necessary to input. Note that in order to apply bandpassing, you must pass a TR (the repetition time of a single slice of the 4d-data) and one of a lower cutoff or a upper cutoff. Additionally, we provide the ability to normalize the obtained frequency spectra such that per ROI, the bin intensities sum to one. This allows us to later use the frequency spectra as a pdf, such as computing the pairwise KL-divergences between spectra for each pair of ROIs. We can make this call very simply:

# convert each observation to the power spectrum, and highpass filter above .01 Hz
# also normalizes so each spectrum sums to 1
freq <- fmriu.freq.obs2freq(timeseries, tr=2.5, lc = 0.01, hc=NaN, normalize=TRUE, spectrum='pow')

Next, we might want to compute the pairwise divergence, noting that we must have called fmriu.freq.obs2freq() with the normalize option set to TRUE since the elements of our vector in freq must be probability distributions:

div <- fmriu.freq.freq2div(freq)

These two commands can be combined with the below function:

div <- fmriu.freq.obs2div(timeseries, tr=2.5, lc=0.01, hc=NaN, spectrum='pow')

Visualizing Data

Another important basic functionality we may want for our fMRI data is the ability to visualize our graphs or timeseries. To allow this, we provide several useful utilities for visualization of your data.

Dataseries

Often, we will want to plot a dataseries, such as a timeseries or the frequency spectrum, as a series of lines per ROI given some observation we are considering.

Timeseries

To plot a timeseries, we might do something like the following:

ts <- timeseries[[1]]  # the first timeseries
fmriu.plot.plot_dataseries(ts, title="Timeseries Plot", xlabel = "TR", ylabel="intensity", legend.title = "ROI")

Power Spectrum

Unlike a timeseries, for the power spectrum our $x$-axis no longer represents TRs. Instead, our $x$-axis here is the frequency associated with each bin in our FFT. Using the utility fmriu.freq.freq_seq, we are able to take our sampling rate ($\frac{1}{\pi}$ where $\pi$ is the duration of each TR, or the period of a single volume in the scanner) and the number of timesteps and directly compute the frequency associated with each bin, and we can pass that as an optional argument to our plot function:

fs <- freq[[1]]
fmriu.plot.plot_dataseries(fs, xax=fmriu.freq.freq_seq(fs=1/2.5, nt=dim(timeseries[[1]])[1]),
                           title="Power Plot", xlabel = "frequency (Hz)", ylabel="power", legend.title = "ROI")

and similarly for the amplitude spectrum, if we had instead specified 'amp' when creating the freq object.

Graphs

Plotting graphs is similarly easy:

gr <- graphs[[1]]
fmriu.plot.plot_graph(gr, title="Graph Plot",xlabel="ROI", ylabel="ROI", legend.name = "corr")

we may may optionally want to leave in the diagonal:

fmriu.plot.plot_graph(gr, title="Graph Plot, Diagonal Included",xlabel="ROI", ylabel="ROI", legend.name = "corr", include_diag=TRUE)

Manipulating Data

Thresholding and Binarizing Graphs

fMRI graphs are often real-valued; that is, edge weights will natively take some floating-point value unless specified otherwise. Often times, researchers will want simple graphs, where edges can only be either connected (value of 1) or disconnected (value of 0). Further yet, researchers may want to ignore edges with low values or values below a particular cutoff threshold. We start by plotting the first raw graph in our list, and then will apply successive transformations to the raw graphs and show the resulting impact.

fmriu.plot.plot_graph(graphs[[1]], title="Raw Graph", xlabel="ROI",
                      ylabel="ROI", legend.name = "corr")

Cutoff

The most basic way to threshold data is to specify a cutoff threshold. Here, we very simply iterate over each graph in our graphs list, and set elements below our cutoff threshold to zero. We can take a look at this as follows:

cgraphs <- fmriu.thresh_graphs(graphs, method='cutoff', t=0.5)
fmriu.plot.plot_graph(cgraphs[[1]], title="Values below 0.5 cut off", legend.name = "corr")
# test that we end up with no values between 0 and our cutoff since they should be set to zero
expect_equal(sum(graphs[[1]] < 0.5 && graphs[[1]] > 0), 0)

Notice that all the resulting values are greater than 0.5, which was the cutoff threshold we specified.

Percentile

Another simple way to threshold our graphs is to remove values below a given threshold. Here, we iterate over each graph in our graphs list, compute the appropriate percentile for each graph, and then threshold below this cutoff for each graph. Note that the cutoff will NOT be the same for each graph due to the percentile depending on the edge-weight distribution for the particular graph:

pgraphs <- fmriu.thresh_graphs(graphs, method='ptile', t=0.5)
fmriu.plot.plot_graph(pgraphs[[1]], title="Values below 50th Ptile cutoff", legend.name = "corr")
# test that we retain half of the number of values since we threshold by the 50th percentile
expect_equal(sum(pgraphs[[1]] > 0), 4900/2)

Binarizing

bgraphs <- fmriu.thresh_graphs(graphs, method='ptile', t=0.5, binarize=TRUE)
fmriu.plot.plot_graph(bgraphs[[1]], title="Binarized, Ptile-Thresholded Graphs", legend.name="edge")
expect_equal(sum(bgraphs[[1]]), 4900/2)

Z-Scoring

It is often valuable to z-score data; that is, per ROI, to assign a z-score based on the intensity at a given timestep.

We can convert a timeseries/frequency spectra to the z-scored values:

z_ts <- fmriu.obs2zsc(timeseries)
fmriu.plot.plot_dataseries(z_ts[[1]], title="Z-Scored Timeseries Plot", xlabel = "TR", ylabel="z-sc", legend.title = "ROI")

which will default to z-scoring the timeseries per ROI. Alternatively, we may want to globally z-score, such as in the case of a graph. We can call this function similarly, noting that we should adjust the limits since we are no longer dealing with correlation matrices as expected):

z_gra <- fmriu.obs2zsc(graphs, scale='global')
fmriu.plot.plot_graph(z_gra[[1]], title="Z-Scored Graph Plot", legend.name = "z-sc", limits=c(-2, 2))

Converting between Arrays and Lists

Many people choose different representations of their data. The most easy to use is to collect the data using a list-wise representation (where each list element is a single subject's data). Ie:

timeseries   # [[nsubs]] list elements, where each element is an array with dimensions [nt, nroi] for n timesteps and nroi ROIs
graphs       # [[nsubs]] list elements, where each element is an array with dimensions [nroi, nroi] for nroi ROIs

This data format has several options. First, we can easily manipulate lists using the _apply family of functions, which provide C-bindings for things like looping, axis-wise operations, etc, and are a vast improvement speed wise over alternative methods. Moreover, often times in a dataset we will not necessarily have equivalently sized objects, but will still want to perform summary statistics. For example, consider the aggregation above, where we loaded timeseries data from a variety of datasets and parcellation atlases. Each dataset might have varying numbers of timesteps, and each parcellation atlas may have a different number of ROIs. This format would not be possible to enforce with array-wise manipulations.

As this data format is quite unique to R, we also allow users easy conversions to more matlab-esque representations that may be better tailored to their algorithms. We also provide array-wise representations:

ts_array  # [nt, nroi, nsubs]-dimensional array for nt timesteps, nroi ROIs, and nsubs subjects
gra_array  # [nroi, nroi, nsubs]- dimensional array for nroi ROIs and nsubs subjects

Note that many functions will expect ONLY the list-wise representations internally. However, you can convert between the two formats easily using the simple utilities below:

gr_subset <- graphs[graphobj$atlas == "aal-2mm"]
grarray <- fmriu.list2array(gr_subset)
grlist <- fmriu.array2list(grarray)

Use this conversion with caution. The list representation elements are stored with relevant metadata (the original filenames), so some of this data may be lost transferring back and forth between lists and arrays. Moreover, arrays require every element to have the same shape, whereas lists relax this assumption, meaning we cannot store data from parcellations with different numbers of ROIs, timeseries from subjects with varying numbers of timesteps, etc. For these reasons, we recommend only using arrays when absolutely necessary.



neurodata/fmriutils documentation built on May 14, 2019, 2:30 p.m.