knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(GCalcium)
Calcium indicator methods such as GCaMP produce massive amounts of data; in some cases producing hundreds-of-thousands of data points for a single subject. Further, there is currently no ubiquitous way to organize or analyze this type of data. To even analyze the data, the researcher must:
Organize the data into a format that is easy to manipulate
Extract useful data points for visualization and analysis
Visualize the data
The GCalcium package gets researchers to the analysis phase more quickly by:
Providing simple, quick commands to format data for use both with, and without the GCalcium package
Extracting data points and waveform characteristics for summarizing and comparing activity within and between trials
Shortcuts for useful visualizations
This document explains how to use GCalcium's functions to format, extract, and manipulate calcium indicator data.
The data included with the GCalcium provides a sample of a time series-like dataset exported from Matlab using the TDTFilter command with a modified version of Dr. David Root's Matlab script. This data was collected using GCaMP6.
This dataset consists of 11 rows and 814 columns. 10 trials from a pilot study were used, with calcium activity from 4 seconds before and after stimulus onset (0s).
In order to use the rest of the package, data must be in a data frame format, with the requirements being:
The first column is a measure of time
The following columns are recorded values from trials in ascending order that correspond to the times the values were recorded at, with 1 column per trial
Fortunately, the GCalcium package includes functions that quickly reformat the data for ease with both user manipulation and use of this package. All formatting commands output this type of data frame.
Currently, the only command for formatting data is format_data
, which takes a matrix of a time measurement in the first column or row, and one trial per column or row in the subsequent columns/rows. A data frame with the first row "Time" and subsequent rows "Trial#" is outputted.
GCaMP.form <- format_data(GCaMP) ### Old table GCaMP[1:5, 1:5] ### New table GCaMP.form[1:5, 1:5]
Note: the data frame used with the GCalcium package does not have to be labeled the same as the format_data frame. This is simply for ease of calling each trial using outside functions.
To perform analyses or explore differences in activity waveforms, one must filter and summarize the data. Knowing what wave characteristics to compare can be confusing; as many scientists do not typically work with this type of data. The following commands extract and/or summarize numbers comparisons that have been used in past research. These functions are split into 2 types: vector inputs, and matrix (format_data
style) inputs.
find_peaks
looks for peaks based on inflection points and a simple filter
inflect_points
finds the peaks and valleys of waves
find_peaks
finds peaks or valleys in waveforms by using inflection points, with filter of 'n' increasing/decreasing points on both sides of each inflection point. A positive numerical input for 'n.points' returns the indices of peaks for the input vector, while a negative value returns indices of valleys.
Let's say we wanted to find all peaks of trial 1 that have 10 consecutive decreasing points on the left and right of each peak, and use these indices to subset the data.
peak.indices <- find_peaks(GCaMP.form$Trial1, n.points = 10) ## Subset using indexing peaks.df <- data.frame( times = GCaMP.form$Time[peak.indices], vals = GCaMP.form$Trial1[peak.indices] ) head(peaks.df, 4)
inflect_points
uses derivatives to find and label the inflection points (peaks and valleys) of a vector, along with the points between them.
inf.pts <- inflect_points(GCaMP.form$Trial1) print(inf.pts[1:46])
The value -2 indicates a peak, 2 indicates a valley, and 0 indicates a point on the curve between -2 and 2, or vice versa.
averaged_trials
Averages values across trials
avg_curve_slope
gets the average slope of a curve (half wave)
between_trial_change
finds the difference in mean activity between trials
centered_AUC
finds the area under each curve
consecutive_trial_change
finds the difference in mean activity between trial n and n+1
moving_window
summarizes data within windows of time
inflect_points_df
finds provides a summary of the data between each inflection point (half wave)
perc_baseline
calculates the percent change from a user-specified baseline period
within_trial_change
finds the difference in mean activity within a trial
averaged_trials
averages values over each time point, across the specified trials. This is especially useful when blocking groups of trials.
Let's say we want to plot the averaged values of trials 1-5
df.1thru5 <- averaged_trials(GCaMP.form, 1:5) plot(x = df.1thru5$Time, df.1thru5$Values, type = 'l', xlab = 'Time (s)', ylab = 'Values')
avg_curve_slope
takes advantage of the 'lm' function to get the average slope of a curve
Let's say we wanted to find the average curve slopes of the waves in trial 2:
avg.slopes <- avg_curve_slope(Dataframe = GCaMP.form, Trial = 2) print(avg.slopes[1:25])
between_trial_change
finds the difference in means during the same time range between sets of trials
For example: we want to see how neural activity during the trial changes after manipulating the experimental variable. The control trials are 1-5, and the experimental trials are 6-10.
between_trial_change(Dataframe = GCaMP.form, TrialRange1 = c(1, 5), TrialRange2 = c(6, 10), Time.period = c(0, 4))
centered_AUC
centers input values using a user-specified function such as the mean, then uses trapezoidal integration from the 'pracma' package to find the area under each curve.
Let's say we wanted a metric besides the mean to measure neural activity to compare trials 2 and 4.
### Trial 2 centered_AUC(Dataframe = GCaMP.form, Trial = 2, FUN = mean)[1:10,] ### Trial 4 centered_AUC(Dataframe = GCaMP.form, Trial = 4, FUN = mean)[1:10,]
consecutive_trial_change
finds the difference in means between consecutive trials during the same time range.
For example: we want to know how much the change in activity is along trials 1-10.
consecutive_trial_change(Dataframe = GCaMP.form, Trials = c(1, 10), Time.period = c(0, 4))
inflect_points_df
uses inflect_points
to find the inflection points, then summarizes the data and returns a data frame with the following variables: Time, raw (input) values, inflection points, and the number of the respective curve.
inf.pts.df <- inflect_points_df(Dataframe = GCaMP.form, Trial = 1) head(inf.pts.df, 6)
In differentiating between inflect_points
and inflect_points_df
, notice that the purpose of this function fully corresponds to its name. The output and first input are both data frames.
moving_window
summarizes data within time windows of a specified length, across a single trial.
Let's say we want to find how the average fluorescence changes within trial 5 in 1 second intervals
moving_window(Dataframe = GCaMP.form, Trial = 5, Window.length = 1, FUN = mean)
perc_baseline
calculates the percent change from a baseline period specified by the user for all trials in the input matrix or data frame. This outputs the same object, but with values transformed to percent change from baseline. This is a good way for standardizing data within trial periods; especially when the baseline period has low standard deviations that cause inflated values in transforming into z-scores.
perc_baseline(Dataframe = GCaMP.form, Baseline.times = c(-3, -1), Baseline.frame = FALSE)[1:3, 1:4]
within_trial_change
finds the change in mean values between the beginning and end of the entered time ranges for a single trial.
For example: we want to know how the mean activity changes between the first two seconds before epoc (baseline) and during the trial.
within_trial_change(Dataframe = GCaMP.form, Trial = 1, Beg.period = c(-2, 0), End.period = c(0, 2))
z_score
transforms input values into z-scores. This also allows for a user-specified mean and standard deviation to compare distributions.
Let's say we wanted to see how the variability of baseline and trial compare by using a mean and standard deviation from a baseline period before epoc.
### Extract values basevals <- GCaMP.form$Trial1[GCaMP.form$Time <= 0] eventvals <- GCaMP.form$Trial1[GCaMP.form$Time > 0] ### Find baseline (pre-epoc) values base.mu <- mean(basevals) base.sigma <- sd(basevals) ### Compute centered z-scores z.scores <- z_score(x = eventvals, mu = base.mu, sigma = base.sigma) print(z.scores[1:30])
Note that the return format is different from the base R 'scale' function, in that it does not create new attributes.
plot_trials
uses the base R graphics to create a quick plot of the trial waves.
For example: we want to visualize the first 2 and last 2 trials
my.trials <- c(1, 2, 9, 10) plot_trials(Dataframe = GCaMP.form, Trials = my.trials)
Note: this function automatically adjusts the x- and y- axes to fit all values. It also creates a legend for the corresponding trials.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.