Final Project: An R Package for Electrophysiology Data Analysis
Introduction (Background/Motivation)
This package is designed to analyze the electrophysiology data from an experiment in my lab. The experiment investigates how chronic alcohol drinking alters the animals' brain circuits, which play an important role in addiction. We compare the voltage threshold and current threshold of cells from alcohol-drinking animals with those from water-control animals. The lower the threshold, the more excitatory connections between the circuits.
Without this package, analyzing the data takes a lot of manual work. First, we manually exclude the cells that are not stable, based on several criteria. Second, we use Excel to reorganize the sheets and calculate the mean, standard deviation (STD), n, and other basic properties for each variable. Third, we import all the data into Prism, reorganize them, perform the statistical analysis, make the graphs, and polish them. All of these steps have to be repeated every time we get new data, which is a lot of repetitive work. I created this package to make this much easier. It contains several functions designed to carry out each of the steps above, so data analysis and graph plotting can be done quickly. The rest of this document explains each function and gives examples of how to use it.
knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(FinalProjectYin)
library(dplyr)
library(ggplot2)
library(ggpubr)
library(shiny)
library(miniUI)
library(kableExtra)
library(flextable)
library(knitr)
Package Content & Example Demonstrations
Read the data set for example demonstration.
I am going to use one of the electrophysiology data sets from my lab.
ephys <- read.csv("Ephys_Yin.csv", header = TRUE)
Function 1: ephysfilter: Filter data
This function handles the first step of data analysis, which is filtering the data (it requires the dplyr package). Cells must meet the following quality criteria:
1. The y frac value should fall between 0.4-0.6.
2. The z plane value should fall between 35-75.
3. The resistance difference should be less than 20%, i.e. a value below 0.2 in the data table.
The following is an example. By passing the original data frame to ephysfilter(), we can filter out the unstable cells that do not meet these criteria and store the filtered data in a new data frame called ephys1.
ephys1 <- ephysfilter(ephys)
head(ephys1)
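As a rough illustration of what the three criteria amount to, here is a minimal sketch written in base R so that it runs without extra packages (the package function itself relies on dplyr). The column names y.frac, z.plane, and resistance.diff, the function name filter_cells_sketch, and the arguments z_min, z_max, and r_max are hypothetical; only y_min and y_max are argument names mentioned for ephysfilter() later in this document. Treating the range limits as inclusive is also an assumption.

```r
# Toy data, not lab measurements. The column names are hypothetical
# stand-ins for the corresponding columns in the lab's data sheet.
cells <- data.frame(
  Cell.ID         = c("c1", "c2", "c3", "c4", "c5"),
  y.frac          = c(0.45, 0.62, 0.50, 0.55, 0.41),
  z.plane         = c(50, 60, 80, 40, 70),
  resistance.diff = c(0.10, 0.05, 0.10, 0.25, 0.19)
)

# Keep only the cells that satisfy all three stability criteria.
# Range limits are treated as inclusive (an assumption).
filter_cells_sketch <- function(data,
                                y_min = 0.4, y_max = 0.6,
                                z_min = 35, z_max = 75,
                                r_max = 0.2) {
  keep <- data$y.frac >= y_min & data$y.frac <= y_max &
    data$z.plane >= z_min & data$z.plane <= z_max &
    data$resistance.diff < r_max
  data[keep, , drop = FALSE]
}

stable <- filter_cells_sketch(cells)
print(stable)
```

In this toy table, c2 fails the y frac criterion, c3 fails the z plane criterion, and c4 fails the resistance criterion, so only c1 and c5 remain.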
Function 2: Thresholdplot: Create a violin graph
The second step is to plot a graph to visualize the data. I chose a violin plot with a box plot inside it: the box plot marks the median and the interquartile range, and the violin adds information about the probability density of the data at different values. This function requires the ggpubr package and uses ggviolin, but it still saves a lot of time because the labels, captions, and other settings do not have to be typed in each time.
The following is an example. By passing the data name, the treatment group column, the dependent variable name, and the y-axis title to Thresholdplot(), we get a violin plot. The x-axis is the treatment group (alcohol and water). The y-axis is the dependent variable, which is voltage threshold (mV) in this example. The title above the plot is "Voltage Threshold (mV)".
Thresholdplot(ephys1, "Treatment", "Voltage.threshold.mV", "Voltage Threshold (mV)")
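For readers who want to see what such a plot involves, the following is a minimal sketch of a violin plot with an embedded box plot. It uses ggplot2 directly rather than ggpubr::ggviolin, so it is not the package's implementation; the function name violin_sketch and the toy values are made up for illustration.

```r
library(ggplot2)

# Toy data, not lab measurements.
toy <- data.frame(
  Treatment = rep(c("alcohol", "water"), each = 6),
  Voltage.threshold.mV = c(-42, -40, -41, -39, -43, -40,
                           -36, -35, -38, -37, -34, -36)
)

# Violin plot per treatment group with a narrow box plot drawn on top.
# Column names are passed as strings and looked up with the .data pronoun.
violin_sketch <- function(data, group, variable, title) {
  ggplot(data, aes(x = .data[[group]], y = .data[[variable]],
                   fill = .data[[group]])) +
    geom_violin(trim = FALSE) +
    geom_boxplot(width = 0.1, fill = "white") +
    labs(title = title, x = group, y = title) +
    theme_classic() +
    theme(legend.position = "none")
}

p <- violin_sketch(toy, "Treatment", "Voltage.threshold.mV",
                   "Voltage Threshold (mV)")
print(p)
```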
Function 3: StatsTable: Create a Statistical Table
After visualizing the data, we need to calculate statistics to see whether the two groups are significantly different from each other. StatsTable() creates a statistical table showing the mean and standard deviation of the two treatment groups. The function carries out the following steps:
1. Create a data frame with only one treatment group.
2. Select only the cell ID and the dependent variable columns in that data frame.
3. Calculate the mean and standard deviation for the selected dependent variable.
4. Repeat the above steps for the other treatment group.
5. Combine the two data frames into one.
StatsTable() saves a lot of time and steps in getting to this statistical table. The following are two examples of how to use it. By passing the data name, the names of the two treatment groups, the cell ID column name, and the name of the dependent variable, we get the mean and STD of the voltage threshold and the current threshold for the alcohol and water groups.
StatsTable(ephys1, "alcohol", "water", "Cell.ID", "Voltage.threshold.mV")
StatsTable(ephys1, "alcohol", "water", "Cell.ID", "Current.Threshold..pA.")
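The five steps listed above can be sketched in base R roughly as follows. This is an illustration under assumptions, not the package's code: the function name stats_table_sketch, the group_col argument (defaulting to a column called "Treatment"), and the toy values are all hypothetical.

```r
# Toy data, not lab measurements.
toy <- data.frame(
  Cell.ID = paste0("c", 1:6),
  Treatment = rep(c("alcohol", "water"), each = 3),
  Voltage.threshold.mV = c(-42, -40, -41, -36, -35, -37)
)

stats_table_sketch <- function(data, group1, group2, id_col, variable,
                               group_col = "Treatment") {
  summarise_group <- function(group) {
    # Steps 1-2: keep one treatment group and only the ID and
    # dependent-variable columns.
    sub <- data[data[[group_col]] == group, c(id_col, variable)]
    # Step 3: mean and standard deviation of the dependent variable.
    data.frame(Treatment = group,
               Mean = mean(sub[[variable]]),
               SD = sd(sub[[variable]]))
  }
  # Steps 4-5: repeat for the second group and stack the two rows.
  rbind(summarise_group(group1), summarise_group(group2))
}

tab <- stats_table_sketch(toy, "alcohol", "water",
                          "Cell.ID", "Voltage.threshold.mV")
print(tab)
```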
Function 4: normaldisplot: Visualizing Data Normality
Before performing the t-test on the two treatment groups, we need to determine whether the data in each group are normally distributed. To visualize normality, we can use a quantile-quantile plot (Q-Q plot), which plots the quantiles of a sample against the corresponding quantiles of a normal distribution. If most of the data points fall on or near the line, the data set is approximately normally distributed (this function requires the ggplot2 package).
The following example shows the distribution of the voltage threshold data in the alcohol-treated group. By passing the data name, the treatment group name, the dependent variable name, and the desired title, we get a Q-Q plot with a reference line. All the data points lie close to the line, so the data set is likely normally distributed.
normaldisplot(ephys1, "alcohol", "Voltage.threshold.mV", "Voltage Threshold (mV)")
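To make the idea behind a Q-Q plot concrete, here is a small sketch that uses base R's qqnorm() and qqline() instead of ggplot2, so it is a stand-in for the package function rather than a copy of it. The values are simulated and only illustrate the mechanics.

```r
set.seed(1)
# Simulated values standing in for one treatment group (not lab data).
x <- rnorm(20, mean = -40, sd = 2)

# qqnorm() pairs each observation with the normal quantile expected at
# its rank; plot.it = FALSE returns the coordinates without drawing.
qq <- qqnorm(x, plot.it = FALSE)

plot(qq$x, qq$y,
     main = "Voltage Threshold (mV)",
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
# Reference line through the first and third quartiles of the sample.
qqline(x)
```

Points that bend away from the line at either end would suggest heavier or lighter tails than a normal distribution.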
Function 5: normalization: Check Normality and Equal Variance
Besides visualizing the distribution, we need formal statistical tests of whether the data set is consistent with a normal distribution. normalization() runs two preliminary tests that check the assumptions of the unpaired two-sample t-test.
The first is the Shapiro-Wilk normality test. If the p-value is larger than 0.05, the distribution of the data is not significantly different from a normal distribution, so we can assume normality.
The second is an F-test for homogeneity of variance, run with var.test(). If the p-value is greater than 0.05, there is no significant difference between the variances of the two sets of data, so we can use the classic t-test for the analysis (this function requires the dplyr package).
Below is an example of the preliminary tests on the voltage threshold data in the alcohol-treated group. After passing the data name, the dependent variable, and the treatment group to the function, we get the results of the two tests: (1) Shapiro-Wilk normality test p-value = 0.1962, which is larger than 0.05. (2) F-test p-value = 0.6951, which is also larger than 0.05. These results indicate that the alcohol group data are consistent with normality and that the variances are homogeneous, so we can move on to the classic t-test.
normalization(ephys1, "Voltage.threshold.mV", "alcohol")
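The two preliminary tests can be sketched with base R's shapiro.test() and var.test(). The wrapper below is only an illustration: the name check_assumptions_sketch, the group_col argument, and the simulated data are assumptions, and the package function may organize the tests differently.

```r
set.seed(42)
# Simulated data, not lab measurements.
toy <- data.frame(
  Treatment = rep(c("alcohol", "water"), each = 15),
  Voltage.threshold.mV = c(rnorm(15, -41, 2), rnorm(15, -37, 2))
)

check_assumptions_sketch <- function(data, variable, group,
                                     group_col = "Treatment") {
  values <- data[[variable]]
  in_group <- values[data[[group_col]] == group]
  others <- values[data[[group_col]] != group]
  # Shapiro-Wilk: the null hypothesis is that the chosen group's
  # values come from a normal distribution.
  normality <- shapiro.test(in_group)
  # F-test: the null hypothesis is that the two groups have equal variances.
  variance <- var.test(in_group, others)
  c(shapiro_p = normality$p.value, f_test_p = variance$p.value)
}

res <- check_assumptions_sketch(toy, "Voltage.threshold.mV", "alcohol")
print(res)
```

Both entries are p-values, read against the 0.05 cutoff described above.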
Function 6: threshttest: Calculate T-Test p-value
The final step is to perform a t-test to see whether there is a significant difference between the two treatment groups. By passing the dependent variable of interest and the data name, we get the p-value of the t-test.
The following are two examples of using this function. The p-value for voltage threshold is 0.01275359, which is smaller than 0.05, indicating a significant difference in voltage threshold between the alcohol and water groups. However, the p-value for current threshold is 1, which is larger than 0.05, so there is no significant difference in current threshold between the two groups.
threshttest("Voltage.threshold.mV", ephys1)
threshttest("Current.Threshold..pA.", ephys1)
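A minimal sketch of such a two-group comparison with base R's t.test() is shown below. It assumes the classic equal-variance test (var.equal = TRUE) and a grouping column named "Treatment"; the function name t_test_sketch and the toy values are hypothetical, and the package function may be written differently.

```r
# Toy data, not lab measurements.
toy <- data.frame(
  Treatment = rep(c("alcohol", "water"), each = 6),
  Voltage.threshold.mV = c(-42, -40, -41, -39, -43, -40,
                           -36, -35, -38, -37, -34, -36)
)

t_test_sketch <- function(variable, data, group_col = "Treatment") {
  # Split the dependent variable into one vector per treatment group.
  groups <- split(data[[variable]], data[[group_col]])
  stopifnot(length(groups) == 2)
  # Classic unpaired two-sample t-test with pooled variance.
  t.test(groups[[1]], groups[[2]], var.equal = TRUE)$p.value
}

p_value <- t_test_sketch("Voltage.threshold.mV", toy)
print(p_value)
```

If the F-test had indicated unequal variances, dropping var.equal = TRUE would give Welch's t-test instead.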
Function 7: Shinygraph: Drag and Select Data Points
The last function in this package uses a Shiny gadget to drag and select data points. Sometimes, when all the data points are on a graph and we want the detailed information for certain points, we have to look back and forth between the graph and the raw data file. With Shinygraph(), we can simply drag a box around the data points we want to see and then click "Done". The detailed information for the selected data is then shown.
The following is an example of selecting from the water- and alcohol-treated voltage threshold data. By passing the data name, the treatment column, and the y-axis dependent variable, we get a graph in the viewer on the right. Let's say we want to see which two cells in the alcohol group are at the very bottom. We can drag a box around these two points and then click "Done". The information is shown below.
shinygraph(ephys1, "Treatment", "Voltage.threshold.mV")
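For reference, a brush-and-select gadget of this kind usually follows the pattern sketched below, built from miniUI's page layout and shiny's brushedPoints(). The function name select_points_sketch is hypothetical and the body is an assumption about how such a gadget could be written, not the package's source. Calls are namespaced so that defining the function does not require attaching the packages, and the gadget only starts when the function is called in an interactive session.

```r
select_points_sketch <- function(data, xvar, yvar) {
  ui <- miniUI::miniPage(
    miniUI::gadgetTitleBar("Drag a box around points, then click Done"),
    miniUI::miniContentPanel(
      # brush = "brush" makes the dragged rectangle available as input$brush.
      shiny::plotOutput("plot", height = "100%", brush = "brush")
    )
  )

  server <- function(input, output, session) {
    output$plot <- shiny::renderPlot({
      ggplot2::ggplot(data, ggplot2::aes(.data[[xvar]], .data[[yvar]])) +
        ggplot2::geom_point()
    })
    # Clicking "Done" closes the gadget and returns the rows of `data`
    # whose points lie inside the brushed rectangle.
    shiny::observeEvent(input$done, {
      shiny::stopApp(
        shiny::brushedPoints(data, input$brush, xvar = xvar, yvar = yvar)
      )
    })
  }

  shiny::runGadget(ui, server)
}

# Interactive use only, for example:
# selected <- select_points_sketch(ephys1, "Treatment", "Voltage.threshold.mV")
```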
Future Work and Plans
In terms of future plans, there are several possibilities:
1. Our lab may gather electrophysiology data from different brain regions in the future. These data may need similar analysis, such as graphing and t-tests, but with different criteria. We can still use ephysfilter() to filter the data, but will need to change some of the default settings in the function. For example, the y frac range is currently set to 0.4-0.6 because we record from layer V of the cortex. If we record from layer II or III, the y frac criterion might change to 0.3-0.4. We can pass y_max = 0.4, y_min = 0.3 to the function to override the defaults.
2. We might gather some step current data in the future. This requires a line graph for each current instead of a violin plot. In order to do that, I need to:
(a) Select only the cell ID and step current columns and create a new data frame for one treatment group.
(b) Select the range for x and y axis. Sample coding would be like:
(c) Set up the plot. Sample coding would be like:
(d) Add lines from different treatment groups together in one graph.
(e) Add title, subtitle, and a legend.
(f) Make it into a function and add this function into this package.
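Steps (b) through (e) could look roughly like the following base R sketch. The data frame, its column names, and the choice of action-potential count as the measured response are made up for illustration, since no step current data have been collected yet.

```r
# Made-up step current data: one response value per injected current
# step and treatment group (not lab measurements).
steps <- data.frame(
  Current.pA = seq(0, 300, by = 50),
  alcohol    = c(0, 1, 3, 6, 9, 11, 12),
  water      = c(0, 0, 2, 4, 6, 8, 9)
)

# (b) Axis ranges that cover both treatment groups.
x_range <- range(steps$Current.pA)
y_range <- range(steps$alcohol, steps$water)

# (c) Set up an empty plot with those axes.
plot(x_range, y_range, type = "n",
     xlab = "Injected current (pA)", ylab = "Action potentials")

# (d) Add one line per treatment group to the same graph.
lines(steps$Current.pA, steps$alcohol, type = "b",
      col = "firebrick", pch = 16)
lines(steps$Current.pA, steps$water, type = "b",
      col = "steelblue", pch = 17)

# (e) Add a title, a subtitle, and a legend.
title(main = "Step current response", sub = "Toy data for illustration")
legend("topleft", legend = c("alcohol", "water"),
       col = c("firebrick", "steelblue"), pch = c(16, 17),
       lty = 1, bty = "n")
```

Wrapping these calls in a function that takes the data frame and column names as arguments would cover step (f).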
Overall, this package helps make the data analysis of electrophysiology data in our lab a lot easier and faster. As long as we keep organizing our raw data in a similar pattern, by using the above seven functions, we can quickly filter the data, draw beautiful graphs, and get valuable statistical results. If our lab starts collecting new variables or doing new experiments, we can always modify the current functions by changing some default arguments, or write new but similar functions for new data analysis.