In KevinCrab/Instant-Gas-Exchange-Measurements: Instant Gas Exchange Analysis

Introduction

Photosythesis rate, water conductance and Ci/Ca (intercellular to ambient CO2 partial pressures) are three key plant physiological parameters in terms of carbon simulation and water use efficiency ( Lawson and Blatt, 2014; Ellsworth and Cousins, 2016 ). In short, I will refer these three parameters as "gas exchange parameters"" in the following context. They can be measured simultaneously using LI-6400XT Portable Photosynthesis System (LiCor Inc. Lincoln Nebraska, USA). The actual measurement procedure is relatively complex, but the short story is that you place the machine right next to the plant, clamp the leaf into the chamber, and initiate the autolog program. LI-6400XT records all the input parameters in time series and does the calculation to get those parameters that we care about.

This R package is designed specifically for experimental projects that are currently going on in Leakey Lab (UIUC). Recently Postoc Dr. Wertin applied a high-throughtput protocol for gas exchange measurements in this summer field season. The major improvement of this protocol involves harvesting leaves in the field and then conducting measurements in lab instead of taking the machine out in the field to the plants. This newer method allows us to measure more than 200 leafs (single 4-minute measurement on each leaf) per day, attributing to the fact that people don't need to carry around the heavy machine in the field under harsh sunlight. After being harvested from field and take back to lab, each leaf is kept hydrated by inserting its cut wound side into a 10 ml tube full of water. Leaves are rotationally incubated for half an hour in two lighted growth chambers prior to measurements since it takes that long to activate each leaf and let them function happily. Recording leaf's identity into LI-6400XT is done by adding "remarks" of leaf identity between two measurements. This is a bit old fashioned and also is where a big part of human errors come from that need to be fixed. Measurements are taken place after clamping the leaf for 4 minutes.

Example data provided in this tutorial is my real data collected in this summer. In order to better show the functions of my package as well as to have a reasonable data size, only a part of my dataset is provided here. My experimental design is a RCBD with 2 blocks. Within each block, each genotype has its plot and there are 2 biological sampled within each plot. Due to the fact that here is a partial dataset, total reps for most genotypes are less than 4.

source("~/GitHub/Instant-Gas-Exchange-Measurements/R/GXread.R")
source("~/GitHub/Instant-Gas-Exchange-Measurements/R/CS_GXcurve.R")
source("~/GitHub/Instant-Gas-Exchange-Measurements/R/GetValue_GXcurve.R")
source("~/GitHub/Instant-Gas-Exchange-Measurements/R/FindGX.R")
source("~/GitHub/Instant-Gas-Exchange-Measurements/R/plotGX.R")
source("~/GitHub/Instant-Gas-Exchange-Measurements/R/PGMean.R")
source("~/GitHub/Instant-Gas-Exchange-Measurements/R/summary_GXcurve.R")
source("~/GitHub/Instant-Gas-Exchange-Measurements/R/summary_GXvalue.R")
source("~/GitHub/Instant-Gas-Exchange-Measurements/R/summary_GXmean.R")

The aim of this package is to provide a pipeline of treating the standard output table from LI-6400XT. Measurements are first reformated, merged and screened to see abnormal measurements in read.GX and CS function. Then curves can be plotted so that outliers can be spotted and found, using plotGX and FindGX. Values will be extracted from curves with function GetValue. At last PGMean is used to calculate plot means and genotype means. summary provide detailed information for corresponding class of data.

read.GX

(Read in Gas Exchange raw data)

Once you copied the raw output table from LI-6400, turn it into csv file. My files have a naming convention of "Operator + date + population + machine .csv". These data are provided in the same folder. These data has a long header, lots of parameters, ton of useless time remarks and relatively complicated structure. In short, they desperately need to be re-structured and re-organized.

First we read them in. On each day we sample a whole set of replicates, thus, samples measured on 7-18 and 9-20 are leaf replicates 1 and samples measured on 7-19 and 7-21 are leaf replicates 2. genolist is a table listing all leaf identity remark names with corresponding genotype names, ranges and blockes, so that read.GX can match and add genotype names for each measurement. After reading in, the output data has class of GXcurve.

If argument condition is TRUE, the measurement conditions will be checked and printed. Some machines doesn't record all the conditions (like this case below). Warnings will pop out when the file has multiple conditions or incomplete conditions. Check or not, GXcurve file will include as many conditions as it recorded in the attributes.

leak1_718 = read.GX(filename = "KX_07-18-2017_ril_leak1_.csv",leaf_rep = 1,genolist = "genotypic_information.csv", condition = TRUE)

Notice in this dataset some name "remarks" have only 2 digits (operator's fault), which will cause trouble later. So all plot names have to unified into 3 digits.

steward_718 = read.GX(filename = "TMW_07-18-2017_ril_stewart_.csv",leaf_rep = 1,genolist = "genotypic_information.csv")

I omit the code of reading in the remaining csv files here. In total I read in 9 example dataset here: leak1_718, leak2_718, steward_718, leak1_719, leak1_720, steward_720, leak1_721, leak2_721, steward_721

leak2_718 = read.GX(filename = "KX_07-18-2017_RIL_leak2_.csv",leaf_rep = 1,genolist = "genotypic_information.csv")
leak1_719 = read.GX(filename = "KX_07-19-2017_RIL_leak1.csv",leaf_rep = 2,genolist = "genotypic_information.csv")
leak1_720 = read.GX(filename = "nicole-7-20-2017-ril_leak1.csv",leaf_rep = 1,genolist = "genotypic_information.csv")
steward_720 = read.GX(filename = "KX_07-20-2017_RIL_Stewart.csv",leaf_rep = 1,genolist = "genotypic_information.csv")
leak1_721 = read.GX(filename = "cm_07-21-2017_ril_leak1_.csv",leaf_rep = 2,genolist = "genotypic_information.csv")
leak2_721 = read.GX(filename = "cm_07-21-2017_ril_leak2_.csv",leaf_rep = 2,genolist = "genotypic_information.csv")
steward_721 = read.GX(filename = "KX_07-21-2017_RIL_STEWARD_.csv",leaf_rep = 2,genolist = "genotypic_information.csv")

Let's take a look of the output.

dim(leak1_718)
head(leak1_718)
str(leak1_718)

You can use summary to get some knowledge for GXcurve class objects such as meausurement counts, leaf rep number. Also you can take a look of parameter summary if value_summary is set to be TRUE, as well as experimental conditions if you pass a character condition name to argument check_condition. (choose from: measurement_time / machine / lightsource / AD.avgtime / flow / par / CO2_mixer / Tblock)

summary(leak1_718,value_summary = TRUE, check_condition = "flow")

I want to add that, in short, in IGEA we call "Photosythesis" as "Photo", "Water Conductance" as "Cond", "intercellular to ambient CO2 partial pressures" as "Ci.Ca"

CS

(Combine and Screen)

Combining means that two seperate GXcurve tables are merged based on rows due to the fact data structures should be the same after being read in. Also measurement condition attributes are deleted.

Since dataset usually has a huge size, it is nearly impossible to go through each measurement and screen for abnormal measurements caused by human errors. Generally there are three kinds of human errors: 1. Duplicate names. Ideally the name recording for leaves are unique. But it is possible that during operation people accidentally put in a wrong name, resulting to a name duplication for another leaf. 2. Miss name recording. As mentioned before names are recorded by inserting "remarks" in between two measurements. Sometimes people forgot to add remark, resulting to a too long meausrement for the first sample. 3. Recording Interruption. When recording, errors might occur or the position of leaf needs to be adjusted, thus leading to too few recording in that measurement.

Now we apply CS on my 9 seperate datasets. It will print out the measurements in each dataset and in total. In addition, if you have any one of the potential human errors mentioned above, you will know how many errors there are and the sample names for each error. One thing that I think is handy is that these human error information will be a part of attributes of the GXcurve table and you can easily extract these information and target their location in the next function FindGX.

AllCurve = CS(leak1_718,leak2_718,leak1_719,leak1_720,leak1_721,leak2_721,steward_720,steward_721,steward_718)
dim(AllCurve)

FindGX

(Find Measurement Index in Dataset)

FindGX is designed to locate the abnormal measurements or the measurement you are interested in. First argument you give it is the dataset that has the measurement you care about. Each raw dataset that was merged earlier has a tag indicating the raw dataset name, so it allows the function to trace back to the location in the original dataset. After that you can put in character name of the measurement.

FindGX(AllCurve, id = "119_1")

Input can also be a character vector.

FindGX(AllCurve, c("119_1","110_1"))

Since name list of the questionable measurements are stored in the GXcurve table attributes, we can easily pull it out by giving TRUE to arguments DuplicateName, MissRecording and TooSmall (corresponding to three types of human error mentioned above). After getting the location and taking a look of them, we can make a judgement call how to deal with them (ignore them? delete them? etc.)

FindGX(AllCurve, DuplicateName = TRUE, MissRecording = TRUE, TooSmall = TRUE)

plotGX

(Plot Curve against Time)

After taking care of technical errors, we might want to know how our curves look like and identify outliers based on that. The arguments of plotGX might seem confusing, but once you learn about it, you will see it has a big flexibiliy of how you want to give it the inputs. By the way, this function is relied on ggplot2.

First argument is trait, which is straightforward. Choose one from "Photo", "Cond" or "Ci.Ca". One type of trait can be plotted at once.

Measurements of the same genotype will be plotted together, so that the outliers can stand out. PGN format Plots are saved into the working directory with a naming convention of "trait + genotype name.pgn". A sentence will pop out when each genotypic categorized measurement graph is drawn.

There are three combination of choices in terms of telling FindGX which measurements to plot and they will be useful in different circumstances.

Give a GXcurve class object to argument table. In this case every single measurement in this table will be ploted. This way will be helpful if you have a full dataset and want to plot out all the graphs at once. I don't want this whole page to be covered with ">>> curve saved" reminders so I test this function in a artificial dataset instead of AllCurve.

library(ggplot2)

PlotExample = AllCurve[AllCurve$genotype %in% head(unique(AllCurve$genotype),3),]
plotGX(trait = "Photo", table = PlotExample )

Type in character name of a genotype to genotype and give dataset the table that contains the chosen genotype. Now measurements from only one genotype is plotted. That genotype must be in the dataset provided.

plotGX(trait = "Cond", genotype = "Z019E0032", dataset = AllCurve )

Here is an example of how the output graph looks likt. "208_2" has obviously abnormal status so it is considered a potential outlier. Here genotype "Z019E0032" only has 3 reps due to the reason it is a partial dataset, so we need more evidence before we can decide which measurements are outliers.

trait = "Cond"
sub = subset(AllCurve, genotype ==  "Z019E0032")
sub = sub[order(as.numeric(sub$plot)),]
ggplot(sub, aes_string("FTime", trait)) +geom_point(aes(colour=name), size =5) + labs(title=paste("Z019E0032",trait))+ 
  theme_bw()+ 
  theme(plot.title = element_text(size=30, face="bold"),
        legend.title=element_blank(),
        legend.text = element_text(size = 30),
        legend.key.size = unit(2,"line"),
        legend.position="bottom")

If you just want to look at a single measurement, you can type in character of the sample leaf to id and the dataset that contains it to dataset. id must be in the dataset provided.

plotGX(trait = "Ci.Ca", id = "120_1" , dataset = AllCurve )

These three kinds of inputs have different scales, from biggest to smallest. Plotting can take some amount of time, take memories and could generate a longs list of PNG files, so it will be nice we can only plot the graphs that we want to look at.

GetValue

(Get first_minut and last_minute Values)

There are two types of values we calculate and include in our further experiment (such as correlations with other traits etc.): First, we find the biggest "water conductance" value in the first minute and take the corresponding "Photosythesis" and "Ci/Ca" values; Second, we take the means of these three parameters in the last minute. Currently we don't know which set of values are better representative but we hope further analysis will tell us more. You can choose to calculate both types of values by defalt or just one of them by setting first.minute or last.minute to FALSE. The first argument in the function should taka a GXcurve class table.

Each row in the output represents a single 4-minute measurement. This function returns a table of GXvalue class.

AllValue = GetValue(AllCurve)

head(AllValue)
str(AllValue)

Apply summary function on GXvalue object could tell you what's the total entry number, how balanced our dataset look like (based on leaf rep numbers) and a peek of selected trait.

summary(AllValue, "Photo_first")

PGMean

(Plot or Genotype Mean)

Once we got the values for each leaf sample, we can average the replicates in plots to get plot mean. Furthermore, since my experiment is RCBD with two blocks, I will need other softwares or packages to test for "block" effect and get BLUPs, but PGMean could still calculate genotype mean as a quick look.

It takes a GXvalue class table and you have to choose do you want "plot" mean or "genotype" mean. Standard errors are also calculated along with means. Due to the fact this example dataset is not complete, some plots only have 1 rep, so standard errors are left NA. In the end a GXmean class table is our output.

All_Plot_Mean = PGMean.GXvalue(table = AllValue, type = "plot")
head(All_Plot_Mean)
All_Geno_Mean = PGMean.GXvalue(table = AllValue, type = "genotype")
head(All_Geno_Mean)

Apply summary function on GXmean object could give you a rough idea how our values look like, as well as how complete your dataset is(based on rep numbers). You probably noticed there are two genotypes that have way more reps than others, that's my parents as checks in the field.