knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
The fbutils
package was written for simple data management of plant breeding trials, and as a complimentary package to the Field Book Android App. This package includes functions for data reading, visualization, outlier detection, and spatial adjustment. This document provides an example workflow for managing data in the Field Book Table format.
As with any other package, use the following code to load the fbutils
package:
library(fbutils)
library(dplyr)
It may be helpful to first define some terms that will be used throughout this vignette. Here is a table of some common terms and their meaning:
Term | Definition ----------------|------------ line / genotype | an entry in an experiment trial | a single experiment in which many lines are evaluated plot | a single space in which a single line is grown row | a series of plots arranged across columns column | a series of plots arranged across rows
Most of the functions in the fbutils
package begin with "fb_
" to identify them as part of the package. Below is a list of the available functions (subject to change) and their purpose:
Function | Purpose
------------------------|------------
as_field_book_table()
| Convert a data.frame
to a fbt
-formatted data.frame
fb_heatmap()
| Visualize observations spatially using a heatmap
fb_outliers()
| Identify outliers
fb_parse()
| Parse multi-categorical traits
fb_set_missing()
| Set observations as NA according to the unique ID
fb_spatial_adj()
| Perform spatial adjustment of observations using a moving grid
fb_traits()
| List the traits in a fbt
data.frame
fb_visualize()
| Visualize the distribution of traits
read_fb()
| Read in a .csv
and convert to a fbt
-formatted data.frame
fbt
FormatI will refer to the output from the Field Book App as the Field Book Table, or fbt
, format. Within the R environment, you can treat a fbt
object as a data.frame
. The input tables for the Field Book App require three columns: unique ID, row, and column. However, additional information may be included, such as line (or genotype) name, block, rep, etc. Whatever extra columns are in the input file that is used in the Field Book App will also be included in the output file.
For the fbutils
package, four columns are required: unique_id
, row
, column
, and line_name
. The unique_id
variable is a unique identifier for a plot within a trial. The row
variable is an integer identifier of the trial row, and the column
variable is an integer identifier of the trial column. The line_name
variable is a character vector of identifiers for the lines included in the trial.
Note, the
fbt
format is not a defined object class. To make thefbt
-formatted data.frame more amenable to manipulation by function in thedplyr
ortidyr
packages, we could not use the class attribute.
Here is an example of a data.frame
in fbt
format:
data("fbt_sample") head(fbt_sample)
You may notice that some column names are in lowercase and some columns are in title case (i.e. the first letter is capitalized). Columns in lowercase are considered metadata in all fbutils
functions, while columns in title case
are considered traits in all functions. For instance, in this fbt
data, the columns Continuous
, Discrete
, and Multi
are all trait names.
The fbt
format is highly compatible with functions in tidyverse
packages (e.g. dplyr
or tidyr
). For instance, columns can be removed using the dplyr
function select()
:
# Remove all columns between 'tray_row' and 'seed_id' fbt_sample %>% select(-tray_row:-seed_id)
Or columns could be maniputated using the function mutate()
:
# Add 2 to each element in the 'Discrete' trait fbt_sample %>% mutate(Discrete_new = Discrete + 2)
Data may be read-in using the read_fb
function, as such:
read_fb(file = "field_book_table.csv")
By default, read_fb
makes some assumptions about the names of some columns. Refer to the documentation (i.e. ?read_fb
) for more information. This function is a wrapper around the function as_field_book_table
, which can convert any data.frame
into the fbt
format. I will demonstrate this function using the example data.
fbt <- as_field_book_table( fbt_sample, unique.id = "unique_id", other.cols = c("plot", "tray_row", "tray_id", "seed_id", "pedigree") )
Like the fbt
format, the as_field_book_table
function requires four metadata columns. If other metadata columns are present, but you do not specify them using the other.cols
argument, those columns will be considered traits. For example, look what happens when the 'other.cols` argument is not used.
as_field_book_table(fbt_sample, unique.id = "unique_id") %>% head()
Here, metadata columns like plot
and tray_row
are now in title case, and thus will be treated as traits. Not good!
Once data in imported in to the fbt
format, it can be helpful to pull-out the trait names. A simple function, fb_traits
can do this:
fb_traits(fbt = fbt_sample)
By default, fb_traits
identifies all the traits in the fbt
object, however
one can specify numeric
or character
traits with an additional argument.
fb_traits(fbt = fbt_sample, mode = "numeric")
One of the first things to do with raw data is to visualize it. The fb_visualize()
function provides a quick way to glimpse the distribution of data. By default, only numeric traits are plotted.
fb_visualize(fbt_sample)
The default plot type is a boxplot, however a histogram can be plotted, too.
fb_visualize(fbt_sample, graph = "histogram")
A very useful feature of the fbutils
package is the ability to visualize observations spatially via a heatmap. This allows us to observe the spatial trends in the data.
fb_heatmap(fbt_sample)
The Continuous
trait displays the desirable spatial distribution - random. We can also clearly visualize the single outlier in the OneOutlier
trait, as well as the very apparent spatial pattern the SpatialGradient
trait.
This package assumes that the values of the observations for each trait are in the intended units. However, the Field Book App allows for other types of trait, such as multi-categorical. For instance, disease readings may include multiple observations on a single plot. With the Multi-Categorical trait option, these observations can be stored as several values deliminated by a special character.
There are some functions in the package to handle this type of information. Let us look at the Multi
trait in the example fbt
object:
fbt_sample %>% select(Multi) %>% head()
Each of the elements in this column is 10 observations from a plot The fb_parse()
function converts these observations into individual columns, plus calculates the mean and standard deviation. Here we parse those observations and display the first four columns and any columns starting with Multi
.
fb_parse(fbt_sample, traits = "Multi", sep = ":", n.obs = 10) %>% select(1:4, starts_with("Multi")) %>% head()
The fb_parse()
function successfully separates each individual observation into a separate trait.
An important step in data processing is detecting and removing outliers. The fb_outliers()
function performs simple outlier detection based on observations that fall outside of a set distribution. For instance, this function may be used to detect outliers that are more than 3 standard deviations from the mean. Here I demonstrate on a trait that was purposefully created to have a single outlier:
outliers <- fb_outliers(fbt_sample, traits = "OneOutlier", max.sd = 3)
The outlier summary provides information to examine the observation and determine, perhaps subjectively, if the observations should be kept.
Sometimes it is useful to set some obervation to missing (i.e. NA), such as in the case of an outlier or otherwise undesirable observation. The fb_set_missing()
function can set observations to missing given the unique_id
of that observation and the desired trait. Note that unless otherwise specified, this function will set missing all trait observations for a particular unique id.
Here, we remove the outlier that was detected before and then display the first four columns and the column of the trait:
fb_set_missing(fbt_sample, traits = "OneOutlier", unique_ids = "13RPN00010") %>% select(1:4, OneOutlier) %>% head()
Spatial adjustment is another important step in processing field-based data. Often, fields have some level of spatial variability, such as a topographic slope, soil variability, or some other attribute that impacts the observed phenotype in some pattern. These spatial patterns contribute to the non-genetic noise ($V_R$) that makes up the observed phenotypes. By reducing this noise, the proportion of the variation in observed phenotypes that is due to genetic variation increases as per the equation for heritability:
$$H = \frac{V_G}{(V_G + V_R)}$$.
A suitable method of performing spatial adjustment (regardless of the trial design) is a moving average. The fbutils
package implements the moving average procedure developed in the mvngGrAd
package. The fb_spatial_adj()
function will perform this moving average adjustment for traits. Using checks to measure the within-environment variance ($V_R$), the function determines whether spatial adjustment, in fact, reduced this variation. If so, the adjusted values are retained, but if not, the raw values are retained.
The fb_spatial_adj
function can be run by providing moving average grid dimensions (see the [mvngGrAd
package vignette] (https://CRAN.R-project.org/package=mvngGrAd/mvngGrAd.pdf) for more information).
For instance, the SpatialGradient
trait could be adjusted using a grid of 4 columns, 2 rows, and 2 layers (i.e. diagonals):
checks <- c("Kharkof", "TAM 107", "Scout 66", "Trego", "NW04Y2188") fbt_adjusted <- fb_spatial_adj(fbt_sample, "SpatialGradient", checks, grid.size = list( SpatialGradient = list( grid.rows = 2, grid.cols = 4, grid.layers = 2)) )
The output from fb_spatial_adj()
function includes a summary of the adjustment performance. V_P
is the variance among all observations, V_R
is the residual variance (calculated using the checks), and rel_eff
is the relative efficiency. This last value determines if spatial adjustment was useful. A relative efficiency above 1 indicates that V_R
was lowered in the adjusted observations compared to the unadjusted observations, while a relative efficiency below 1 indicates that V_R
increased as a result of spatial adjustment.
fbt_adjusted$summary$adjustment_summary
We can readily see the impact of spatial adjustment by plotting the adjusted observations using a heatmap:
# Unadjusted fb_heatmap(fbt_sample, traits = "SpatialGradient") # Adjusted fb_heatmap(fbt_adjusted$fbt, traits = "SpatialGradient")
If the grid.size
argument is not provided, the function will optimize the dimensions before performing spatial adjustment. The following code would trigger grid size optimization, however this procedure does take some time, so the output is not shown.
fbt_adjusted <- fb_spatial_adj(fbt_sample, "SpatialGradient", checks)
After spatial adjustment, you can extract the new fbt
-formatted data.frame from the output:
fbt <- fbt_adjusted$fbt
Since the fbt
data.frame is still a data.frame, writing the data to a file simply uses existing R
functions, ideally write.csv()
.
# Write the data write.csv(x = fbt, file = "processed_field_book.csv", row.names = FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.