knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(knitr.duplicate.label = "allow") # needed to run generateReport() with both profiles!
The easyfulcrum package is a tool to process and analyze ecological field sampling data generated using the Fulcrum mobile application.
The easyfulcrum R package offers an organized workflow for processing ecological sampling data generated using the Fulcrum mobile application. easyfulcrum provides simple and efficient functions to clean, process, and visualize ecological field sampling and isolation data collected using custom Fulcrum applications. It also provides functions to join these data with genotype information if organisms isolated from the field are identified using molecular barcodes. Together, the Fulcrum mobile application and easyfulcrum R package allow researchers to easily implement mobile data-collection, cloud-based databases, and standardized data analysis tools to improve ecological sampling accuracy and efficiency.
Fulcrum is a customizable, geographic data-collection platform compatible with Apple iOS and Google Android devices that allows users to collect rich, location-based data. To facilitate large-scale ecological surveys of nematodes that are difficult to identify in the field, we developed two Fulcrum applications. The “Nematode field sampling” application allows the user to organize various ecological data types associated with the substrate sampled in the field, such as environmental parameters and substrate characteristics. The “Nematode isolation” application helps organize data associated with the specimens isolated from samples after they have been brought into the laboratory.
The Fulcrum data collection application can be downloaded online (https://www.fulcrumapp.com). Fulcrum uses a powerful GUI to help users customize data-collection applications even when they have no coding or database administration knowledge, which makes Fulcrum’s robust, cloud-based database adaptable to sampling nearly any species from nature. If desired, users can customize our field collection and isolation applications following our Fulcrum templates.
Users can use the applications as is, but in order for easyfulcrum to work with custom applications, users should save their field sampling application and isolation applications with a unique identifier in the place of our nematode prefix, e.g. fungus field sampling and fungus isolation.
Install the package via devtools (>= 2.4.1):
install.packages("devtools")
devtools::install_github("AndersenLab/easyfulcrum")
Load the package:
library(easyfulcrum)
The makeDirStructure function makes a standardized directory of folders for the easyfulcrum run, taking a base directory (startdir) and the project name (projectdirname) as inputs.
Every collection project should be contained in its own directory. The directory name should follow the YearMonthPlace format used for Fulcrum collection projects, e.g. 2020JanuaryHawaii.
makeDirStructure(startdir = "~/Desktop", projectdirname = "2020JanuaryHawaii")
The data directory contains the raw and processed subdirectories.
- raw/fulcrum holds the .csv files exported from Fulcrum and raw/fulcrum/photos contains .jpg files exported from Fulcrum.
- raw/annotate can hold spatial location files island.csv, location.csv, and trail.csv that the user generates for mapping the collection sites.
- processed/fulcrum holds easyfulcrum function outputs.
The reports directory holds easyfulcrum function outputs. These outputs will be generated by the user processing script(s) saved in the scripts directory.
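As a rough illustration of the layout described above, the following base R sketch creates the same folders under a temporary directory. It is not the package's implementation of makeDirStructure: the helper name sketch_dir_structure is hypothetical, and the folder list is taken only from the description in this vignette.

```r
# Hypothetical helper: builds the folder layout described in this vignette.
# Illustrative sketch only, not the code used by makeDirStructure().
sketch_dir_structure <- function(startdir, projectdirname) {
  base <- file.path(startdir, projectdirname)
  subdirs <- c(
    "data/raw/fulcrum/photos",  # raw .csv exports and .jpg photos
    "data/raw/annotate",        # optional island/location/trail .csv files
    "data/processed/fulcrum",   # easyfulcrum function outputs
    "reports",                  # report outputs
    "scripts"                   # user processing scripts
  )
  for (d in file.path(base, subdirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(base)
}

# Use a temporary directory so nothing is written to the Desktop.
project <- sketch_dir_structure(tempdir(), "2020JanuaryHawaii")
created <- list.dirs(project, full.names = FALSE, recursive = TRUE)
print(created)
```

Running the sketch in a temporary directory is a convenient way to inspect the expected layout before creating a real project folder.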
Following makeDirStructure, the user adds collection .csv and .jpg files into the appropriate subfolder locations.
We include example files from a small collection to use with this vignette. To copy these files into the project directory you just made, we include a helper function called loadExampleFiles. This function is not used in the normal easyfulcrum workflow. Note that startdir and projectdirname should be identical to the arguments used above for the makeDirStructure function.
loadExampleFiles(startdir = "~/Desktop", projectdirname = "2020JanuaryHawaii")
Before processing collection data using easyfulcrum, the raw Fulcrum data must be exported from the Fulcrum database using the Fulcrum website’s data export tool. We recommend exporting the data by selecting the following checkboxes:
- the desired project
- include photos
- include GPS data
- field sampling
- isolation
Exporting with change sets is not currently supported.
After the data are exported, the .csv files must be moved to [your project directory]/data/raw/fulcrum, and the field sampling photos in .jpg format must be moved to [your project directory]/data/raw/fulcrum/photos.
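One way to script this step is sketched below in base R. The export folder, the placeholder file names, and the helper name sketch_stage_export are assumptions made for illustration. The sketch copies rather than moves the files and creates the target folders if they are missing.

```r
# Hypothetical helper: copy exported .csv and .jpg files into a project.
# export_dir is wherever the Fulcrum export was unzipped (an assumption).
sketch_stage_export <- function(export_dir, project_dir) {
  csv_dest   <- file.path(project_dir, "data", "raw", "fulcrum")
  photo_dest <- file.path(csv_dest, "photos")
  dir.create(photo_dest, recursive = TRUE, showWarnings = FALSE)

  csv_files <- list.files(export_dir, pattern = "\\.csv$",
                          full.names = TRUE, recursive = TRUE)
  jpg_files <- list.files(export_dir, pattern = "\\.jpe?g$",
                          full.names = TRUE, recursive = TRUE,
                          ignore.case = TRUE)

  file.copy(csv_files, csv_dest, overwrite = TRUE)
  file.copy(jpg_files, photo_dest, overwrite = TRUE)
  invisible(list(csv = basename(csv_files), photos = basename(jpg_files)))
}

# Demonstration with empty placeholder files in temporary directories.
export_dir  <- file.path(tempdir(), "fulcrum_export_demo")
project_dir <- file.path(tempdir(), "stage_demo_project")
dir.create(export_dir, showWarnings = FALSE)
file.create(file.path(export_dir,
                      c("nematode_field_sampling.csv", "photo1.jpg")))
staged <- sketch_stage_export(export_dir, project_dir)
print(staged)
```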
The first group of functions cleans the results from the Fulcrum .csv files.
readFulcrum takes a dir argument that specifies the project directory from which to read the Fulcrum .csv files. The same dir value is reused by several later functions in this vignette.
dir <- "~/Desktop/2020JanuaryHawaii"
raw_fulc <- readFulcrum(dir = dir)
procFulcrum processes individual data frames and adds flags for unexpected data.
proc_fulc <- procFulcrum(data = raw_fulc)
checkTemperatures identifies flags in three temperature variables. Setting the return_flags option to TRUE will return a list of three data frames, each containing only the rows where one of the three flag types appears. The function automatically prints the rows where the flags exist.
The procFulcrum function assumes that, when raw_substrate_temperature or raw_ambient_temperature values are above 40 degrees, the temperatures were mistakenly input as Fahrenheit rather than Celsius, and converts these values to Celsius. It also flags cases where both raw_ambient_temperature and raw_ambient_humidity are stuck on the same value for 5 or more measurements in a row. These are the three flags reported by checkTemperatures.
flag_temp <- checkTemperatures(data = proc_fulc, return_flags = TRUE)
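The two heuristics described above can be illustrated with a small base R sketch. This is not the procFulcrum implementation: the function names, the example values, and the exact handling of the thresholds are assumptions based only on the description in this vignette.

```r
# Heuristic 1: readings above 40 are assumed to be Fahrenheit and converted.
sketch_fix_fahrenheit <- function(temp, threshold = 40) {
  is_f <- !is.na(temp) & temp > threshold
  temp[is_f] <- (temp[is_f] - 32) * 5 / 9
  list(celsius = temp, flagged = is_f)
}

# Heuristic 2: flag runs where temperature AND humidity repeat together
# for at least min_run consecutive measurements. Following the vignette,
# the first observation of each run is left unflagged.
sketch_flag_runs <- function(temp, humidity, min_run = 5) {
  key <- paste(temp, humidity)
  runs <- rle(key)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  flag <- logical(length(key))
  for (i in which(runs$lengths >= min_run)) {
    if (runs$lengths[i] > 1) flag[(starts[i] + 1):ends[i]] <- TRUE
  }
  # Rows with missing readings are never flagged as part of a run.
  flag & !is.na(temp) & !is.na(humidity)
}

# Made-up readings: one Fahrenheit entry and one stuck run of five.
raw_temp <- c(21.5, 75.2, 22.0, 22.0, 22.0, 22.0, 22.0, 23.1)
raw_hum  <- c(60,   61,   70,   70,   70,   70,   70,   65)

fixed <- sketch_fix_fahrenheit(raw_temp)
run_flag <- sketch_flag_runs(raw_temp, raw_hum)
print(round(fixed$celsius, 1))
print(run_flag)
```

Because a genuine Celsius reading above 40 would be converted by the first heuristic, a way to revert individual records is needed, which is the role fixTemperatures plays in the workflow.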
fixTemperatures takes a) fulcrum_ids that need to be reverted to their original values (if readings above 40 degrees were truly in Celsius) for both substrate temperatures (substrate_temperature_ids) and ambient temperatures (ambient_temperature_ids), as well as b) fulcrum_ids for which humidity and temperature readings need to be set to NA due to a stuck measurement device (ambient_temperature_run_ids). In the example below we set all the ambient_temperature_run = TRUE values to NA. Also note that the first observation of the humidity and temperature value for a given run is not flagged as part of the run.
proc_fulc_clean <- fixTemperatures(
  data = proc_fulc,
  substrate_temperature_ids = "a7db618d-44cc-4b4a-bc67-871306029274",
  ambient_temperature_ids = "b1f20ae4-c5c2-426f-894a-e1f46c2fa693",
  ambient_temperature_run_ids = c(
    "dda77efe-d73c-48e9-aefb-b508e613256b",
    "93de14a0-40ab-4793-8614-ab1512ab158c",
    "216cb71a-6470-46eb-950d-366ac3180498",
    "920a6a56-7a29-47f4-afce-a5f83787d639",
    "6b25c113-4bb6-4bc5-9473-eca1f8075d10"
  )
)
The flag variables are retained, so rerunning checkTemperatures on the cleaned data lets the user confirm that the corrections were applied as intended.
joinFulcrum joins the Fulcrum dataframes. The function first joins the processed field sampling dataframe to the processed isolation dataframe via unique collection labels (c_label). Following this join, it selects a “best” photo for each unique C-label based on the existence of a matching photo id in the processed field sampling sample photo dataframe, before joining the aggregated data to this dataframe as well. Finally, the merge is completed when the processed isolation S-labeled plates are joined to this large dataframe on the basis of isolation id (s_label).
If the user is using easyfulcrum on customized Fulcrum applications other than "Nematode field sampling" and "Nematode isolation", it is recommended that select_vars is set to FALSE, so that joinFulcrum does not restrict its output to the default variables.
join_fulc <- joinFulcrum(data = proc_fulc_clean, select_vars = TRUE)
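The join order described above can be sketched with base R merge() on toy data frames. The isolation_id and substrate column names and all values are hypothetical. joinFulcrum itself works on the processed Fulcrum data frames and also handles photo selection, which this sketch omits.

```r
# Toy stand-ins for the processed data frames (values are made up).
field_sampling <- data.frame(
  c_label = c("C-0001", "C-0002", "C-0003"),
  substrate = c("leaf litter", "fruit", "soil"),
  stringsAsFactors = FALSE
)
isolation <- data.frame(
  c_label = c("C-0001", "C-0002"),
  isolation_id = c("iso-1", "iso-2"),  # hypothetical column name
  stringsAsFactors = FALSE
)
s_plates <- data.frame(
  isolation_id = c("iso-1", "iso-1", "iso-2"),
  s_label = c("S-0001", "S-0002", "S-0003"),
  stringsAsFactors = FALSE
)

# Step 1: field sampling joined to isolation records by C-label.
# all.x = TRUE keeps collections that have no isolation record.
step1 <- merge(field_sampling, isolation, by = "c_label", all.x = TRUE)

# Step 2: S-labeled plates joined on the isolation identifier.
joined <- merge(step1, s_plates, by = "isolation_id", all.x = TRUE)
joined <- joined[order(joined$c_label, joined$s_label), ]
print(joined)
```

Using left joins at both steps keeps collections with no isolated specimens in the result, so they remain visible to later checks.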
checkJoin conducts 10 different checks for flags in the joined Fulcrum data frame. These cover extreme temperature and altitude values; missing, improper, and/or duplicated C-labels; missing and/or duplicated isolation records corresponding to these C-labels; and unusual sample photo numbers. The return_flags option is the same as in checkTemperatures, and the function automatically prints the rows where the flags exist. If desired, the user can manually edit values or correct mistakes in the underlying data based on these flags and re-run the pipeline.
flag_join <- checkJoin(data = join_fulc, return_flags = TRUE)
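To give a sense of what checks like these look for, here is a base R sketch that flags missing, badly formatted, and duplicated C-labels in a toy data frame with one row per collection. The flag column names and the label pattern (C- followed by digits, as in C-5133) are assumptions, and checkJoin performs more checks than shown here.

```r
# Toy data: one row per collection (values are made up).
toy <- data.frame(
  c_label = c("C-5133", "C-5134", "C-5134", NA, "5135"),
  stringsAsFactors = FALSE
)

# Missing label: NA or empty string.
toy$flag_missing_c_label <- is.na(toy$c_label) | toy$c_label == ""

# Improper label: present but not of the assumed form "C-<digits>".
toy$flag_improper_c_label <- !toy$flag_missing_c_label &
  !grepl("^C-[0-9]+$", toy$c_label)

# Duplicated label: every row that shares a non-missing label with another.
dup <- duplicated(toy$c_label) | duplicated(toy$c_label, fromLast = TRUE)
toy$flag_duplicated_c_label <- dup & !toy$flag_missing_c_label

print(toy)
```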
annotateFulcrum adds spatial information to the joined Fulcrum data frame, noting whether samples were collected on specific islands, trails, and/or locations. Example islands, trails, and locations from Hawaii are automatically loaded with the package. Alternatively, a user can place manually made .csv files in data/raw/annotate; specifying the base directory as dir in annotateFulcrum will then override the example files.
The hawaii_islands and hawaii_locations dataframes are composed of simple latitude and longitude starts and ends that define a bounding box, and hawaii_trails is composed of a character list of GeoJSON polygon points, taken from the GeoJSON output of an online bounding box tool.
If the user is using easyfulcrum on customized Fulcrum applications other than "Nematode field sampling" and "Nematode isolation", it is recommended that select_vars is set to FALSE, so that annotateFulcrum does not restrict its output to the default variables.
anno_fulc <- annotateFulcrum(data = join_fulc, dir = NULL, select_vars = TRUE)
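The bounding-box idea behind hawaii_islands and hawaii_locations can be sketched in base R as follows. The column names (lat_start, lat_end, lon_start, lon_end), the coordinates, the box names, and the helper name are hypothetical. The data frames shipped with the package may be organized differently, and trail polygons are not covered here.

```r
# Hypothetical bounding boxes: made-up coordinates, for illustration only.
boxes <- data.frame(
  name = c("Box A", "Box B"),
  lat_start = c(18.9, 21.2),
  lat_end   = c(20.3, 21.8),
  lon_start = c(-156.1, -158.3),
  lon_end   = c(-154.8, -157.6),
  stringsAsFactors = FALSE
)

# Return the name of the first box containing each point, or NA if none.
sketch_annotate_box <- function(lat, lon, boxes) {
  vapply(seq_along(lat), function(i) {
    hit <- which(lat[i] >= boxes$lat_start & lat[i] <= boxes$lat_end &
                 lon[i] >= boxes$lon_start & lon[i] <= boxes$lon_end)
    if (length(hit) == 0) NA_character_ else boxes$name[hit[1]]
  }, character(1))
}

samples <- data.frame(
  c_label = c("C-0001", "C-0002", "C-0003"),
  latitude  = c(19.5, 21.5, 25.0),
  longitude = c(-155.5, -158.0, -150.0),
  stringsAsFactors = FALSE
)
samples$box <- sketch_annotate_box(samples$latitude, samples$longitude, boxes)
print(samples)
```

A rectangle test like this is cheap but coarse, which is presumably why trails are described with polygon points instead.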
The second group of functions cleans the results from a project-specific Google Sheet that contains genotyping results.
Since easyfulcrum was originally built for processing nematode samples, we provide a profile parameter to toggle the functions between the nematode-specific nematode profile and the more flexible, non-nematode-specific general profile.
Users that want to use the general profile can use our "general" genotyping sheet template.
Users that want to use the nematode profile can use our "nematode" genotyping sheet template.
Details on how to fill out a "nematode" genotyping sheet can be found in the Nematode Collection Protocol; look for "wild_isolate_genotyping_template".
readGenotypes reads in genotyping data from the Google Sheet identified by the required gsKey. The col_types argument specifies the class of each data column. Note that additional columns can be added to either genotyping template if desired.
For more details on reading in genotyping sheets, including how to specify col_types, see the googlesheets4 package, which underlies this function.
# read example data from the "general" genotyping template
raw_geno_general <- readGenotypes(
  gsKey = c("1aXH-8UDvFVddl7JA-R2y8QOcNxESpmseZGjheM8hDro"),
  col_types = "cccdcc"
)
head(raw_geno_general)

# read example data from the "nematode" genotyping template
raw_geno_nema <- readGenotypes(
  gsKey = c("1eviRoe0NyIEkIexM6c_oTVTX6U4ndPx-hkXJvkjhqWM"),
  col_types = "cDDdcdcddddddDcDDdcdcdddddddcdcccddccc"
)
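The col_types strings above use one letter per column. As a reading aid, the sketch below expands such a string into type names, using the shortcode meanings documented for googlesheets4 (c = character, d = double, D = date, and so on). The helper name sketch_decode_col_types is hypothetical and only a subset of the shortcodes is mapped.

```r
# Hypothetical helper: expand a col_types shortcode string into type names.
sketch_decode_col_types <- function(col_types) {
  codes <- strsplit(col_types, "", fixed = TRUE)[[1]]
  lookup <- c(
    c = "character", d = "double", D = "date", i = "integer",
    l = "logical", T = "datetime", t = "time", "?" = "guess",
    "_" = "skip", "-" = "skip"
  )
  types <- unname(lookup[codes])
  types[is.na(types)] <- "unmapped"  # shortcode not covered by this sketch
  data.frame(column = seq_along(codes), code = codes, type = types,
             stringsAsFactors = FALSE)
}

general_types <- sketch_decode_col_types("cccdcc")
print(general_types)

# Tally the types requested by the longer "nematode" template string.
nema_string <- "cDDdcdcddddddDcDDdcdcdddddddcdcccddccc"
nema_types <- sketch_decode_col_types(nema_string)
print(table(nema_types$type))
```

If columns are added to a template, the col_types string needs one extra letter per added column, in the same position as the column in the sheet.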
checkGenotypes both processes the genotyping data (adds flags) and returns information on those flags if desired. Setting the profile parameter to general will check for unexpected or improper use of S-labels, including missing isolations, improper S-label names, and/or duplicated S-labels. Setting the profile parameter to nematode will run additional nematode-specific checks, including unusual species IDs, strain names, and proliferation values, and whether ITS2 genotypes are missing when expected. Setting the return_geno option to TRUE and the return_flags option to FALSE will return the processed genotyping data and print information on the flags. Setting the return_geno option to FALSE and the return_flags option to TRUE will return a list of data frames that detail the rows where the flags appear. These two options cannot both be TRUE at the same time.
proc_geno_general <- checkGenotypes(geno_data = raw_geno_general, fulc_data = anno_fulc, return_geno = TRUE, return_flags = FALSE, profile = "general")
proc_geno_nema <- checkGenotypes(geno_data = raw_geno_nema, fulc_data = anno_fulc, return_geno = TRUE, return_flags = FALSE, profile = "nematode")
flag_geno_general <- checkGenotypes(geno_data = raw_geno_general, fulc_data = anno_fulc, return_geno = FALSE, return_flags = TRUE, profile = "general")
Based on these flags, "fixed" genotyping sheets were made by eliminating rows with blank S-labels, duplicated S-labels, etc. The "fixed" data are read in below and checkGenotypes is re-run on the "fixed" data.
# general example
raw_geno_general_fixed <- readGenotypes(
  gsKey = c("1AcovAEfQIF46PigrrM_D2QPOpdTQM-YndlFo4MmsQoc"),
  col_types = "cccdcc"
)
proc_geno_general_fixed <- checkGenotypes(
  geno_data = raw_geno_general_fixed, fulc_data = anno_fulc,
  return_geno = TRUE, return_flags = FALSE, profile = "general"
)

# nematode example
raw_geno_nema_fixed <- readGenotypes(
  gsKey = c("1WaOsAU0Pmf_rOp9BoGmDeYMmyENw0gBppdllfedLG9s"),
  col_types = "cDDdcdcddddddDcDDdcdcdddddddcdcccddccc"
)
proc_geno_nema_fixed <- checkGenotypes(
  geno_data = raw_geno_nema_fixed, fulc_data = anno_fulc,
  return_geno = TRUE, return_flags = FALSE, profile = "nematode"
)
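The S-label checks described for the general profile, and the kind of manual fix applied to the sheets, can be approximated in base R as below. The toy data, the flag names, and the reading of a "missing isolation" as an S-label absent from the Fulcrum data are assumptions. checkGenotypes may define these flags differently.

```r
# Toy genotyping sheet and toy Fulcrum S-labels (values are made up).
geno <- data.frame(
  s_label = c("S-0001", "S-0002", "S-0002", "", "S-0009"),
  stringsAsFactors = FALSE
)
fulcrum_s_labels <- c("S-0001", "S-0002", "S-0003")

blank <- is.na(geno$s_label) | geno$s_label == ""
geno$flag_blank_s_label <- blank
geno$flag_duplicated_s_label <- !blank &
  (duplicated(geno$s_label) | duplicated(geno$s_label, fromLast = TRUE))
geno$flag_not_in_fulcrum <- !blank & !(geno$s_label %in% fulcrum_s_labels)

# A "fixed" sheet in the spirit of the text: drop blank rows and keep the
# first occurrence of each duplicated S-label.
geno_fixed <- geno[!blank & !duplicated(geno$s_label), "s_label", drop = FALSE]
print(geno)
print(geno_fixed)
```

In practice the fix is made in the Google Sheet itself, as the vignette does, so that the corrected sheet remains the single source of the genotyping data.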
joinGenoFulc joins the joined Fulcrum data frame with the genotyping information. This function will also save the processed genotyping information in data/processed/genotypes if dir is set to the base folder of the project.
If the user is using easyfulcrum on customized Fulcrum applications other than "Nematode field sampling" and "Nematode isolation", it is recommended that select_vars is set to FALSE, so that joinGenoFulc does not restrict its output to the default variables.
# general example
join_genofulc_general <- joinGenoFulc(geno = proc_geno_general_fixed, fulc = anno_fulc, dir = dir, select_vars = TRUE)

# nematode example
join_genofulc_nema <- joinGenoFulc(geno = proc_geno_nema_fixed, fulc = anno_fulc, dir = NULL, select_vars = TRUE)
The final function processes and resizes images, adding details to a final dataframe.
procPhotos copies raw sample photos, renames them with the C-label, makes a new directory data/processed/fulcrum/photos and pastes the renamed files there. The function also makes thumbnails for use with interactive maps and places these in the data/processed/fulcrum/photos/thumbnails directory. Setting the CeNDR option to TRUE will rename photos of samples meeting CeNDR criteria with the name of the nematode strains isolated from the sample and paste them in the data/processed/fulcrum/photos/CeNDR directory.
The function will also accept a public url (pub_url) for hosting the sample photos renamed by C-label. A compatible public url should follow the pub_url/Project/sampling_thumbs/C-label.jpg format. For example, if the full url for C-5133 is https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/2020JanuaryHawaii/sampling_thumbs/C-5133.jpg, the pub_url should be set to https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/. The project name, "sampling_thumbs", C-label, and file extension will be filled by the function.
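The URL format described above can be reproduced with a few lines of base R. The helper name sketch_thumb_url is hypothetical and the sketch only mirrors the documented pattern. It is not the code procPhotos uses.

```r
# Hypothetical helper: build the thumbnail URL from its parts, following
# the pub_url/Project/sampling_thumbs/C-label.jpg pattern.
sketch_thumb_url <- function(pub_url, project, c_label) {
  # Ensure exactly one trailing slash on the base URL.
  base <- sub("/*$", "/", pub_url)
  paste0(base, project, "/sampling_thumbs/", c_label, ".jpg")
}

url <- sketch_thumb_url(
  pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
  project = "2020JanuaryHawaii",
  c_label = "C-5133"
)
print(url)
```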
We've included example code for each profile, but note that if you rerun the procPhotos function with the overwrite parameter set to TRUE, the output files will be overwritten.
final_data_general <- procPhotos(
  dir = dir, data = join_genofulc_general, max_dim = 500, overwrite = TRUE,
  pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
  CeNDR = TRUE
)
head(final_data_general)

final_data_nema <- procPhotos(
  dir = dir, data = join_genofulc_nema, max_dim = 500, overwrite = TRUE,
  pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
  CeNDR = TRUE
)
head(final_data_nema)
We include two functions for generating summaries of a finalized collection project. The final dataframes can otherwise be used as needed by the user.
makeSpSheet generates a species-specific .csv file for the species of interest (target_sp) and writes it to the /reports subdirectory. This function simplifies the output of the final dataframe, pulling variables of particular interest for a user-specified species of interest. This function is written to standardize the output dataframe to meet the specifications for submitting wild nematode collections to the Caenorhabditis Natural Diversity Resource (CaeNDR). For this reason the makeSpSheet function is likely only applicable to nematode sampling projects.
makeSpSheet also returns a dataframe with flags for these select samples, and prints a description of these flags.
# general example
sp_sheet_general <- makeSpSheet(data = final_data_general, target_sp = "Caenorhabditis briggsae", dir = dir)

# nematode example
sp_sheet_nema <- makeSpSheet(data = final_data_nema, target_sp = "Caenorhabditis briggsae", dir = dir)
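The core idea of a species sheet, filtering the final data to one species and writing the result to the reports folder, can be sketched in base R. The species_id column name, the output file name, the toy values, and the helper name are assumptions. makeSpSheet also selects and standardizes variables and adds flags, which this sketch does not attempt.

```r
# Toy final data frame (values and the species_id column are made up).
final_data <- data.frame(
  s_label = c("S-0001", "S-0002", "S-0003"),
  species_id = c("Caenorhabditis briggsae", "Caenorhabditis elegans",
                 "Caenorhabditis briggsae"),
  c_label = c("C-0001", "C-0001", "C-0002"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows for target_sp and write them to a .csv.
sketch_species_sheet <- function(data, target_sp, reports_dir) {
  sheet <- data[!is.na(data$species_id) & data$species_id == target_sp, ]
  dir.create(reports_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(reports_dir,
                        paste0(gsub(" ", "_", target_sp), "_sheet.csv"))
  write.csv(sheet, out_file, row.names = FALSE)
  out_file
}

out <- sketch_species_sheet(final_data, "Caenorhabditis briggsae",
                            file.path(tempdir(), "sp_sheet_demo", "reports"))
print(read.csv(out, stringsAsFactors = FALSE))
```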
We provide a function, generateReport, that will generate an interactive overview of the entire sampling project. generateReport saves a file named sampleReport.Rmd into the /scripts sub-directory, and saves a sampleReport.html file in the /reports sub-directory. The sampleReport.html can be viewed in any web browser and includes: an overview of the collection project (such as who conducted the respective processes and on what dates they were completed), summary tables of collection and isolation data, interactive maps of where the collections in the project were acquired, and box plots showing the distributions of various environmental parameters at all collection sites. These parameters include substrate temperature, ambient temperature, humidity, and elevation.
Please feel free to edit the sampleReport.Rmd as you require once it has been saved to the /scripts sub-directory.
The profile parameter will switch between nematode-specific reports and non-nematode-specific reports.
# general example
generateReport(data = final_data_general, dir = dir, target_sp = c("O. myriophilus", "Caenorhabditis briggsae"), profile = "general")

# nematode example
generateReport(data = final_data_nema, dir = dir, profile = "nematode")