knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(knitr.duplicate.label = "allow") # needed to run generateReport() with both profiles!

The easyfulcrum package is a tool to process and analyze ecological field sampling data generated using the Fulcrum mobile application.

The package offers an organized workflow with simple, efficient functions to clean, process, and visualize ecological field sampling and isolation data collected using custom Fulcrum applications. It can also join these data with genotype information when organisms isolated from the field are identified using molecular barcodes. Together, the Fulcrum mobile application and the easyfulcrum R package allow researchers to easily implement mobile data collection, cloud-based databases, and standardized data analysis tools to improve ecological sampling accuracy and efficiency.

What is Fulcrum?

Fulcrum is a customizable, geographic data-collection platform compatible with Apple iOS and Google Android devices that allows users to collect rich, location-based data. To facilitate large-scale ecological surveys of nematodes that are difficult to identify in the field, we developed two Fulcrum applications. The “Nematode field sampling” application allows the user to organize various ecological data types associated with the substrate sampled in the field, such as environmental parameters and substrate characteristics. The “Nematode isolation” application helps organize data associated with the specimens isolated from samples after they have been brought into the laboratory.

Fulcrum installation and application customization

The Fulcrum data collection application can be downloaded online (https://www.fulcrumapp.com). Fulcrum uses a powerful GUI to help users customize data-collection applications even when they have no coding or database administration knowledge, which makes Fulcrum’s robust, cloud-based database adaptable to sampling nearly any species from nature. If desired, users can customize our field collection and isolation applications following our Fulcrum templates.

Users can use the applications as is. For easyfulcrum to work with custom applications, users should save their field sampling and isolation applications with a unique identifier in place of our nematode prefix, e.g. fungus field sampling and fungus isolation.

easyfulcrum installation:

Install the package via devtools (>= 2.4.1):

install.packages("devtools")
devtools::install_github("AndersenLab/easyfulcrum")

Load the package:

library(easyfulcrum)

Directory structure:

The makeDirStructure function makes a standardized directory of folders for the easyfulcrum run, taking a base directory (startdir) and the project name (projectdirname) as inputs.

Every collection project should be contained in its own directory. The directory name should follow the YearMonthPlace format used for Fulcrum collection projects, e.g. 2020JanuaryHawaii.

makeDirStructure(startdir = "~/Desktop",
                 projectdirname = "2020JanuaryHawaii")

The data directory contains the raw and processed subdirectories.
- raw/fulcrum holds the .csv files exported from Fulcrum and raw/fulcrum/photos contains .jpg files exported from Fulcrum.
- raw/annotate can hold spatial location files island.csv, location.csv, and trail.csv that the user generates for mapping the collection sites.
- processed/fulcrum holds easyfulcrum function outputs.

The reports directory holds easyfulcrum function outputs. These outputs are generated by the user's processing script(s), which are saved in the scripts directory.
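As a quick sanity check, the layout described above can be verified with base R. This is a minimal sketch and not part of easyfulcrum: check_project_dirs is a hypothetical helper, and its list of expected subfolders is taken only from the description in this vignette.

```r
# Hypothetical helper (not an easyfulcrum function): report which of the
# subfolders described in this vignette exist under a project directory.
check_project_dirs <- function(project_dir) {
  expected <- c("data/raw/fulcrum",
                "data/raw/fulcrum/photos",
                "data/raw/annotate",
                "data/processed/fulcrum",
                "reports",
                "scripts")
  present <- dir.exists(file.path(project_dir, expected))
  data.frame(subdir = expected, present = present, stringsAsFactors = FALSE)
}

# Demonstration in a temporary folder so that no real project is touched.
# Only the raw Fulcrum folders are created here, so the rest report FALSE.
demo_dir <- tempfile("easyfulcrum_layout_")
dir.create(file.path(demo_dir, "data", "raw", "fulcrum", "photos"),
           recursive = TRUE, showWarnings = FALSE)
layout_check <- check_project_dirs(demo_dir)
print(layout_check)
```

Running the helper on a real project directory right after makeDirStructure should, under these assumptions, show every row as present.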

Following makeDirStructure, the user adds collection .csv and .jpg files into the appropriate subfolder locations.

We include example files from a small collection to use with this vignette. To copy these files into the project directory you just made, we include a helper function called loadExampleFiles. This function is not used in the normal easyfulcrum workflow. Note that the startdir and projectdirname arguments should be identical to those passed to makeDirStructure above.

loadExampleFiles(startdir = "~/Desktop",
                 projectdirname = "2020JanuaryHawaii")

Exporting data from Fulcrum:

Before processing collection data using easyfulcrum, the raw Fulcrum data must be exported from the Fulcrum database using the Fulcrum website’s data export tool. We recommend exporting the data by selecting the following checkboxes:
- the desired project
- include photos
- include GPS data
- field sampling
- isolation

Exporting with change sets is not currently supported.

After the data are exported, the .csv files must be moved to [your project directory]/data/raw/fulcrum, and the field sampling photos in .jpg format must be moved to [your project directory]/data/raw/fulcrum/photos.
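Before reading the data, it can help to confirm that the exported files landed in the two locations named above. The following is a minimal base R sketch under stated assumptions: list_raw_exports is a hypothetical helper (not an easyfulcrum function), and the file names used in the demonstration are invented placeholders.

```r
# Hypothetical helper: list the exported .csv files and .jpg photos found in
# the raw Fulcrum folders of a project directory.
list_raw_exports <- function(project_dir) {
  raw_dir <- file.path(project_dir, "data", "raw", "fulcrum")
  list(
    csv = list.files(raw_dir, pattern = "\\.csv$", ignore.case = TRUE),
    jpg = list.files(file.path(raw_dir, "photos"),
                     pattern = "\\.jpe?g$", ignore.case = TRUE)
  )
}

# Demonstration with placeholder files in a temporary project folder.
demo_dir <- tempfile("easyfulcrum_export_")
raw_dir <- file.path(demo_dir, "data", "raw", "fulcrum")
dir.create(file.path(raw_dir, "photos"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(raw_dir, "placeholder_export.csv")))
invisible(file.create(file.path(raw_dir, "photos", "placeholder_photo.jpg")))

exports <- list_raw_exports(demo_dir)
print(exports)
```

An empty csv or jpg entry would suggest that the export was not moved into the expected folder.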

Reading, processing, and joining Fulcrum results:

The first group of functions cleans the results from the Fulcrum .csv files.

readFulcrum:

readFulcrum takes a dir argument that specifies the project directory from which to read the Fulcrum .csv files. The same dir value is reused by later functions in this vignette.

dir <- "~/Desktop/2020JanuaryHawaii"
raw_fulc <- readFulcrum(dir = dir)

procFulcrum:

procFulcrum processes individual data frames and adds flags for unexpected data.

proc_fulc <- procFulcrum(data = raw_fulc)

checkTemperatures:

checkTemperatures identifies flags in three temperature variables. Setting the return_flags option to TRUE will return a list of three data frames that pulls only the rows where each of the three flag types appear. The function automatically prints the rows where the flags exist.

The procFulcrum function assumes that raw_substrate_temperature or raw_ambient_temperature values above 40 degrees were mistakenly entered in Fahrenheit rather than Celsius, and converts these values to Celsius. It also flags cases where raw_ambient_temperature and raw_ambient_humidity both stay stuck on the same value for 5 or more measurements in a row. These are the three flags that checkTemperatures reports.

flag_temp <- checkTemperatures(data = proc_fulc, return_flags = TRUE)
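The two rules described above can be illustrated in base R. This is only a sketch of the stated logic, not the package's implementation: convert_if_fahrenheit and flag_stuck_runs are hypothetical helpers, and the sketch assumes that a run of 5 counts the first reading of the run, which is itself left unflagged.

```r
# Hypothetical helper: treat readings above the threshold as Fahrenheit and
# convert them to Celsius, recording which values were changed.
convert_if_fahrenheit <- function(temp, threshold = 40) {
  was_converted <- !is.na(temp) & temp > threshold
  temp[was_converted] <- (temp[was_converted] - 32) * 5 / 9
  data.frame(celsius = temp, was_converted = was_converted)
}

# Hypothetical helper: flag readings that repeat the previous temperature and
# humidity pair inside a run of at least min_run identical pairs.
# Assumption: the first reading of a run is not flagged.
flag_stuck_runs <- function(temp, humidity, min_run = 5) {
  n <- length(temp)
  if (n == 0) return(logical(0))
  key <- paste(temp, humidity)
  runs <- rle(key)
  in_long_run <- rep(runs$lengths >= min_run, runs$lengths)
  first_of_run <- c(TRUE, key[-1] != key[-n])
  in_long_run & !first_of_run & !is.na(temp) & !is.na(humidity)
}

# Toy readings: the second value looks like Fahrenheit, and readings 3 to 7
# repeat the same temperature and humidity pair.
temps <- c(25, 77, 21, 21, 21, 21, 21, 22)
humid <- c(60, 65, 70, 70, 70, 70, 70, 71)

converted <- convert_if_fahrenheit(temps)
stuck <- flag_stuck_runs(temps, humid)
print(cbind(converted, stuck))
```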

fixTemperatures:

fixTemperatures takes two kinds of input: a) fulcrum_ids whose temperatures should be reverted to their original values because readings above 40 degrees were truly in Celsius, given separately for substrate temperatures (substrate_temperature_ids) and ambient temperatures (ambient_temperature_ids), and b) fulcrum_ids for which humidity and temperature readings should be set to NA because the measurement device was stuck (ambient_temperature_run_ids). In the example below we set all the ambient_temperature_run = TRUE values to NA. Note that the first observation of the humidity and temperature value in a given run is not flagged as part of the run.

proc_fulc_clean <- fixTemperatures(data = proc_fulc,
                                   substrate_temperature_ids = "a7db618d-44cc-4b4a-bc67-871306029274",
                                   ambient_temperature_ids = "b1f20ae4-c5c2-426f-894a-e1f46c2fa693",
                                   ambient_temperature_run_ids=c("dda77efe-d73c-48e9-aefb-b508e613256b",
                                                                 "93de14a0-40ab-4793-8614-ab1512ab158c",
                                                                 "216cb71a-6470-46eb-950d-366ac3180498",
                                                                 "920a6a56-7a29-47f4-afce-a5f83787d639",
                                                                 "6b25c113-4bb6-4bc5-9473-eca1f8075d10"))
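To make the two kinds of correction concrete, here is a toy base R sketch of reverting a conversion and blanking a stuck reading. It is an illustration under assumptions, not the fixTemperatures code: the ids are placeholders, and the ambient_temperature and ambient_humidity column names are hypothetical (only the raw_ columns are named in this vignette).

```r
# Toy data; the ids are placeholders rather than real fulcrum_ids.
toy <- data.frame(
  fulcrum_id = c("id-1", "id-2", "id-3"),
  raw_ambient_temperature = c(41, 77, 21),
  raw_ambient_humidity = c(60, 65, 70),
  stringsAsFactors = FALSE
)

# Start from the automatic rule: values above 40 are treated as Fahrenheit.
toy$ambient_temperature <- ifelse(toy$raw_ambient_temperature > 40,
                                  (toy$raw_ambient_temperature - 32) * 5 / 9,
                                  toy$raw_ambient_temperature)
toy$ambient_humidity <- toy$raw_ambient_humidity

revert_ids <- "id-1"  # this reading really was 41 degrees Celsius
stuck_ids <- "id-3"   # this reading came from a stuck device

# a) Revert selected records to their original raw value.
revert <- toy$fulcrum_id %in% revert_ids
toy$ambient_temperature[revert] <- toy$raw_ambient_temperature[revert]

# b) Set temperature and humidity to NA for stuck records.
stuck <- toy$fulcrum_id %in% stuck_ids
toy$ambient_temperature[stuck] <- NA
toy$ambient_humidity[stuck] <- NA

print(toy)
```

The raw columns are left untouched, so every correction can be traced back to the original reading.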

The flag variables will be maintained, so rerunning checkTemperatures on the cleaned processed fulcrum data will ensure that corrections have been implemented as desired.

joinFulcrum:

joinFulcrum joins the Fulcrum dataframes. The function first joins the processed field sampling dataframe to the processed isolation dataframe via unique collection labels (c_label). Following this join, it selects a “best” photo for each unique C-label based on the existence of a matching photo id in the processed field sampling sample photo dataframe, and then joins the aggregated data to this dataframe as well. Finally, the merge is completed when the processed isolation S-labeled plates are joined to this large dataframe on the basis of isolation id (s_label).

If the user is using easyfulcrum on customized Fulcrum applications other than "Nematode field sampling" and "Nematode isolation", it is recommended that select_vars is set to FALSE, so that joinFulcrum returns all variables rather than only the default ones.

# use the cleaned data so that the fixTemperatures corrections carry forward
join_fulc <- joinFulcrum(data = proc_fulc_clean, select_vars = TRUE)
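The first step of the join, linking collections to isolations by c_label, can be pictured with base R's merge. This is a simplified sketch on toy data, not the joinFulcrum implementation: the substrate column and all label values are invented, and keeping collections without isolations (all.x = TRUE) is an assumption made for the illustration.

```r
# Toy field sampling records: one row per collection (C-label).
field <- data.frame(
  c_label = c("C-0001", "C-0002", "C-0003"),
  substrate = c("leaf litter", "rotting fruit", "soil"),
  stringsAsFactors = FALSE
)

# Toy isolation records: zero, one, or several S-labels per collection.
isolation <- data.frame(
  c_label = c("C-0001", "C-0001", "C-0003"),
  s_label = c("S-0001", "S-0002", "S-0003"),
  stringsAsFactors = FALSE
)

# Left join on c_label: every collection is kept, and collections without
# an isolation record get NA in s_label.
joined <- merge(field, isolation, by = "c_label", all.x = TRUE)
print(joined)
```

In this toy result C-0001 appears twice (once per S-label) and C-0002 appears once with a missing s_label.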

checkJoin:

checkJoin conducts 10 different checks for flags in the joined Fulcrum data frame. These cover extreme temperature and altitude values; missing, improper, and/or duplicated C-labels; missing and/or duplicated isolation records corresponding to these C-labels; and unusual sample photo numbers. The return_flags option works as in checkTemperatures, and the function automatically prints the rows where the flags exist. If desired, the user can manually edit values or correct mistakes in the underlying data based on these flags and re-run the pipeline.

flag_join <- checkJoin(data = join_fulc, return_flags = TRUE)
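The C-label checks can be sketched in base R as follows. This is an illustration only, not the checkJoin code: check_c_labels is a hypothetical helper, the "C-" plus digits pattern is an assumption based on the C-5133 example used elsewhere in this vignette, and the sketch expects one label per collection record.

```r
# Hypothetical helper: flag missing, improperly formatted, and duplicated
# collection labels. The default pattern is an assumed label format.
check_c_labels <- function(c_label, pattern = "^C-[0-9]+$") {
  missing <- is.na(c_label) | c_label == ""
  improper <- !missing & !grepl(pattern, c_label)
  # Mark every copy of a repeated label, not just the later ones.
  dup <- !missing & (duplicated(c_label) | duplicated(c_label, fromLast = TRUE))
  data.frame(c_label = c_label, missing = missing, improper = improper,
             duplicated = dup, stringsAsFactors = FALSE)
}

labels <- c("C-5133", "C-5134", "C-5133", "c5135", NA)
label_flags <- check_c_labels(labels)
print(label_flags)
```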

annotateFulcrum:

annotateFulcrum adds spatial information to the joined Fulcrum data frame, noting whether samples were collected on specific islands, trails, and/or locations. Example islands, trails, and locations from Hawaii are loaded automatically with the package. A user can instead supply manually made .csv files by placing them in data/raw/annotate; specifying the base directory as dir in annotateFulcrum will then override the example files.

The hawaii_islands and hawaii_locations dataframes consist of simple latitude and longitude starts and ends that define a bounding box. hawaii_trails consists of a character list of GeoJSON polygon points, taken from the GeoJSON output that can be created with an online bounding box tool.

If the user is using easyfulcrum on customized Fulcrum applications other than "Nematode field sampling" and "Nematode isolation", it is recommended that select_vars is set to FALSE, so that annotateFulcrum returns all variables rather than only the default ones.

anno_fulc <- annotateFulcrum(data = join_fulc, dir = NULL, select_vars = TRUE)
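The bounding-box idea behind the island and location annotations can be sketched in base R. This is a hedged illustration rather than the annotateFulcrum code: assign_box is a hypothetical helper, the column names (lat_start, lat_end, lon_start, lon_end) and all coordinates are invented, and each box is assumed to have start values no larger than its end values.

```r
# Invented bounding boxes; real annotation files may use other column names.
boxes <- data.frame(
  name = c("island_A", "island_B"),
  lat_start = c(10, 20), lat_end = c(11, 21),
  lon_start = c(-150, -160), lon_end = c(-149, -159),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the name of the first box containing each
# point, or NA when a point falls outside every box.
assign_box <- function(lat, lon, boxes) {
  out <- rep(NA_character_, length(lat))
  for (i in seq_len(nrow(boxes))) {
    inside <- lat >= boxes$lat_start[i] & lat <= boxes$lat_end[i] &
      lon >= boxes$lon_start[i] & lon <= boxes$lon_end[i]
    hit <- which(inside & is.na(out))
    out[hit] <- boxes$name[i]
  }
  out
}

assigned <- assign_box(lat = c(10.5, 20.2, 0),
                       lon = c(-149.5, -159.9, 0),
                       boxes = boxes)
print(assigned)
```

Trails are described as polygons rather than boxes, so they would need a point-in-polygon test instead; that case is not covered by this sketch.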

Reading, processing, and joining genotyping Google Sheet:

This second group of functions operates to clean the results from a project specific Google Sheet that contains genotyping results.

Since easyfulcrum was originally built for processing nematode samples, we provide a profile parameter to toggle the functions between the nematode-specific nematode profile and the more flexible, non-nematode-specific general profile.

Making a project specific "genotyping sheet":

Users that want to use the general profile can use our "general" genotyping sheet template.

Users that want to use the nematode profile can use our "nematode" genotyping sheet template.

Details on how to fill out a "nematode" genotyping sheet can be found in the Nematode Collection Protocol; look for "wild_isolate_genotyping_template".

readGenotypes:

readGenotypes reads in genotyping data from a Google Sheet with requisite gsKey. The col_types variable will specify the class of each data column. Note, additional columns can be added to either genotyping template if desired.

For more details on reading in genotyping sheets, including how to specify col_types if needed, see the googlesheets4 package, which underlies this function.

# read example data from the "general" genotyping template 
raw_geno_general <- readGenotypes(gsKey = c("1aXH-8UDvFVddl7JA-R2y8QOcNxESpmseZGjheM8hDro"), col_types = "cccdcc")
head(raw_geno_general)

# read example data from the "nematode" genotyping template 
raw_geno_nema <- readGenotypes(gsKey = c("1eviRoe0NyIEkIexM6c_oTVTX6U4ndPx-hkXJvkjhqWM"), col_types = "cDDdcdcddddddDcDDdcdcdddddddcdcccddccc")
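The col_types strings above use one letter per column. As a reading aid, the sketch below expands such a string into type names in base R. decode_col_types is a hypothetical helper, not part of easyfulcrum or googlesheets4, and its lookup table covers only a subset of the shortcodes documented by googlesheets4.

```r
# Hypothetical helper: expand a col_types shortcode string into one type
# name per column. Codes missing from the lookup table come back as NA.
decode_col_types <- function(col_types) {
  codes <- c(c = "character", d = "double", D = "date", i = "integer",
             l = "logical", T = "datetime", "?" = "guess")
  letters_used <- strsplit(col_types, "")[[1]]
  unname(codes[letters_used])
}

general_types <- decode_col_types("cccdcc")
print(general_types)

# Tally the column types requested for the nematode template.
nema_types <- decode_col_types("cDDdcdcddddddDcDDdcdcdddddddcdcccddccc")
print(table(nema_types))
```

If extra columns are added to a template, the string needs one more letter per added column so that its length still matches the sheet.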

checkGenotypes:

checkGenotypes processes the genotyping data (adding flags) and, if desired, returns information on those flags. Setting the profile parameter to general checks for unexpected or improper use of S-labels, including missing isolations, improper S-label names, and/or duplicated S-labels. Setting the profile parameter to nematode adds nematode-specific checks, including unusual species IDs, strain names, and proliferation values, and whether ITS2 genotypes are missing when expected. Setting return_geno to TRUE and return_flags to FALSE returns the processed genotyping data and prints information on the flags. Setting return_geno to FALSE and return_flags to TRUE returns a list of data frames that detail the rows where the flags appear. The two options cannot both be TRUE at the same time.

proc_geno_general <- checkGenotypes(geno_data = raw_geno_general, fulc_data = anno_fulc,
                                    return_geno = TRUE, return_flags = FALSE, profile = "general")

proc_geno_nema <- checkGenotypes(geno_data = raw_geno_nema, fulc_data = anno_fulc,
                                 return_geno = TRUE, return_flags = FALSE, profile = "nematode")

flag_geno_general <- checkGenotypes(geno_data = raw_geno_general, fulc_data = anno_fulc,
                                    return_geno = FALSE, return_flags = TRUE, profile = "general")
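The S-label checks in the general profile amount to comparing the labels in the genotyping sheet against those recorded in Fulcrum. The base R sketch below shows one way to express that comparison on toy vectors. It is not the checkGenotypes implementation, the label values are invented, and reading "missing isolations" as S-labels that have no Fulcrum record is an assumption.

```r
# Toy S-labels from a genotyping sheet and from the Fulcrum isolation data.
geno_s <- c("S-0001", "S-0002", "S-0002", NA, "S-0009")
fulc_s <- c("S-0001", "S-0002", "S-0003")

s_flags <- data.frame(
  s_label = geno_s,
  missing_label = is.na(geno_s),
  # Mark every copy of a repeated S-label.
  duplicated_label = !is.na(geno_s) &
    (duplicated(geno_s) | duplicated(geno_s, fromLast = TRUE)),
  # S-labels in the sheet that have no matching Fulcrum record.
  not_in_fulcrum = !is.na(geno_s) & !(geno_s %in% fulc_s),
  stringsAsFactors = FALSE
)
print(s_flags)

# Fulcrum S-labels that never appear in the genotyping sheet.
not_genotyped <- setdiff(fulc_s, geno_s)
print(not_genotyped)
```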

Based on these flags, "fixed" genotyping sheets were made, eliminating rows with blank S-labels, duplicated S-labels, etc. The "fixed" data are read in below and checkGenotypes is re-run on them.

# general example
raw_geno_general_fixed <- readGenotypes(gsKey = c("1AcovAEfQIF46PigrrM_D2QPOpdTQM-YndlFo4MmsQoc"), col_types = "cccdcc")

proc_geno_general_fixed <- checkGenotypes(geno_data = raw_geno_general_fixed, fulc_data = anno_fulc,
                            return_geno = TRUE, return_flags = FALSE, profile = "general")

# nematode example
raw_geno_nema_fixed <- readGenotypes(gsKey = c("1WaOsAU0Pmf_rOp9BoGmDeYMmyENw0gBppdllfedLG9s"), col_types = "cDDdcdcddddddDcDDdcdcdddddddcdcccddccc")

proc_geno_nema_fixed <- checkGenotypes(geno_data = raw_geno_nema_fixed, fulc_data = anno_fulc, 
                                 return_geno = TRUE, return_flags = FALSE, profile = "nematode")

joinGenoFulc:

joinGenoFulc will join the joined Fulcrum data frame with the genotyping information. This function will also save the processed genotyping information in data/processed/genotypes if dir is set to the base folder of the project.

If the user is using easyfulcrum on customized Fulcrum applications other than "Nematode field sampling" and "Nematode isolation", it is recommended that select_vars is set to FALSE, so that joinGenoFulc returns all variables rather than only the default ones.

# general example
join_genofulc_general <- joinGenoFulc(geno = proc_geno_general_fixed, fulc = anno_fulc, dir = dir, select_vars = TRUE)

# nematode example
join_genofulc_nema <- joinGenoFulc(geno = proc_geno_nema_fixed, fulc = anno_fulc, dir = NULL, select_vars = TRUE)

Reading, resizing, and joining collection images:

The final function processes and resizes images, adding details to a final dataframe.

procPhotos:

procPhotos copies raw sample photos, renames them with the C-label, makes a new directory data/processed/fulcrum/photos and pastes the renamed files there. The function also makes thumbnails for use with interactive maps and places these in the data/processed/fulcrum/photos/thumbnails directory. Setting the CeNDR option to TRUE will rename photos of samples meeting CeNDR criteria with the name of the nematode strains isolated from the sample and paste them in the data/processed/fulcrum/photos/CeNDR directory.

The function will also accept a public url (pub_url) for hosting the sample photos renamed by C-label. A compatible public url should follow the pub_url/Project/sampling_thumbs/C-label.jpg format. For example, if the full url for C-5133 is https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/2020JanuaryHawaii/sampling_thumbs/C-5133.jpg, the pub_url should be set to https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/. The project name, "sampling_thumbs", C-label, and file extension will be filled by the function.
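The URL format described above can be reproduced with a one-line base R helper. build_thumb_url is a hypothetical function written for illustration, not part of easyfulcrum, and it assumes that pub_url already ends with a trailing slash, as in the example.

```r
# Hypothetical helper: compose a thumbnail URL following the
# pub_url/Project/sampling_thumbs/C-label.jpg pattern.
# Assumption: pub_url ends with "/".
build_thumb_url <- function(pub_url, project, c_label) {
  paste0(pub_url, project, "/sampling_thumbs/", c_label, ".jpg")
}

thumb_url <- build_thumb_url(
  pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
  project = "2020JanuaryHawaii",
  c_label = "C-5133"
)
print(thumb_url)
```

With these inputs the helper returns the full C-5133 URL quoted in the paragraph above.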

We've included example code for each profile, but note that if you rerun the procPhotos function with the overwrite parameter set to TRUE, the output files will be overwritten.

final_data_general <- procPhotos(dir = dir, data = join_genofulc_general,
                         max_dim = 500, overwrite = TRUE,
                         pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
                         CeNDR = TRUE)
head(final_data_general)

final_data_nema <- procPhotos(dir = dir, data = join_genofulc_nema,
                         max_dim = 500, overwrite = TRUE,
                         pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
                         CeNDR = TRUE)
head(final_data_nema)

Generating summary and output files:

We include two functions for generating summaries of a finalized collection project. The final dataframes can otherwise be used as needed by the user.

makeSpSheet:

makeSpSheet generates a species specific .csv file for the species of interest (target_sp) and writes it to the /reports subdirectory. This function simplifies the output of the final dataframe, pulling variables of particular interest for a user specified species of interest. This function is written to standardize the output dataframe to meet the specifications for submitting wild nematode collections to the Caenorhabditis Natural Diversity Resource (CaeNDR). For this reason the makeSpSheet function is likely only applicable to nematode sampling projects.

makeSpSheet also returns a dataframe with flags for these select samples, and prints a description of these flags.

# general example
sp_sheet_general <- makeSpSheet(data = final_data_general, target_sp = "Caenorhabditis briggsae", dir = dir)

# nematode example
sp_sheet_nema <- makeSpSheet(data = final_data_nema, target_sp = "Caenorhabditis briggsae", dir = dir)

generateReport:

We provide a function, generateReport, that will generate an interactive overview of the entire sampling project. generateReport saves a file named sampleReport.Rmd into the /scripts sub-directory, and saves a sampleReport.html file in the /reports sub-directory. The sampleReport.html can be viewed in any web browser and includes: an overview of the collection project (such as who conducted the respective processes and on what dates they were completed), summary tables of collection and isolation data, interactive maps of where the collections in the project were acquired, and box plots showing the distributions of various environmental parameters at all collection sites. These parameters include substrate temperature, ambient temperature, humidity, and elevation.

Please feel free to edit the sampleReport.Rmd as you require once it has been saved in the /scripts sub-directory.

The profile parameter will switch between nematode specific reports and non-nematode specific reports.

# general example
generateReport(data = final_data_general, dir = dir, target_sp = c("O. myriophilus", "Caenorhabditis briggsae"),
               profile = "general")

# nematode example
generateReport(data = final_data_nema, dir = dir, profile = "nematode")


AndersenLab/easyfulcrum documentation built on Aug. 23, 2023, 2:35 a.m.