This vignette provides an overview of quality control (QC) methods for Imaging FlowCytobot (IFCB) data using the iRfcb package. The package offers tools to analyze the particle size distribution (PSD) following Hayashi et al. (in prep), verify geographical positions, and integrate contextual data from sources such as ferrybox systems. These QC workflows help ensure high-quality datasets for phytoplankton and microzooplankton monitoring in marine ecosystems.
You'll learn how to set up the iRfcb package and Python environment, run the PSD quality check, verify sample positions, and retrieve ferrybox data.
Follow this tutorial to streamline the QC process and ensure reliable IFCB data.
You can install the package from CRAN using:
install.packages("iRfcb")
Some functions from the iRfcb package used in this tutorial require Python to be installed. You can download Python from the official website: python.org/downloads.
The iRfcb package can be configured to automatically activate an installed Python virtual environment (venv) upon loading by setting an environment variable. For more details, please refer to the package README.
Load the iRfcb library:
library(iRfcb)
library(reticulate)

# Define path to virtual environment
env_path <- file.path(tempdir(), "iRfcb") # Or your preferred venv path

# Install python virtual environment
tryCatch({
  ifcb_py_install(envname = env_path)
}, error = function(e) {
  warning("Python environment could not be installed.")
})

# Check if Python is available
if (!py_available(initialize = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
  warning("Python is not available. Skipping vignette evaluation.")
} else {
  # List available packages
  available_packages <- py_list_packages(python = reticulate::py_discover_config()$python)

  # Check if pandas and matplotlib are available
  if (!"pandas" %in% available_packages$package ||
      !"matplotlib" %in% available_packages$package) {
    knitr::opts_chunk$set(eval = FALSE)
    warning("Required python modules are not available. Skipping vignette evaluation.")
  }
}
To get started, download sample data from the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024) with the following function:
# Define data directory
data_dir <- "data"

# Download and extract test data if the data folder does not exist
if (!dir.exists(data_dir)) {
  ifcb_download_test_data(
    dest_dir = data_dir,
    max_retries = 10,
    sleep_time = 30,
    verbose = FALSE
  )
}
IFCB data can be quality controlled by analyzing the particle size distribution (PSD) (Hayashi et al. in prep). iRfcb uses the code available at https://github.com/kudelalab/PSD, which efficiently detects samples with bubbles, beads, incomplete runs, etc. Before running the PSD quality check, ensure the necessary Python environment is set up and activated:
# Define path to virtual environment
env_path <- "~/.virtualenvs/iRfcb" # Or your preferred venv path

# Install python virtual environment
ifcb_py_install(envname = env_path)

# Run PSD quality control
psd <- ifcb_psd(
  feature_folder = "data/features/2023",
  hdr_folder = "data/data/2023",
  save_data = FALSE,
  output_file = NULL,
  plot_folder = NULL,
  use_marker = FALSE,
  start_fit = 10,
  r_sqr = 0.5,
  beads = 10 ** 12,
  bubbles = 150,
  incomplete = c(1500, 3),
  missing_cells = 0.7,
  biomass = 1000,
  bloom = 5,
  humidity = 70
)
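As a rough intuition for why a fit-quality threshold such as r_sqr can expose problem samples, the sketch below fits a power law to a synthetic size distribution, with and without an artificial spike in one size bin, and compares the R-squared values. It assumes a simple log-log linear fit on made-up numbers. It is not the kudelalab/PSD algorithm, and the bin range, exponent and spike size are arbitrary choices for illustration.

```r
# Synthetic size bins (diameters) and counts that roughly follow a power law.
# The small sine term adds deterministic scatter so the fit is not exact.
diameter <- 10:50
counts_clean <- 1e6 * diameter^(-3) * (1 + 0.05 * sin(diameter))

# Same distribution with an artificial spike in one bin (hypothetical contamination)
counts_spiked <- counts_clean
counts_spiked[diameter == 30] <- counts_spiked[diameter == 30] * 1000

# Fit log(counts) ~ log(diameter) and return the R-squared of the fit
power_law_r2 <- function(d, n) {
  fit <- lm(log(n) ~ log(d))
  summary(fit)$r.squared
}

r2_clean  <- power_law_r2(diameter, counts_clean)
r2_spiked <- power_law_r2(diameter, counts_spiked)
print(c(clean = r2_clean, spiked = r2_spiked))
```

The spiked distribution yields a lower R-squared than the clean one, which is the kind of contrast a threshold can pick up.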
The results can be printed and visualized through plots:
# Print output from PSD
head(psd$fits)
head(psd$flags)

# Plot PSD of the first sample
plot <- ifcb_psd_plot(
  sample_name = psd$data$sample[1],
  data = psd$data,
  fits = psd$fits,
  start_fit = 10
)

# Print the plot
print(plot)
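Once the PSD check has run, a common next step is to set aside the flagged samples before further analysis. The sketch below uses a small made-up data frame in place of psd$flags. The column names sample and flag, the flag labels and the sample names are assumptions for illustration, so check names(psd$flags) on your own output first.

```r
# Hypothetical stand-in for psd$flags (column names are assumed, not confirmed)
flags <- data.frame(
  sample = c("D20230101T000000_IFCB000", "D20230102T000000_IFCB000"),
  flag   = c("Bubbles", "Incomplete Run"),
  stringsAsFactors = FALSE
)

# Hypothetical list of all processed samples
all_samples <- c(
  "D20230101T000000_IFCB000",
  "D20230102T000000_IFCB000",
  "D20230103T000000_IFCB000"
)

# Keep only the samples that were not flagged
passed_samples <- setdiff(all_samples, flags$sample)
print(passed_samples)
```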
To determine whether the IFCB is near land (e.g. when the ship is in harbor), examine the position data in the .hdr files (or supply vectors of latitudes and longitudes):
# Read HDR data and extract GPS position (when available)
gps_data <- ifcb_read_hdr_data(
  "data/data/",
  gps_only = TRUE,
  verbose = FALSE # Do not print progress bar
)

# Create new column with the results
gps_data$near_land <- ifcb_is_near_land(
  gps_data$gpsLatitude,
  gps_data$gpsLongitude,
  distance = 100, # 100 meters from shore
  shape = NULL # Using the default NE 1:10m Land Polygon
)

# Print output
head(gps_data)

# Alternatively, you can choose to plot the points on a map
near_land_plot <- ifcb_is_near_land(
  gps_data$gpsLatitude,
  gps_data$gpsLongitude,
  distance = 2500, # 2500 meters from shore
  plot = TRUE
)

# Print the plot
print(near_land_plot)
For more accurate determination, a detailed coastline .shp file may be required (e.g. the EEA Coastline Polygon). Refer to the help pages of ifcb_is_near_land() for further information.
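The near-land result can then be used to exclude harbor samples from downstream analysis. The sketch below works on a made-up data frame shaped like gps_data after the near_land column has been added. The sample column and all values are assumptions, as is the possibility that samples without a position end up with NA in near_land.

```r
# Hypothetical stand-in for gps_data with a near_land column
gps_example <- data.frame(
  sample       = c("sample_a", "sample_b", "sample_c"),
  gpsLatitude  = c(57.70, 57.95, NA),
  gpsLongitude = c(11.95, 10.80, NA),
  near_land    = c(TRUE, FALSE, NA),
  stringsAsFactors = FALSE
)

# Keep samples confirmed to be away from land. %in% never returns NA,
# so rows without a result are excluded rather than silently kept.
offshore <- gps_example[gps_example$near_land %in% FALSE, ]
print(offshore$sample)
```

Whether samples with an unknown position should be dropped or kept is a judgment call. This sketch takes the conservative route and keeps only confirmed offshore samples.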
To identify the sub-basin of the Baltic Sea from which an IFCB sample was collected (or the matching region in a custom shapefile), analyze the position data:
# Define example latitude and longitude vectors
latitudes <- c(55.337, 54.729, 56.311, 57.975)
longitudes <- c(12.674, 14.643, 12.237, 10.637)

# Check in which Baltic Sea basin the points are located
points_in_the_baltic <- ifcb_which_basin(latitudes, longitudes, shape_file = NULL)

# Print output
print(points_in_the_baltic)

# Plot the points and the basins
ifcb_which_basin(latitudes, longitudes, plot = TRUE, shape_file = NULL)
This function reads a pre-packaged shapefile of the Baltic Sea, Kattegat, and Skagerrak basins from the iRfcb package by default, or a user-supplied shapefile if provided. The shapefiles provided in iRfcb originate from SHARK.
This check is useful if you want to apply a classifier only to phytoplankton from the Baltic Sea.
# Define example latitude and longitude vectors
latitudes <- c(55.337, 54.729, 56.311, 57.975)
longitudes <- c(12.674, 14.643, 12.237, 10.637)

# Check if the points are in the Baltic Sea Basin
points_in_the_baltic <- ifcb_is_in_basin(latitudes, longitudes)

# Print results
print(points_in_the_baltic)

# Plot the points and the basin
ifcb_is_in_basin(latitudes, longitudes, plot = TRUE)
This function reads a land-buffered shapefile of the Baltic Sea Basin from the iRfcb package by default, or a user-supplied shapefile if provided.
The ifcb_get_ferrybox_data() function is used by SMHI to collect and match stored ferrybox positions when they are not available in the .hdr files. An example ferrybox data file is provided in iRfcb with data matching sample D20220522T000439_IFCB134.
# Print available coordinates from .hdr files
head(gps_data, 4)

# Define path where ferrybox data are located
ferrybox_folder <- "data/ferrybox_data"

# Get GPS position from ferrybox data
positions <- ifcb_get_ferrybox_data(gps_data$timestamp, ferrybox_folder)

# Print result
head(positions)
The ifcb_get_ferrybox_data() function can also be used to extract additional ferrybox parameters, such as temperature (parameter number 8180) and salinity (parameter number 8181).
# Get salinity and temperature from ferrybox data
ferrybox_data <- ifcb_get_ferrybox_data(
  gps_data$timestamp,
  ferrybox_folder,
  parameters = c("8180", "8181")
)

# Print result
head(ferrybox_data)
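To give a feel for what matching a sample to ferrybox records involves, the sketch below pairs each sample timestamp with the nearest record in a made-up ferrybox table and discards matches that are too far apart in time. This is an illustration in base R, not the implementation used by ifcb_get_ferrybox_data(). The table layout, column names, coordinates and the 60-second tolerance are all assumptions.

```r
# Made-up ferrybox records (layout and column names are assumptions)
ferrybox <- data.frame(
  timestamp = as.POSIXct(
    c("2022-05-22 00:04:00", "2022-05-22 00:05:00", "2022-05-22 00:06:00"),
    tz = "UTC"
  ),
  latitude  = c(55.01, 55.02, 55.03),
  longitude = c(13.01, 13.02, 13.03)
)

# Sample timestamps to match; the second one has no nearby record
sample_times <- as.POSIXct(
  c("2022-05-22 00:04:39", "2022-05-22 03:00:00"),
  tz = "UTC"
)

# Return the nearest record within `tolerance_secs`, or NA if none is close enough
match_nearest <- function(times, records, tolerance_secs = 60) {
  time_secs   <- as.numeric(times)
  record_secs <- as.numeric(records$timestamp)
  idx <- vapply(time_secs, function(t) {
    diffs <- abs(record_secs - t)
    nearest <- which.min(diffs)
    if (diffs[nearest] <= tolerance_secs) nearest else NA_integer_
  }, integer(1))
  data.frame(
    timestamp = times,
    latitude  = records$latitude[idx],
    longitude = records$longitude[idx]
  )
}

positions_example <- match_nearest(sample_times, ferrybox)
print(positions_example)
```

Returning NA for unmatched samples, rather than the closest record regardless of distance in time, avoids assigning a position recorded hours away from the actual sampling event.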
This concludes this tutorial for the iRfcb package. For additional guides, such as data sharing and MATLAB integration, please refer to the other tutorials available on the project's webpage. See how data pipelines can be constructed using iRfcb in the following Example Project. Happy analyzing!
# Clean up (removes the downloaded data and the virtual environment at env_path)
unlink(data_dir, recursive = TRUE)
unlink(env_path, recursive = TRUE)
# Print citation
citation("iRfcb")