View source: R/ifcb_extract_features.R
| ifcb_extract_features | R Documentation |
This function computes the "slim" feature set (version 4) and blob masks from
raw Imaging FlowCytobot (IFCB) data by calling the WHOI ifcb-features Python
package. For each bin it writes a feature table
(<bin>_features_v4.csv, 30 morphological features per region of interest)
and an archive of binary blob masks (<bin>_blobs_v4.zip, one 1-bit PNG per
ROI). Features and blobs are written to separate, user-specified directories.
ifcb_extract_features(
  data_folder,
  features_folder,
  blobs_folder,
  bins = NULL,
  parallel = FALSE,
  n_cores = NULL,
  overwrite = FALSE,
  verbose = TRUE
)
data_folder |
The path to a directory containing raw IFCB data.
features_folder |
The path to the directory where the feature tables
(<bin>_features_v4.csv) are written.
blobs_folder |
The path to the directory where the blob archives
(<bin>_blobs_v4.zip) are written.
bins |
An optional character vector of bin names to process.
Default is NULL, in which case all bins in data_folder are processed.
parallel |
A logical indicating whether to process bins in parallel.
Default is FALSE.
n_cores |
An integer specifying the number of parallel workers to use
when parallel = TRUE. Default is NULL.
overwrite |
A logical indicating whether to overwrite existing feature
and blob files. If FALSE (the default), existing outputs are skipped.
verbose |
A logical indicating whether to print progress messages,
including a progress bar that advances as each bin is processed.
Default is TRUE.
This function wraps the extract_slim_features workflow from the
ifcb-features Python repository, which can be found at
https://github.com/WHOIGit/ifcb-features.
Python and the ifcb-features package must be installed to use this function.
The required Python packages can be installed in a virtual environment using
ifcb_py_install(features = TRUE), which additionally installs ifcb-features
and its dependencies (pyifcb, phasepack, scikit-image, scikit-learn).
Python version requirement: pyifcb and its dependencies (notably
h5py) must be available as binary wheels for your Python version;
installation will fail if source compilation is required and the build
environment is incompatible. See
https://github.com/WHOIGit/ifcb-features for current Python version
requirements, and use ifcb_py_install(features = TRUE) to install into a
compatible environment.
Bins are processed sequentially by default. When parallel = TRUE, bins are
distributed across n_cores workers, which can substantially reduce runtime
for large datasets. Existing outputs are skipped unless overwrite = TRUE,
so the function can be re-run to resume an interrupted extraction.
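The resume behaviour relies on the output naming convention described above. As a rough illustration only, the base R sketch below works out which bins still lack a <bin>_features_v4.csv file. The temporary folder and the placeholder file are made up for the example, and the function's own skip logic may differ (for instance, it may also check for the blob archive).

```r
# Hypothetical output folder: a fresh temporary directory stands in for
# a real features_folder.
features_folder <- tempfile("features_sketch_")
dir.create(features_folder)

# Pretend that one of the two requested bins was already processed.
bins <- c("D20220522T003051_IFCB134", "D20220522T000439_IFCB134")
file.create(file.path(features_folder, paste0(bins[1], "_features_v4.csv")))

# Recover bin names from the existing feature files, then list the bins
# that still have no feature table.
existing <- list.files(features_folder, pattern = "_features_v4\\.csv$")
done <- sub("_features_v4\\.csv$", "", existing)
pending <- setdiff(bins, done)
pending
```

Here pending holds only the second bin name, which is the bin a re-run with overwrite = FALSE would still need to process.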
The parallel backend depends on the platform. On Linux, bins run in separate
worker processes, giving true multi-core parallelism. On Windows and macOS,
where the embedded Python interpreter cannot reliably spawn worker processes,
a thread pool is used instead. Because of Python's Global Interpreter Lock, the
speedup there is smaller and depends on how much of the work runs in native
(NumPy / scikit-image) code. A further consequence of the thread backend
is that interrupting a run (ESC / Stop) does not halt a bin already being
processed: it finishes and writes its outputs before the run stops.
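One possible way to pick a value for n_cores is sketched below. It uses parallel::detectCores() from base R and leaves one core free for the R session. This is an assumption made for illustration, not the behaviour the function applies when n_cores = NULL.

```r
# detectCores() can return NA on some platforms, so fall back to one worker.
available <- parallel::detectCores()
n_cores <- if (is.na(available)) 1L else max(1L, available - 1L)
n_cores

# The value could then be passed on, for example:
# ifcb_extract_features(
#   data_folder = "path/to/data",
#   features_folder = "path/to/features",
#   blobs_folder = "path/to/blobs",
#   parallel = TRUE,
#   n_cores = n_cores
# )
```

On Windows and macOS, where a thread pool is used, adding more workers may yield a smaller gain than on Linux, so a modest value is a reasonable starting point.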
Invisibly returns a tibble with one row per bin and the columns
bin, status ("processed", "skipped" or "error") and message.
The function is primarily called for its side effect of writing feature and
blob files to disk.
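Because the result is returned invisibly, it has to be assigned to be inspected. The sketch below uses a mock data frame with the documented columns in place of a real result; the message values are invented for illustration.

```r
# Mock result with the documented columns. A real call returns a tibble,
# e.g. res <- ifcb_extract_features(...).
res <- data.frame(
  bin = c("D20220522T003051_IFCB134", "D20220522T000439_IFCB134"),
  status = c("processed", "error"),
  message = c(NA, "hypothetical failure message"),
  stringsAsFactors = FALSE
)

# Summarise outcomes and collect the bins that failed.
table(res$status)
failed <- res$bin[res$status == "error"]
failed
```

The failed vector could then be passed back through the bins argument to retry only those bins.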
ifcb_py_install, ifcb_read_features,
https://github.com/WHOIGit/ifcb-features
## Not run:
# Install the Python environment including ifcb-features
ifcb_py_install(features = TRUE)

# Extract features and blobs from all bins in a data folder
ifcb_extract_features(
  data_folder = "path/to/data",
  features_folder = "path/to/features",
  blobs_folder = "path/to/blobs"
)

# Process a subset of bins in parallel using 4 cores
ifcb_extract_features(
  data_folder = "path/to/data",
  features_folder = "path/to/features",
  blobs_folder = "path/to/blobs",
  bins = c("D20220522T003051_IFCB134", "D20220522T000439_IFCB134"),
  parallel = TRUE,
  n_cores = 4
)
## End(Not run)