read_cell_seg_data: Read and clean an inForm data file.
In akoyabio/phenoptr: inForm Helper Functions


read_cell_seg_data makes it easier to use data from Akoya Biosciences' inForm program. It reads data files written by inForm 2.0 and later and does useful cleanup on the result.

read_cell_seg_data(
  path = NA,
  pixels_per_micron = getOption("phenoptr.pixels.per.micron"),
  remove_units = TRUE,
  col_select = NULL
)

`path`	Path to the file to read, or NA to use a file chooser.
`pixels_per_micron`	Conversion factor to microns (default 2 pixels/micron, the resolution of 20x MSI fields taken on Vectra Polaris and Vectra 3). Set to NA to skip conversion. Set to `'auto'` to read from an associated `component_data.tif` file.
`remove_units`	If TRUE (default), remove the unit name from expression columns.
`col_select`	Optional column selection expression. May be one of: NULL, to retain all columns; `"phenoptrReports"`, to retain only columns needed by functions in the `phenoptrReports` package; or a quoted list of one or more selection expressions, as in `dplyr::select()` (see example).

read_cell_seg_data reads single-field tables, merged tables, and consolidated tables, and does useful cleanup on the data:

Removes columns that are all NA. These are typically unused summary columns.
Converts percent columns to numeric fractions.
Converts pixel distances to microns. The conversion factor may be specified as a parameter, by setting options(phenoptr.pixels.per.micron), or by reading an associated component_data.tif file.
Optionally removes units from expression names.
If the file contains multiple sample names, a tag column is created containing a minimal, unique tag for each sample. This is useful when a short name is needed, for example in chart legends.
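The first three cleanup steps above can be illustrated on a toy table. This is a minimal base R sketch, not phenoptr's actual implementation: the helper `clean_toy()` and the column names are hypothetical, and it assumes percent values arrive as strings such as "98.5%" and that pixel distances live in columns whose names contain "Position".

```r
# Toy table standing in for raw inForm output (column names are illustrative).
raw <- data.frame(
  `Cell ID` = 1:3,
  `Cell X Position` = c(100, 250, 400),   # pixels
  `Cell Y Position` = c(50, 75, 120),     # pixels
  `Confidence` = c("98.5%", "75%", "100%"),
  `Total Cells` = c(NA, NA, NA),          # unused summary column
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of phenoptr.
clean_toy <- function(df, pixels_per_micron = 2) {
  # 1. Drop columns that are entirely NA.
  all_na <- vapply(df, function(col) all(is.na(col)), logical(1))
  df <- df[, !all_na, drop = FALSE]

  # 2. Convert percent strings such as "98.5%" to numeric fractions.
  is_pct <- vapply(
    df,
    function(col) is.character(col) && all(grepl("%$", col)),
    logical(1)
  )
  df[is_pct] <- lapply(df[is_pct], function(col) as.numeric(sub("%$", "", col)) / 100)

  # 3. Convert pixel positions to microns; NA skips the conversion.
  if (!is.na(pixels_per_micron)) {
    pos <- grepl("Position", names(df))
    df[pos] <- lapply(df[pos], function(col) col / pixels_per_micron)
  }
  df
}

cleaned <- clean_toy(raw)
print(cleaned)
```

Calling `clean_toy(raw, pixels_per_micron = NA)` leaves the positions in pixels, mirroring the documented behavior of setting `pixels_per_micron` to NA.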

If pixels_per_micron='auto', read_cell_seg_data looks for a component_data.tif file in the same directory as path. If found, pixels_per_micron is read from the file and the cell coordinates are offset to the correct spatial location.
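One way to picture the lookup is the sketch below. It is only an illustration under stated assumptions: `find_component_file()` is a hypothetical helper, and it assumes the usual inForm naming in which a field's `_cell_seg_data.txt` and `_component_data.tif` files share a prefix. How phenoptr actually locates the file, reads the resolution, and offsets the coordinates is not shown here.

```r
# Hypothetical helper: given a cell seg data path, return the path of the
# matching component_data.tif in the same directory, or NA if none exists.
find_component_file <- function(path) {
  candidate <- sub("_cell_seg_data\\.txt$", "_component_data.tif", path)
  if (!identical(candidate, path) && file.exists(candidate)) {
    candidate
  } else {
    NA_character_
  }
}

# Demonstrate with empty placeholder files in a temporary directory.
demo_dir <- file.path(tempdir(), "inform_demo")
dir.create(demo_dir, showWarnings = FALSE)
seg <- file.path(demo_dir, "field1_cell_seg_data.txt")
comp <- file.path(demo_dir, "field1_component_data.tif")
file.create(seg, comp)

# field1 has a sibling component file; field2 does not.
found <- find_component_file(seg)
not_found <- find_component_file(file.path(demo_dir, "field2_cell_seg_data.txt"))
```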

If col_select is "phenoptrReports", only columns normally needed by phenoptrReports are read. This can dramatically reduce the time to read a file and the memory required to store the results.

Specifically, passing col_select='phenoptrReports' will omit:

Component stats other than mean expression
Shape stats other than area
Path, Processing Region ID, Category Region ID, Lab ID, Confidence, and columns which are normally blank.
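The effect of this filtering can be sketched on a vector of column names. The names and the regular expressions in `drop_patterns` are illustrative assumptions modelled on the list above, not the rules phenoptr itself applies.

```r
# Illustrative column names; real inForm tables have many more.
cols <- c(
  "Sample Name", "Cell ID", "Phenotype",
  "Cell X Position", "Cell Y Position",
  "Entire Cell Area (pixels)", "Entire Cell Compactness",
  "Entire Cell CD8 Mean", "Entire Cell CD8 Min", "Entire Cell CD8 Max",
  "Path", "Lab ID", "Confidence"
)

# Hypothetical drop rules, one per item in the list above.
drop_patterns <- c(
  " (Min|Max|Std Dev|Total)$",   # component stats other than mean
  "Compactness|Axis",            # shape stats other than area
  "^(Path|Processing Region ID|Category Region ID|Lab ID|Confidence)$"
)

# A column is dropped if it matches any of the patterns.
drop <- Reduce(`|`, lapply(drop_patterns, grepl, x = cols))
kept <- cols[!drop]
print(kept)
```

In this toy case 13 columns shrink to 7. Real files carry several statistics per component and compartment, which is why skipping them can save substantial time and memory.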

A tibble containing the cleaned-up data set.

Other file readers: get_field_info(), list_cell_seg_files(), read_components(), read_maps()

path <- sample_cell_seg_path()
csd <- read_cell_seg_data(path)

# count all the phenotypes in the data
table(csd$Phenotype)

# Read only columns needed by phenoptrReports
csd <- read_cell_seg_data(path, col_select='phenoptrReports')

# Read only position and phenotype columns
csd <- read_cell_seg_data(path,
         col_select=rlang::quo(list(dplyr::contains('Position'),
                                    dplyr::contains('Phenotype'))))
## Not run: 
# Use purrr::map_df to read all cell seg files in a directory
# and return a single tibble.
# list_cell_seg_files() searches a directory, so pass the folder containing `path`.
paths <- list_cell_seg_files(dirname(path))
csd <- purrr::map_df(paths, read_cell_seg_data)

## End(Not run)