readVizgen | R Documentation |
This function reads the standard Vizgen MERFISH output into an SFE object.
The coordinates are in microns. Cell centroids are read into
colGeometry
"centroids", and cell segmentations are read into
colGeometry
"cellSeg". The image(s) (polyT, DAPI, and cell boundaries)
are also read as SpatRaster
objects so they are not loaded into
memory unless necessary. Because the image's origin is the top left while the
geometry's origin is bottom left, either the image or the geometry needs to
be flipped. Because the image accompanying MERFISH datasets are usually very
large, the coordinates will be flipped so the flipping operation won't load
the entire image into memory. Large datasets with hundreds of thousands of
cells can take a while to read if reading transcript spots as it takes a
while to convert the spots to MULTIPOINT geometries.
readVizgen(
data_dir,
z = "all",
sample_id = "sample01",
min_area = NULL,
image = c("DAPI", "PolyT", "Cellbound"),
flip = c("geometry", "image", "none"),
max_flip = "50 MB",
filter_counts = FALSE,
add_molecules = FALSE,
use_bboxes = FALSE,
use_cellpose = TRUE,
BPPARAM = SerialParam(),
file_out = file.path(data_dir, "detected_transcripts.parquet"),
z_option = c("3d", "split")
)
data_dir |
Top level output directory. |
z |
Integer, z index to read, or "all", indicating z-planes of the images and transcript spots to read. While cell segmentation seems to have multiple z-planes, the segmentation in all z-planes are the same so in effect the cell segmentatio is only in 2D. |
sample_id |
A |
min_area |
Minimum cell area in square microns or pixel units (eg for CosMX). Anything smaller will be considered artifact or debris and removed. Default to 'NULL', ie no filtering of polygons. |
image |
Which image(s) to load, can be "DAPI", "PolyT", "Cellbound" or any combination of them. |
flip |
To flip the image, geometry coordinates, or none. Because the image has the origin at the top left while the geometry has origin at the bottom left, one of them needs to be flipped for them to match. If one of them is already flipped, then use "none". The image will not be flipped if it's GeoTIFF. |
max_flip |
Maximum size of the image allowed to flip the image. Because the image will be loaded into memory to be flipped. If the image is larger than this size then the coordinates will be flipped instead. |
filter_counts |
Logical, whether to keep cells with counts |
add_molecules |
Logical, whether to add transcripts coordinates to an object. |
use_bboxes |
If no segmentation output is present, use
|
use_cellpose |
Whether to read the parquet files from CellPose cell
segmentation. If |
BPPARAM |
A
|
file_out |
Name of file to save the geometry or raster to disk. Especially when the geometries are so large that it's unwieldy to load everything into memory. If this file (or directory for multiple files) already exists, then the existing file(s) will be read, skipping the processing. When writing the file, extensions supplied are ignored and extensions are determined based on 'dest'. |
z_option |
What to do with z coordinates. "3d" is to construct 3D geometries. "split" is to create a separate 2D geometry for each z-plane so geometric operations are fully supported but some data wrangling is required to perform 3D analyses. When the z coordinates are not integers, 3D geometries will always be constructed since there are no z-planes to speak of. This argument does not apply when 'spatialCoordsNames' has length 2. |
A SpatialFeatureExperiment
object.
Since the transcript spots file is often very large, we recommend only
using add_molecules = TRUE
on servers with a lot of memory. If
reading all z-planes, conversion of transcript spot geometry to parquet
file might fail due to arrow data length limit. In a future version, when
the transcript spot geometry is large, it will be written to multiple
separate parquet files which are then concatenated with DuckDB. Also, in a
future version, the transcript spot processing function might be rewritten
in C++ to stream the original CSV file so it's not entirely loaded into
memory.
fp <- tempfile()
dir_use <- SFEData::VizgenOutput(file_path = fp)
sfe <- readVizgen(dir_use, z = 3L, image = "PolyT",
flip = "geometry")
## Filtering of counts, and addition of molecule coordinates..
sfe <- readVizgen(dir_use, z = 3L, image = "PolyT", filter_counts = TRUE,
add_molecules = TRUE, flip = "geometry")
unlink(dir_use, recursive = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.