View source: R/formatTxSpots.R
formatTxSpots | R Documentation |
The function 'formatTxSpots' reads the transcript spot coordinates of smFISH-based data and formats them; the data is not added to an SFE object. If the file specified in 'file_out' already exists, then this file is read instead of the original file in the 'file' argument, so the processing is not run multiple times. The function 'addTxSpots' adds the data read and processed by 'formatTxSpots' to the SFE object, and reads all transcript spot data. To read only a subset of the transcript spot data, first use 'formatTxSpots' to write the reformatted data to disk, then read the specific subset and add it to the SFE object separately with the setter functions.
formatTxSpots(
  file,
  dest = c("rowGeometry", "colGeometry"),
  spatialCoordsNames = c("global_x", "global_y", "global_z"),
  gene_col = "gene",
  cell_col = "cell_id",
  z = "all",
  phred_col = "qv",
  min_phred = 20,
  split_col = NULL,
  not_in_cell_id = c("-1", "UNASSIGNED"),
  z_option = c("3d", "split"),
  flip = FALSE,
  file_out = NULL,
  BPPARAM = SerialParam(),
  return = TRUE
)
addTxSpots(
  sfe,
  file,
  sample_id = 1L,
  spatialCoordsNames = c("global_x", "global_y", "global_z"),
  gene_col = "gene",
  z = "all",
  phred_col = "qv",
  min_phred = 20,
  split_col = NULL,
  z_option = c("3d", "split"),
  flip = FALSE,
  file_out = NULL,
  BPPARAM = SerialParam()
)
file |
File with the transcript spot coordinates. Should be one row per spot when read into R and should have columns for coordinates on each axis, gene the transcript is assigned to, and optionally cell the transcript is assigned to. Must be csv, tsv, or parquet. |
dest |
Where in the SFE object to store the spot geometries. This affects how the data is processed. Options: "rowGeometry" or "colGeometry". |
spatialCoordsNames |
Column names for the x, y, and optionally z coordinates of the spots. The defaults are for Vizgen. |
gene_col |
Column name for genes. |
cell_col |
Column name for cell IDs, ignored if 'dest = "rowGeometry"'. Can have length > 1 when multiple columns are needed to uniquely identify cells, in which case the contents of the columns will be concatenated, such as in CosMX data where cell ID is only unique within the same FOV. Default "cell_id" is for Vizgen MERFISH. Should be 'c("cell_ID", "fov")' for CosMX. |
z |
Index of z plane to read. Can be "all" to read all z-planes into MULTIPOINT geometries with XYZ coordinates. If z values are not integer, then spots with all z values will be read. |
phred_col |
Column name for Phred scores of the spots. |
min_phred |
Minimum Phred score to keep a spot. The default is 20, the conventional threshold indicating "acceptable", meaning that there is a 1% chance that the spot was decoded in error. |
split_col |
Categorical column to split the geometries, such as cell compartment the spots are assigned to as in the "CellComp" column in CosMX output. |
not_in_cell_id |
Value of cell ID indicating that the spot is not assigned to any cell, such as "-1" in Vizgen MERFISH and "0" in CosMX. When there're multiple columns for 'cell_col', the first column is used to identify spots that are not in cells. |
z_option |
What to do with z coordinates. "3d" is to construct 3D geometries. "split" is to create a separate 2D geometry for each z-plane so geometric operations are fully supported but some data wrangling is required to perform 3D analyses. When the z coordinates are not integers, 3D geometries will always be constructed since there are no z-planes to speak of. This argument does not apply when 'spatialCoordsNames' has length 2. |
flip |
Logical, whether to flip the geometry to match image. Here the y coordinates are simply set to -y, so the original bounding box is not preserved. This is consistent with |
file_out |
Name of the file to save the geometry or raster to disk. This is especially useful when the geometries are so large that loading everything into memory is unwieldy. If this file (or directory for multiple files) already exists, then the existing file(s) will be read, skipping the processing. When writing the file, any extension supplied is ignored and the extension is determined based on 'dest'. |
BPPARAM |
|
return |
Logical, whether to return the geometries in memory. This does not depend on whether the geometries are written to file. Always 'FALSE' when 'dest = "colGeometry"'. |
sfe |
A 'SpatialFeatureExperiment' object. |
sample_id |
Which sample in the SFE object the transcript spots should be added to. |
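The row-level filtering that 'min_phred', 'cell_col', and 'not_in_cell_id' describe can be sketched in base R. This is only an illustration on a made-up data frame, not the package's implementation: the toy data, the helper name 'phred_to_error', the choice of '>=' for the threshold, and the "_" separator used to join the ID columns are all assumptions.

```r
# Toy transcript table (made up); column names mimic CosMX-style output.
spots <- data.frame(
  target  = c("A", "A", "B", "B", "C"),
  cell_ID = c("1", "0", "2", "1", "2"),
  fov     = c("1", "1", "1", "2", "2"),
  qv      = c(35, 40, 12, 20, 28),
  stringsAsFactors = FALSE
)

# A Phred score Q corresponds to an error probability of 10^(-Q / 10),
# so Q = 20 means a 1% chance of an erroneous call.
phred_to_error <- function(q) 10^(-q / 10)  # hypothetical helper

min_phred <- 20
cell_col <- c("cell_ID", "fov")
not_in_cell_id <- "0"

# Keep spots whose score meets the threshold (">=" is an assumption).
kept <- spots[spots$qv >= min_phred, ]

# Drop spots not assigned to any cell, judged by the first ID column.
kept <- kept[!kept[[cell_col[1]]] %in% not_in_cell_id, ]

# Concatenate the ID columns so cell IDs are unique across FOVs.
# The "_" separator is an assumption made for this sketch.
kept$cell_id_unique <- do.call(paste, c(kept[cell_col], sep = "_"))
```

In this toy table, cell "1" appears in both FOV 1 and FOV 2, which is why a single ID column would not identify cells uniquely.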
An 'sf' data frame for vector geometries if 'file_out' is not set, or a 'SpatRaster' for raster. If multiple files are written, such as when splitting by cell compartment or when 'dest = "colGeometry"', then a directory with the same name as 'file_out' (but without the extension) is created and the files are written to that directory with informative names. For vector geometries, 'parquet' files that can be read with 'st_read' are written. When 'return = FALSE', the file name, or the directory when there are multiple files, is returned.
The 'sf' data frame, or path to file where geometries are written if 'return = FALSE'.
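The reuse behaviour described for 'file_out' (read the existing file if it is present, otherwise process and write) follows a common caching pattern. A minimal sketch of that pattern, using a hypothetical helper 'cached_format' and RDS files as a stand-in for the parquet output; neither is taken from the package:

```r
# Hypothetical helper illustrating the caching pattern; not the package's code.
# RDS is used here as a stand-in for the parquet files the package writes.
cached_format <- function(file_out, process) {
  if (file.exists(file_out)) {
    # Output is already on disk: read it back and skip the processing.
    return(readRDS(file_out))
  }
  res <- process()
  saveRDS(res, file_out)
  res
}

n_runs <- 0
process <- function() {
  n_runs <<- n_runs + 1  # count how often the expensive step runs
  data.frame(gene = c("A", "B"), n_spots = c(10L, 3L))
}

out_file <- tempfile(fileext = ".rds")
first  <- cached_format(out_file, process)  # runs the processing
second <- cached_format(out_file, process)  # reads the cached file
unlink(out_file)
```

One consequence of this pattern is that a stale output file is silently reused, so a different 'file_out' (or deleting the old output) is needed when the processing arguments change.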
When 'dest = "colGeometry"', the geometries are always written to disk and not returned in memory, because this is essentially the gene count matrix, which is sparse. This kind of reformatting is implemented so users can read in MULTIPOINT geometries with transcript spots for each gene assigned to each cell for spatial point process analyses, where not all genes are loaded at once.
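To make the note above concrete, here is a rough sketch of what "one MULTIPOINT per gene per cell" means, built with 'sf' on made-up coordinates. The column names and the grouping code are illustrative assumptions; the package's own implementation and on-disk layout may differ.

```r
library(sf)

# Made-up spots: gene, cell, and 2D coordinates.
spots <- data.frame(
  gene = c("A", "A", "A", "B", "B"),
  cell = c("c1", "c1", "c2", "c1", "c1"),
  x = c(1, 2, 5, 1.5, 2.5),
  y = c(1, 1, 4, 2, 3)
)

# Group spots by (gene, cell); drop = TRUE skips empty combinations,
# which is where the sparsity of the gene count matrix shows up.
groups <- split(spots, list(spots$gene, spots$cell), drop = TRUE)

# One MULTIPOINT per non-empty gene-cell pair.
geoms <- lapply(groups, function(d) st_multipoint(as.matrix(d[, c("x", "y")])))

mp <- st_sf(
  gene = vapply(groups, function(d) d$gene[1], character(1), USE.NAMES = FALSE),
  cell = vapply(groups, function(d) d$cell[1], character(1), USE.NAMES = FALSE),
  geometry = st_sfc(unname(geoms))
)
```

Here only three of the four possible gene-cell pairs have any spots, so only three geometries are created.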
library(SpatialFeatureExperiment)
# Default arguments are for MERFISH
fp <- tempdir()
dir_use <- SFEData::VizgenOutput(file_path = file.path(fp, "vizgen_test"))
g <- formatTxSpots(file.path(dir_use, "detected_transcripts.csv"))
unlink(dir_use, recursive = TRUE)
# For CosMX, note the colnames, also dest = "colGeometry"
# Results are written to the tx_spots directory
dir_use <- SFEData::CosMXOutput(file_path = file.path(fp, "cosmx_test"))
cg <- formatTxSpots(file.path(dir_use, "Run5642_S3_Quarter_tx_file.csv"),
                    dest = "colGeometry", z = "all",
                    cell_col = c("cell_ID", "fov"),
                    gene_col = "target", not_in_cell_id = "0",
                    spatialCoordsNames = c("x_global_px", "y_global_px", "z"),
                    file_out = file.path(dir_use, "tx_spots"))
# Cleanup
unlink(dir_use, recursive = TRUE)
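As a supplement to the examples, the geometric meaning of the 'z_option' and 'flip' arguments can be illustrated directly with 'sf' on made-up coordinates. This is a sketch of the concepts only; the toy data and variable names are assumptions and the code does not call the package.

```r
library(sf)

# Made-up spots of one gene across two z-planes.
spots <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(10, 20, 30, 40),
  z = c(0, 0, 1, 1)
)

# z_option = "3d": a single MULTIPOINT with XYZ coordinates.
mp_3d <- st_multipoint(as.matrix(spots[, c("x", "y", "z")]))

# z_option = "split": one 2D MULTIPOINT per z-plane.
mp_split <- lapply(split(spots, spots$z), function(d) {
  st_multipoint(as.matrix(d[, c("x", "y")]))
})

# flip = TRUE: y is negated, so the y-range of the bounding box changes
# from [10, 40] to [-40, -10] and the original bounding box is not preserved.
flipped <- spots
flipped$y <- -flipped$y
mp_flipped <- st_multipoint(as.matrix(flipped[, c("x", "y")]))
bbox_flipped <- st_bbox(mp_flipped)
```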