import_geomx_samplesheet: Import GeoMx SampleSheet.csv data

import_geomx_samplesheet R Documentation

Import GeoMx SampleSheet.csv data

Description

Import data from a GeoMx "SampleSheet.csv" file, or from a "GNP_config.ini" file that follows the same layout.

Usage

import_geomx_samplesheet(
  x = "SampleSheet.csv",
  return_type = c("dflist", "list", "indices", "filenames"),
  do_revcomp = TRUE,
  demux_sheetnumber = "S1",
  demux_prefix = "BK-Gq-1_hdist1_",
  ...
)

Arguments

x

character path to a GeoMx file in SampleSheet.csv format.

return_type

character string to define the return type:

  • "dflist" - list of data.frame for each heading (default)

  • "list" - list of lines as-is.

  • "indices" - character vector of expected indices, in format "Index+Index2".

  • "filenames" - data.frame describing the expected output filenames after running demuxbyname.sh (BBTools), and the expected GeoMx filename using the Sample_ID, suitable to rename one file to the other.

do_revcomp

logical indicating whether to use reverse complement for Index2, default=TRUE.

demux_sheetnumber

character string used when return_type="filenames" to form the GeoMx input filename expected by the NGS pipeline. It therefore determines the name each demux file receives when it is renamed to the NGS input filename.

demux_prefix

character prefix assigned when running demuxbyname.sh (BBTools). It is usually in the form "project_hdist1_", where "hdist1" refers to the Hamming distance threshold "hdist" used with demuxbyname.sh. The prefix is used to match each filename produced by demuxbyname.sh, which is then renamed to the NGS filename.

...

additional arguments are ignored.
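
As an illustration of what return_type="indices" with do_revcomp=TRUE describes, the following is a minimal base R sketch, assuming the expected index is Index joined by "+" to the reverse complement of Index2. The helper revcomp_sketch() and the index sequences are hypothetical and are not taken from platjam, which provides its own revcomp().

```r
# Hypothetical helper: reverse complement of each DNA string in x.
# This is an illustrative stand-in, not platjam::revcomp().
revcomp_sketch <- function(x) {
  vapply(strsplit(x, ""), function(bases) {
    # Complement each base, then reverse the order of the bases.
    paste(rev(chartr("ACGTacgt", "TGCAtgca", bases)), collapse = "")
  }, character(1))
}

# Made-up index sequences, for illustration only.
index <- c("ACGTACGT", "GGCATTAC")
index2 <- c("TTGCAAGC", "CCATGGTA")

# Expected indices in "Index+Index2" format, with Index2 reverse-complemented.
expected_indices <- paste(index, revcomp_sketch(index2), sep = "+")
expected_indices
```

With do_revcomp=FALSE the analogous sketch would paste index2 unchanged.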

Details

This function can import "SampleSheet.csv" and "GNP_config.ini" files, which are characterized as follows:

  • Each subset of data is preceded by a header line: [header_name]

  • Data following this line is comma-delimited, or delimited with " = ", both of which are treated as equivalent.

  • There is a blank line between the subset of data and the next header.
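
The layout described above can be sketched in base R as follows. This is only an illustration of the file structure, not the package's implementation; the section names and values in the example lines are made up.

```r
# Hypothetical miniature file in the sectioned layout described above.
lines <- c(
  "[Header]",
  "FileVersion,2",
  "",
  "[Settings]",
  "threshold = 1",
  "",
  "[Data]",
  "Sample_ID,Index,Index2",
  "S_A01,ACGTACGT,TTGCAAGC",
  "S_A02,GGCATTAC,CCATGGTA"
)

# Header lines have the form [header_name].
is_header <- grepl("^\\[.+\\]$", lines)
section_names <- gsub("^\\[|\\]$", "", lines[is_header])

# Assign each line to the most recent header, then drop headers and blanks.
section_id <- cumsum(is_header)
keep <- !is_header & nzchar(trimws(lines))
sections <- split(lines[keep], section_id[keep])
names(sections) <- section_names[as.integer(names(sections))]

# Treat " = " as equivalent to ",".
sections <- lapply(sections, function(s) gsub(" = ", ",", s, fixed = TRUE))

str(sections)
```

The result is a named list of character vectors, one per header, roughly comparable in spirit to return_type="list".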

Some rules with return_type="dflist":

  • Delimiters are recognized as " = " or ",", but because the import process calls data.table::fread() it will probably also accept tab-delimited data.

  • When the first row following the header appears to have column names, they are used as-is as column names.

  • The first row is NOT used as column header when its first entry does not contain "Sample_ID" and any of the following is true:

    • Any value following a comma begins with a number, or

    • Any value following a comma is "true" or "false", or

    • Any value following a comma is purely DNA sequence "[ATGC]+", or

    • There is no "," delimiter, or

    • There is only one value in the subset of data.

Therefore, when the first row begins with "Sample_ID", it is used as column header.

  • When the first row is not used as column header, the heading name itself is used as the first column name, followed by "V" concatenated to the integer column number, for example: "Sequencing", "V2", "V3", "V4"
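
The header rules above can be approximated with the following base R sketch. It is one reading of the documented criteria, not the package's actual code: the function names first_row_is_header() and fallback_colnames() are hypothetical, "only one value in the subset of data" is interpreted here as a subset with a single line, and " = " delimiters are assumed to have been converted to "," beforehand.

```r
# Hypothetical helper: TRUE when the first row looks like column names.
first_row_is_header <- function(section_lines) {
  fields <- strsplit(section_lines[1], ",", fixed = TRUE)[[1]]
  # A first entry containing "Sample_ID" marks a header row.
  if (grepl("Sample_ID", fields[1], fixed = TRUE)) return(TRUE)
  # No "," delimiter, or (assumed reading) only one line in the subset.
  if (length(fields) < 2 || length(section_lines) < 2) return(FALSE)
  rest <- fields[-1]
  data_like <- grepl("^[0-9]", rest) |        # begins with a number
    rest %in% c("true", "false") |            # logical flag
    grepl("^[ATGC]+$", rest)                  # purely DNA sequence
  !any(data_like)
}

# Hypothetical helper: heading name, then V2, V3, ... for remaining columns.
fallback_colnames <- function(heading, n_columns) {
  c(heading, paste0("V", seq_len(n_columns)[-1]))
}

first_row_is_header(c("Sample_ID,Index,Index2", "S_A01,ACGT,TTGA"))
first_row_is_header(c("FileVersion,2", "Other,3"))
fallback_colnames("Sequencing", 4)
```

Under these assumptions the first call returns TRUE, the second returns FALSE because a value after the comma begins with a number, and the third reproduces the "Sequencing", "V2", "V3", "V4" example.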

Todo:

  • Add return_type option to create commands to rename demux output files to the expected GeoMx Sample_ID format.

Value

list, character vector, or data.frame, consistent with return_type.

See Also

Other jam GeoMx functions: revcomp()

Examples

samplefile <- system.file("data", "SampleSheet.csv", package="platjam")
samplelist <- import_geomx_samplesheet(samplefile)
lengths(samplelist)


jmw86069/platjam documentation built on April 12, 2025, 1:41 p.m.