importSampleData: Import sample metadata


View source: R/importSampleData.R

Description

This function imports user-defined sample metadata saved in a spreadsheet.

Usage

importSampleData(
  file,
  sheet = 1L,
  lanes = 0L,
  pipeline = c("none", "bcbio", "cellranger", "cpi"),
  autopadZeros = FALSE
)

Arguments

file

character(1). File path.

sheet

character(1) or integer(1). Workbook sheet.

lanes

integer(1). Number of lanes across which each sample was split. Lane-split samples are treated as technical replicates and are identified by a lane suffix (i.e. _LXXX).
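As a rough illustration of the lane suffix, the sketch below expands sample identifiers into one identifier per lane. The helper name expandLanes is hypothetical and not part of basejump, and the zero-padded three-digit reading of _LXXX is an assumption; the package may build the suffix differently.

```r
# Hypothetical helper (not part of basejump): one identifier per sample per lane.
# Assumes the "_LXXX" suffix is a zero-padded, three-digit lane number.
expandLanes <- function(sampleID, lanes) {
    stopifnot(is.character(sampleID), is.numeric(lanes), lanes >= 1L)
    suffix <- sprintf("_L%03d", seq_len(lanes))
    # Lanes vary fastest within each sample.
    paste0(
        rep(sampleID, each = lanes),
        rep(suffix, times = length(sampleID))
    )
}

laneIDs <- expandLanes(c("sample_1", "sample_2"), lanes = 2L)
print(laneIDs)
```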

pipeline

character(1). Analysis pipeline:

  • "none": Simple mode, requiring only "sampleID" column.

  • "bcbio": bcbio mode. See the "bcbio pipeline" section of this documentation for details.

  • "cellranger": Cell Ranger mode. Currently requires "directory" column. Used by Chromium R package.

autopadZeros

logical(1). Autopad zeros in sample identifiers, for improved sorting. Currently supported only for non-multiplexed samples. For example: sample_1, sample_2, ... sample_10 becomes sample_01, sample_02, ... sample_10.
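The padding behaviour can be approximated in base R. This is only a sketch: padTrailingDigits is a hypothetical helper, it assumes every identifier ends in an integer, and it is not the implementation used by importSampleData().

```r
# Hypothetical helper: left-pad the trailing integer of each identifier
# so that all identifiers share the same digit width.
padTrailingDigits <- function(x) {
    stopifnot(is.character(x), all(grepl("[0-9]+$", x)))
    prefix <- sub("[0-9]+$", "", x)
    digits <- regmatches(x, regexpr("[0-9]+$", x))
    width <- max(nchar(digits))
    paste0(prefix, formatC(as.integer(digits), width = width, flag = "0"))
}

ids <- c("sample_1", "sample_2", "sample_10")
padded <- padTrailingDigits(ids)
print(padded)
# Unpadded identifiers sort "sample_10" before "sample_2"; padded ones do not.
print(sort(padded))
```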

Value

DataFrame.

bcbio pipeline

Required column names. The "description" column is always required, and its values must match the bcbio per-sample directory names exactly. The "fileName" column is optional but recommended for data provenance. Note that some bcbio examples on readthedocs use "samplename" (all lowercase) instead of "fileName". This function checks for that and renames the column to "fileName" automatically. The "sampleName" column (note the capital N) is used to define unique sample names, in the event that bcbio has processed multiplexed samples.
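The column handling described here can be sketched in base R. The data frame and its file names are made up for illustration, and the rename shown is an approximation of the documented behaviour, not the package's internal code.

```r
# Made-up metadata using the lowercase "samplename" column seen in some
# bcbio examples.
meta <- data.frame(
    samplename = c("sample1.fastq.gz", "sample2.fastq.gz"),
    description = c("sample1", "sample2"),
    stringsAsFactors = FALSE
)

# Rename "samplename" to "fileName", leaving other columns untouched.
names(meta)[names(meta) == "samplename"] <- "fileName"

# The "description" column is always required in bcbio mode.
stopifnot("description" %in% names(meta))
print(names(meta))
```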

Demultiplexed samples. The samples in the bcbio run must map to the "description" column. The values provided in description for demultiplexed samples must be unique. They must also be syntactically valid, meaning that they cannot contain illegal characters (e.g. spaces, non-alphanumerics, dashes) or begin with a number. Consult the documentation in help(topic = "make.names") for more information on valid names in R.
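One way to pre-check "description" values before importing is to compare them against base R's make.names(). The values below are made up, and this check is a sketch of the stated rules, not necessarily the validation that importSampleData() performs.

```r
# Made-up description values: only the first is syntactically valid.
description <- c("sample_1", "sample-2", "3_sample", "sample 4")

# A value is syntactically valid when make.names() leaves it unchanged.
isValid <- description == make.names(description)
print(data.frame(description, isValid, fixed = make.names(description)))

# Demultiplexed descriptions must also be unique.
isUnique <- anyDuplicated(description) == 0L
print(isUnique)
```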

Multiplexed samples. This applies to some single-cell RNA-seq formats, including inDrops. In this case, bcbio will output per-sample directories with this structure: description-revcomp. importSampleData() checks whether the values in the "description" column are unique. If the values are duplicated, the function assumes that bcbio processed multiplexed FASTQs, where multiple samples of interest are barcoded inside a single FASTQ. In this case, you must supply additional "index", "sequence", and "sampleName" columns. Note that bcbio currently outputs the reverse complement index sequence in the sample directory names (e.g. "sample-ATAGAGAG"). Define the forward index barcode in the "sequence" column here, not the reverse complement. The reverse complement will be calculated automatically and added as the "revcomp" column in the sample metadata.
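The relationship between the forward "sequence" and the directory name can be illustrated with a small base R sketch. The revcomp() helper and the metadata values are hypothetical and written only for this example; the package computes the reverse complement itself.

```r
# Hypothetical helper: reverse complement of DNA barcodes (A, C, G, T only).
revcomp <- function(x) {
    stopifnot(is.character(x), all(grepl("^[ACGT]+$", x)))
    comp <- chartr("ACGT", "TGCA", x)
    vapply(
        strsplit(comp, ""),
        function(ch) paste(rev(ch), collapse = ""),
        character(1),
        USE.NAMES = FALSE
    )
}

# Made-up multiplexed metadata: one FASTQ ("sample") holding two barcoded samples.
meta <- data.frame(
    description = c("sample", "sample"),
    index = c(1L, 2L),
    sequence = c("CTCTCTAT", "TATCCTCT"),
    sampleName = c("control", "treated"),
    stringsAsFactors = FALSE
)

# Duplicated descriptions indicate multiplexed input.
isMultiplexed <- anyDuplicated(meta$description) > 0L

# Forward barcodes go in "sequence"; directory names use the reverse complement.
meta$revcomp <- revcomp(meta$sequence)
dirs <- paste(meta$description, meta$revcomp, sep = "-")
print(dirs)
```

The first row reproduces the "sample-ATAGAGAG" directory name from the text when the forward barcode is "CTCTCTAT".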

Note

Works with local or remote files.

Updated 2020-07-24.

Author(s)

Michael Steinbaugh

Examples

## Demultiplexed ====
file <- file.path(basejumpTestsURL, "bcbio-metadata-demultiplexed.csv")
x <- importSampleData(file, pipeline = "bcbio")
print(x)

## Multiplexed ====
file <- file.path(basejumpTestsURL, "bcbio-metadata-multiplexed-indrops.csv")
x <- importSampleData(file, pipeline = "bcbio")
print(x)

acidgenomics/basejump documentation built on Aug. 8, 2020, 2:11 a.m.