runDada16S: Run Dada on Paired-End 16S Sequences (with metadata)

View source: R/dada_wrappers.R

runDada16S R Documentation

Run Dada on Paired-End 16S Sequences (with metadata)

Description

Runs the DADA2 algorithm to infer sample composition from paired-end 16S fastq files. This implementation is based on Ben Callahan's vignette at https://benjjneb.github.io/dada2/bigdata_paired.html.

Usage

runDada16S(
  fn,
  in_subdir,
  meta,
  out_seqtab = NULL,
  out_track = NULL,
  in_explicitdir = NULL,
  remove_chimeras = TRUE,
  multithread = FALSE,
  verbose = FALSE,
  seed = NULL,
  nbases = 1e+07
)

Arguments

fn

Base names of input fastq files. If the inputs include directory paths, the paths are stripped and only the base names are used. Files that do not exist are ignored; if none of the files exist, this function issues a warning.

in_subdir

Subdirectory name from which to retrieve input fastq files. Enter "raw" for raw sequence files, or any other character string to specify a subdirectory within NEONMICROBE_DIR_MIDPROCESS/16S. To specify a directory outside NEONMICROBE_DIR_MIDPROCESS/16S, use the 'in_explicitdir' argument.

meta

The output of downloadSequenceMetadata. Must be provided as either the data.frame returned by downloadSequenceMetadata or as a filepath to the csv file produced by downloadSequenceMetadata.

out_seqtab, out_track

File locations where copies of the sequence table and read-tracking table (respectively) will be written. By default (NULL), these are NEONMICROBE_DIR_MIDPROCESS/16S/3_seqtabs/asv_16s_timestamp.Rds and NEONMICROBE_DIR_TRACKREADS/16S/dada_16s_timestamp.csv. If no copy should be saved, set the corresponding argument to FALSE.

in_explicitdir

Directory name to use instead of 'in_subdir', if a static directory name or a directory outside of NEONMICROBE_DIR_MIDPROCESS/16S is desired. Not recommended for use within processing batches.

remove_chimeras

(Optional) Default TRUE. Whether to remove chimeras from the sequence table. Currently only supports removal using removeBimeraDenovo with the consensus method.

multithread

Default FALSE. Whether to use multithreading.

verbose

Default FALSE. Whether to print messages regarding the samples being processed, dimensions of the resulting sequence table, and the distribution of sequence lengths.

seed

(Optional) Integer to use as random seed for reproducibility.

nbases

(Optional) Number of bases to use for learning errors. Default 1e7.
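
The handling of 'fn' and 'meta' described above can be illustrated with a minimal base-R sketch. The helpers resolve_inputs() and load_meta(), the file names, and the metadata columns are hypothetical and are not part of neonMicrobe; the actual function may differ in its details.

```r
# Hypothetical helper (not part of neonMicrobe): mimics the documented
# treatment of `fn` by stripping directories and ignoring missing files.
resolve_inputs <- function(fn, in_dir) {
  fn <- basename(fn)                  # drop any directory components
  paths <- file.path(in_dir, fn)      # look for the files in the input directory
  found <- file.exists(paths)
  if (!any(found)) {
    warning("None of the input files exist in ", in_dir)
  }
  paths[found]                        # keep only the files that exist
}

# Hypothetical helper: accept `meta` either as a data.frame or as the
# path to a csv file.
load_meta <- function(meta) {
  if (is.character(meta)) {
    meta <- read.csv(meta, stringsAsFactors = FALSE)
  }
  stopifnot(is.data.frame(meta))
  meta
}

# Demonstration with placeholder files in a temporary directory.
demo_dir <- file.path(tempdir(), "demo_fastq")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "sampleA_R1.fastq.gz"))

resolved <- resolve_inputs(
  c("some/other/dir/sampleA_R1.fastq.gz", "missing_R1.fastq.gz"),
  demo_dir
)
basename(resolved)  # only the existing file remains

meta_csv <- file.path(demo_dir, "meta.csv")
write.csv(data.frame(sample = "sampleA", reads = 10L), meta_csv, row.names = FALSE)
meta_df <- load_meta(meta_csv)  # same result as passing the data.frame directly
```

Passing base names keeps a batch portable, since the directory is determined by 'in_subdir' (or 'in_explicitdir') rather than by the paths stored in 'fn'.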

Value

(Invisibly) A list of two elements. seqtab is the sequence table, with chimeras removed if remove_chimeras == TRUE, and track is a data frame displaying the number of reads remaining for each sample at various points throughout the processing pipeline.
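
Because the value is returned invisibly, it has to be assigned to a variable before it can be inspected. The sketch below builds a small mock object with the documented two-element shape and shows a few common summaries. It assumes the dada2 convention of samples in rows and sequences as column names; the column names of track ("input", "nonchim") and all counts are made up for illustration.

```r
# Mock object with the documented shape; real values come from runDada16S().
seqtab <- matrix(
  c(10L, 0L, 5L,
     3L, 7L, 0L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sampleA", "sampleB"), c("ACGT", "ACGTA", "ACGTT"))
)
track <- data.frame(
  input   = c(100L, 120L),  # hypothetical column: reads entering the pipeline
  nonchim = c(15L, 10L),    # hypothetical column: reads left after chimera removal
  row.names = c("sampleA", "sampleB")
)
dada_out <- list(seqtab = seqtab, track = track)

# Dimensions of the sequence table: samples x sequence variants.
dim(dada_out$seqtab)

# Distribution of sequence lengths, taken from the column names.
seq_lengths <- table(nchar(colnames(dada_out$seqtab)))

# Total reads per sample, and the fraction of input reads retained.
reads_per_sample <- rowSums(dada_out$seqtab)
retained <- dada_out$track$nonchim / dada_out$track$input
```

These are the same kinds of summaries that verbose = TRUE is documented to print while the function runs.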

Examples

fl_nm <- c("BMI_Plate37WellA12_16S_BJ8RK_R1.fastq.gz", "BMI_Plate37WellA12_16S_BJ8RK_R2.fastq.gz")
dada_out <- runDada16S(
  fl_nm, in_subdir = "2_filtered", meta = seqmeta_greatplains_16s,
  verbose = FALSE,
  multithread = TRUE # use multithread = FALSE on Windows
)

dada_out$track # read-tracking table
dada_out$seqtab # ASV abundance table

claraqin/neonMicrobe documentation built on April 11, 2024, 11:47 a.m.