extractBarcodes: Barcode extraction

View source: R/raw_data_processing.R

extractBarcodes R Documentation

Barcode extraction

Description

Extracts barcodes according to the given barcode design from a fastq file.

Usage

extractBarcodes(
  dat,
  label,
  results_dir = "./",
  mismatch = 0,
  indels = FALSE,
  bc_backbone,
  full_output = FALSE,
  cpus = 1,
  strategy = "sequential",
  wobble_extraction = TRUE,
  dist_measure = "hamming"
)

Arguments

dat

a ShortReadQ object.

label

a character string.

results_dir

a character string which contains the path to the results directory.

mismatch

a non-negative integer value, default is 0. Greater values indicate the number of mismatches allowed when identifying the barcode construct.

indels

under construction.

bc_backbone

a character string or character vector describing the barcode design, variable positions have to be marked with the letter 'N'.

full_output

a logical value. If TRUE additional output files will be generated in order to identify errors.

cpus

an integer value, indicating the number of available cpus.

strategy

the future package is used for parallelisation, so a strategy has to be stated. The default is "sequential" (for cpus = 1) and "multiprocess" (for cpus > 1). For further information, see the R documentation of future::plan().

wobble_extraction

a logical value. If TRUE, single reads will be stripped of the backbone and only the "wobble" positions will be left.

dist_measure

a character value. If bc_backbone = "none", single reads will be clustered based on a distance measure. Available distance methods are optimal string alignment ("osa"), Levenshtein ("lv"), Damerau-Levenshtein ("dl"), Hamming ("hamming"), longest common substring ("lcs"), q-gram ("qgram"), cosine ("cosine"), Jaccard ("jaccard"), Jaro-Winkler ("jw"), and distance based on soundex encoding ("soundex"). For more detailed information, see the stringdist function of the stringdist package.
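
To make the idea of a distance measure between reads concrete, the following is a minimal base R sketch. It is not the package's implementation: the helper hamming_dist, the toy reads, and the single-linkage clustering with a cut-off of 1 are all assumptions made for illustration. In practice the stringdist package provides these measures directly, e.g. stringdist::stringdist(a, b, method = "hamming").

```r
# Hypothetical helper: Hamming distance between two equal-length strings,
# i.e. the number of positions at which they differ.
hamming_dist <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Toy reads (made up for this sketch)
reads <- c("ACTGGCGA", "ACTGGCGT", "ACTGCCGT", "TTTGGCGA")

# Pairwise Hamming distance matrix
n <- length(reads)
d_hamming <- matrix(0L, n, n, dimnames = list(reads, reads))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d_hamming[i, j] <- hamming_dist(reads[i], reads[j])
  }
}

# Levenshtein distance from base R, for comparison (allows indels,
# so it is never larger than the Hamming distance)
d_lv <- utils::adist(reads)

# One possible way to group reads: single-linkage clustering, cut at distance 1
clusters <- cutree(hclust(as.dist(d_hamming), method = "single"), h = 1)
print(d_hamming)
print(clusters)
```

Which measure is appropriate depends on the expected error type: Hamming only counts substitutions, whereas Levenshtein-type measures also account for insertions and deletions.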

Value

a frequency table of barcode sequences, or a list of such tables.
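
As a rough illustration of how a backbone with 'N' positions relates to a frequency table of barcodes, here is a base R sketch. It does not reproduce the internals of extractBarcodes or the exact structure of its return value: the short backbone, the toy reads, and the helper extract_wobble are hypothetical, and the regular-expression approach is only one possible way to strip the fixed backbone positions.

```r
# Hypothetical helper: keep only the bases found at the 'N' positions
# of the backbone; reads without the backbone yield NA.
extract_wobble <- function(reads, backbone) {
  # Each 'N' becomes a capture group matching a single nucleotide
  pattern <- gsub("N", "([ACGT])", backbone, fixed = TRUE)
  hits <- regmatches(reads, regexec(pattern, reads))
  vapply(
    hits,
    function(h) if (length(h)) paste(h[-1], collapse = "") else NA_character_,
    character(1)
  )
}

# Short made-up backbone and reads
bc_backbone <- "ACTNNCGANNCTT"
reads <- c(
  "GGACTAACGATTCTTAA",  # backbone embedded in flanking sequence
  "ACTAACGATTCTT",      # same barcode again
  "ACTGCCGAATCTT",      # a different barcode
  "TTTTTTTTTTTTT"       # no backbone present
)

wobble <- extract_wobble(reads, bc_backbone)

# Frequency table of the extracted wobble sequences (NA is dropped by table())
freq <- sort(table(wobble), decreasing = TRUE)
print(freq)
```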

Examples


## Not run: 

bc_backbone <- "ACTNNCGANNCTTNNCGANNCTTNNGGANNCTANNACTNNCGANNCTTNNCGANNCTTNNGGANNCTANNACTNNCGANN"
source_dir <- system.file("extdata", package = "genBaRcode")
dat <- ShortRead::readFastq(dirPath = source_dir, pattern = "test_data.fastq.gz")

extractBarcodes(dat, label = "test", results_dir = getwd(), mismatch = 0,
                indels = FALSE, bc_backbone = bc_backbone)


## End(Not run)
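
The mismatch argument can be pictured as a tolerance on the fixed (non-'N') backbone positions. The base R sketch below is an assumption about how such a check could look, not the package's actual matching code; the helper count_backbone_mismatches, the short backbone, and the reads are invented for illustration.

```r
# Hypothetical helper: count mismatches between a read and the backbone,
# ignoring the variable 'N' positions. Read and backbone must have equal length.
count_backbone_mismatches <- function(read, backbone) {
  stopifnot(nchar(read) == nchar(backbone))
  r <- strsplit(read, "")[[1]]
  b <- strsplit(backbone, "")[[1]]
  fixed <- b != "N"
  sum(r[fixed] != b[fixed])
}

backbone <- "ACTNNCGA"
reads <- c("ACTGGCGA", "ACTGGCGT", "TCTGGCGT")

n_mismatch <- vapply(reads, count_backbone_mismatches, integer(1),
                     backbone = backbone)

# With an allowed mismatch of 1, the first two reads would be accepted
allowed <- 1
accepted <- n_mismatch <= allowed
print(n_mismatch)
print(accepted)
```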

genBaRcode documentation built on March 31, 2023, 11:02 p.m.