findBarcodes: Demultiplex reads by their barcodes
In hiReadsProcessor: Functions to process LM-PCR reads from 454/Illumina data

Description Usage Arguments Value See Also Examples

Given a sample information object, the function reads in the raw fasta/fastq file, demultiplexes reads by their barcodes, and appends it back to the sampleInfo object. Calls splitByBarcode to perform the actual splitting of file by barcode sequences. If supplied with a character vector and reads themselves, the function behaves a bit differently. See the examples.

1
2
3

findBarcodes(sampleInfo, sector = NULL, dnaSet = NULL,
  showStats = FALSE, returnUnmatched = FALSE, dereplicate = FALSE,
  alreadyDecoded = FALSE)

`sampleInfo`	sample information SimpleList object created using `read.sampleInfo`, which holds barcodes and sample names per sector/quadrant/lane or a character vector of barcodes to sample name associations. Ex: c("ACATCCAT"="Sample1", "GAATGGAT"="Sample2",...)
`sector`	If sampleInfo is a SimpleList object, then a numeric/character value or vector representing sector(s) in sampleInfo. Optionally if on high memory machine sector='all' will decode/demultiplex sequences from all sectors/quadrants. This option is ignored if sampleInfo is a character vector. Default is NULL.
`dnaSet`	DNAStringSet object containing sequences to be decoded or demultiplexed. Default is NULL. If sampleInfo is a SimpleList object, then reads are automatically extracted using `read.seqsFromSector` and parameters defined in sampleInfo object.
`showStats`	toggle output of search statistics. Default is FALSE.
`returnUnmatched`	return unmatched sequences. Returns results as a list where x[["unDecodedSeqs"]] has culprits. Default is FALSE.
`dereplicate`	return dereplicated sequences. Calls `dereplicateReads`, which appends counts=X to sequence names/deflines. Default is FALSE. Not applicable for paired end data since it can cause insyncronicity.
`alreadyDecoded`	if reads have be already decoded and split into respective files per sample and 'seqfilePattern' parameter in `read.SeqFolder` is set to reading sample files and not the sector files, then set this to TRUE. Default is FALSE. Enabling this parameter skips the barcode detection step and loads the sequence file as is into the sampleInfo object.

If sampleInfo is an object of SimpleList then decoded sequences are appeneded to respective sample slots, else a named list of DNAStringSet object. If returnUnmatched=TRUE, then x[["unDecodedSeqs"]] has the unmatched sequences.

splitByBarcode, dereplicateReads, replicateReads

dnaSet <- DNAStringSet(c("read1" = "ACATCCATAGAGCTACGACGACATCGACATA",
"read2"="GAATGGATGACGACTACAGCACGACGAGCAGCTACT",
"read3"="GAATGGATGCGCTAAGAAGAGA", "read4"="ACATCCATTCTACACATCT"))
findBarcodes(sampleInfo = c("ACATCCAT" = "Sample1", "GAATGGAT" = "Sample2"),
dnaSet=dnaSet, showStats=TRUE, returnUnmatched=TRUE)
## Not run: 
load(file.path(system.file("data", package = "hiReadsProcessor"),
"FLX_seqProps.RData"))
findBarcodes(seqProps, sector = "all", showStats = TRUE)

## End(Not run)