Given a sample information file, the function checks whether it includes the information required to process the samples present on each sector/quadrant/region/lane. The function also adds the other columns required for processing, with default values, if they are not already defined.
Full or relative path to the sample information file, which holds the sample-to-quadrant/lane associations along with other metadata required to trim or otherwise process the sequences.
split the data frame into a list by sector column. Default is TRUE.
Required Column Description:
sector => region/quadrant/lane of the sequencing plate the sample comes from. If files have been split by sample a priori, then the filename associated with each sample, without the extension. If this is a filename, be sure to enable the 'alreadyDecoded' parameter, since the contents of this column are pasted together with the 'seqfilePattern' parameter of read.SeqFolder to find the appropriate file. For paired-end data, this is the basename of the FASTA/Q file holding the sample data from the LTR side. For example, files such as Lib3_L001_R2_001.fastq.gz or Lib3_L001_R2_001.fastq would be Lib3_L001_R2_001, and consequently Lib3_L001_R1_001 would be used as the second read of the pair.
barcode => unique 4-12bp DNA sequence which identifies the sample. If providing filename as sector, then leave this blank since it is assumed that the data is already demultiplexed.
primerltrsequence => DNA sequence of the viral LTR primer with/without the viral LTR sequence following the primer landing site. If already trimmed, then mark this as SKIP.
sampleName => Name of the sample associated with the barcode
sampleDescription => Detailed description of the sample
gender => sex of the sample: male or female or NA
species => species of the sample: homo sapiens, mus musculus, etc.
freeze => UCSC freeze to which the sample should be aligned.
linkerSequence => DNA sequence of the linker adaptor following the genomic sequence. If already trimmed, then mark this as SKIP.
restrictionEnzyme => Restriction enzyme used for digestion and sample recovery. Can also be one of: Fragmentase or Sonication!
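The required columns above can be checked before handing a sheet to the function. The following is a minimal base R sketch, not the package's own validation: `missing_required` is a hypothetical helper, exact case-sensitive name matching is an assumption, and the sample values are made up. It also shows how a sector value and its second-pair counterpart might be derived from the file names used in the sector description.

```r
# Required column names, copied from the list above.
required_cols <- c("sector", "barcode", "primerltrsequence", "sampleName",
                   "sampleDescription", "gender", "species", "freeze",
                   "linkerSequence", "restrictionEnzyme")

# Hypothetical helper (not part of the package): names of required columns
# that are absent from `info`. Case-sensitive matching is an assumption.
missing_required <- function(info) setdiff(required_cols, names(info))

# Sector value for already demultiplexed data: file name without extension.
files <- c("Lib3_L001_R2_001.fastq.gz", "Lib3_L001_R2_001.fastq")
sector <- sub("\\.(fastq|fasta|fq|fa)(\\.gz)?$", "", files)

# Basename of the matching file from the other side of the pair.
second_pair <- sub("_R2_", "_R1_", sector, fixed = TRUE)

# Toy sample sheet with made-up values; 'freeze' and 'restrictionEnzyme'
# are deliberately left out. The barcode is blank because the sector is a
# file name, so the data is assumed to be demultiplexed already.
info <- data.frame(
  sector = sector[1], barcode = "", primerltrsequence = "SKIP",
  sampleName = "sampleA", sampleDescription = "toy sample",
  gender = "NA", species = "mus musculus", linkerSequence = "SKIP",
  stringsAsFactors = FALSE
)

print(sector)
print(second_pair)
print(missing_required(info))
```

Both file names reduce to the same sector value, and the check reports the two columns that the toy sheet omits.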
Metadata Parameter Column Description:
ltrBitSequence => DNA sequence of the viral LTR following the primer landing site. Default is last 7bps of the primerltrsequence.
ltrBitIdentity => percent of LTR bit sequence to match during the alignment. Default is 1.
primerLTRidentity => percent of primer to match during the alignment. Default is 0.85.
linkerIdentity => percent of linker sequence to match during the alignment. Default is 0.55. Only applies to non-primerID/random tag based linker search.
primerIdInLinker => whether the linker adaptor used has a primerID/random tag in it. Default is FALSE.
primerIdInLinkerIdentity1 => percent of sequence to match before the random tag. Default is 0.75. Only applies to primerID/random tag based linker search and when primerIdInLinker is TRUE.
primerIdInLinkerIdentity2 => percent of sequence to match after the random tag. Default is 0.50. Only applies to primerID/random tag based linker search and when primerIdInLinker is TRUE.
celltype => celltype information associated with the sample
user => name of the user who prepared or processed the sample
pairedEnd => is the data paired end? Default is FALSE.
vectorFile => fasta file containing the vector sequence
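The way defaults are filled in for missing metadata columns can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's actual code: `fill_metadata_defaults` is a hypothetical helper, the primer sequences are made up, and only the columns with a documented default are handled.

```r
# Defaults quoted from the metadata column descriptions above.
metadata_defaults <- list(
  ltrBitIdentity = 1, primerLTRidentity = 0.85, linkerIdentity = 0.55,
  primerIdInLinker = FALSE, primerIdInLinkerIdentity1 = 0.75,
  primerIdInLinkerIdentity2 = 0.50, pairedEnd = FALSE
)

# Hypothetical helper: add each metadata column only if it is not defined yet.
fill_metadata_defaults <- function(info) {
  for (col in names(metadata_defaults)) {
    if (!col %in% names(info)) info[[col]] <- metadata_defaults[[col]]
  }
  if (!"ltrBitSequence" %in% names(info)) {
    # Default LTR bit: the last 7 bases of primerltrsequence.
    n <- nchar(info$primerltrsequence)
    info$ltrBitSequence <- substr(info$primerltrsequence, n - 6, n)
  }
  info
}

info <- data.frame(
  sampleName = c("sampleA", "sampleB"),
  primerltrsequence = c("GAAAATCTCTAGCA", "TTGACCTAGCTGAAGT"),  # made up
  linkerIdentity = c(0.6, 0.6),  # user-supplied, so it is kept as-is
  stringsAsFactors = FALSE
)

filled <- fill_metadata_defaults(info)
print(filled[, c("sampleName", "ltrBitSequence",
                 "linkerIdentity", "primerLTRidentity")])
```

Columns supplied in the sheet, such as linkerIdentity here, are left untouched, while the absent ones take the documented defaults.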
Processing Parameter Column Description:
startWithin => upper bound limit of where the alignment should start within the query. Default is 3.
alignRatioThreshold => cutoff for (alignment span/read length). Default is 0.7.
genomicPercentIdentity => cutoff for (1-(misMatches/matches)). Default is 0.98.
clusterSitesWithin => cluster integration sites within a defined window size based on frequency, which corrects for sequencing errors. Default is 5.
keepMultiHits => whether to keep sequences/reads that return multiple best hits, aka ambiguous locations.
processingDate => the date of processing
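The two cutoff formulas and the startWithin bound can be illustrated with a small worked example. This is a sketch only: the alignment table and its column names are hypothetical, the values are made up, and whether the downstream filters use inclusive comparisons is an assumption.

```r
# Toy alignment table; column names are hypothetical, values are made up.
hits <- data.frame(
  qStart     = c(1, 2, 10, 1),       # where the alignment starts in the query
  span       = c(95, 60, 90, 98),    # alignment span
  readLength = c(100, 100, 100, 100),
  matches    = c(95, 60, 90, 90),
  misMatches = c(1, 0, 0, 8)
)

# Documented defaults for the processing parameters.
startWithin <- 3
alignRatioThreshold <- 0.7
genomicPercentIdentity <- 0.98

# Quantities as defined in the column descriptions above.
align_ratio <- hits$span / hits$readLength
identity <- 1 - hits$misMatches / hits$matches

# Inclusive comparisons (<=, >=) are an assumption of this sketch.
keep <- hits$qStart <= startWithin &
  align_ratio >= alignRatioThreshold &
  identity >= genomicPercentIdentity

print(cbind(hits, align_ratio, identity, keep))
```

Only the first row passes all three checks: the second has too short an alignment span, the third starts too far into the query, and the fourth has too many mismatches.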
If splitBySector=TRUE, a SimpleList object named by the quadrant/lane information defined in the sampleInfo file; otherwise a data frame.
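The shape of the split return value can be pictured with base R's `split`. This is only an analogy under stated assumptions: `split` yields a plain list rather than a SimpleList, and the sample sheet below is made up.

```r
# Toy sample sheet with two sectors; values are made up.
info <- data.frame(
  sector = c("1", "1", "2"),
  sampleName = c("sampleA", "sampleB", "sampleC"),
  stringsAsFactors = FALSE
)

# One data frame per sector, named by the sector value.
by_sector <- split(info, info$sector)

print(names(by_sector))
print(by_sector[["1"]]$sampleName)
```

Each list element holds the rows for one sector, so samples from a given quadrant/lane can be looked up by name.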