View source: R/hiReadsProcessor.R
Given a sample information file, the function checks whether it includes the information required to process the samples present on each sector/quadrant/region/lane. The function also adds other columns required for processing, filled with default values, if they are not already defined.
read.sampleInfo(sampleInfoPath = NULL, splitBySector = TRUE)
sampleInfoPath => full or relative path to the sample information file, which holds sample to quadrant/lane associations along with other metadata required to trim or process the sequences.
splitBySector => whether to split the data frame into a list by the sector column. Default is TRUE.
Required Column Description:
sector => region/quadrant/lane of the sequencing plate the sample comes from. If files have been split by sample a priori, then this is the filename associated with each sample, without the extension. If this is a filename, be sure to enable the 'alreadyDecoded' parameter in findBarcodes, since the contents of this column are pasted together with the 'seqfilePattern' parameter in read.SeqFolder to find the appropriate file. For paired end data, this is the basename of the FASTA/Q file holding the sample data from the LTR side. For example, files such as Lib3_L001_R2_001.fastq.gz or Lib3_L001_R2_001.fastq would be Lib3_L001_R2_001, and consequently Lib3_L001_R1_001 would be used as the second read of the pair.
barcode => unique 4-12bp DNA sequence which identifies the sample. If providing filename as sector, then leave this blank since it is assumed that the data is already demultiplexed.
primerltrsequence => DNA sequence of the viral LTR primer with/without the viral LTR sequence following the primer landing site. If already trimmed, then mark this as SKIP.
sampleName => Name of the sample associated with the barcode
sampleDescription => Detailed description of the sample
gender => sex of the sample: male or female or NA
species => species of the sample: homo sapiens, mus musculus, etc.
freeze => UCSC freeze to which the sample should be aligned.
linkerSequence => DNA sequence of the linker adaptor following the genomic sequence. If already trimmed, then mark this as SKIP.
restrictionEnzyme => Restriction enzyme used for digestion and sample recovery. Can also be one of: Fragmentase or Sonication.
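The required columns above can be sketched as a one-row sample sheet in base R. This is only an illustration, not the check that read.sampleInfo itself performs: the helper `missingRequiredColumns` and all cell values are hypothetical placeholders, and the sector value is derived from the file name used in the sector example, assuming the R2/R1 naming shown there.

```r
# Columns listed as required in the description above.
required_cols <- c("sector", "barcode", "primerltrsequence", "sampleName",
                   "sampleDescription", "gender", "species", "freeze",
                   "linkerSequence", "restrictionEnzyme")

# File name from the sector example; the sector value is the basename
# without the .fastq or .fastq.gz extension.
ltr_file <- "Lib3_L001_R2_001.fastq.gz"
sector <- sub("\\.fastq(\\.gz)?$", "", basename(ltr_file))

# Name of the mate file, assuming the R2/R1 naming used in that example.
mate <- sub("_R2_", "_R1_", sector, fixed = TRUE)

# One-row sample sheet; every value here is a hypothetical placeholder.
sample_info <- data.frame(
  sector = sector,
  barcode = "",                 # blank: sector is a filename, data already demultiplexed
  primerltrsequence = "SKIP",   # SKIP: already trimmed
  sampleName = "sampleA",
  sampleDescription = "illustrative sample",
  gender = NA,
  species = "homo sapiens",
  freeze = "hg19",
  linkerSequence = "SKIP",      # SKIP: already trimmed
  restrictionEnzyme = "Sonication",
  stringsAsFactors = FALSE
)

# Hypothetical helper: report required columns absent from a sample sheet.
missingRequiredColumns <- function(df) setdiff(required_cols, names(df))

missing_all <- missingRequiredColumns(sample_info)
missing_some <- missingRequiredColumns(sample_info[, c("sector", "barcode")])
```

Here `missing_all` is empty because every required column is present, while `missing_some` lists the eight columns dropped from the reduced sheet.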
Metadata Parameter Column Description:
ltrBitSequence => DNA sequence of the viral LTR following the primer landing site. Default is last 7bps of the primerltrsequence.
ltrBitIdentity => percent of LTR bit sequence to match during the alignment. Default is 1.
primerLTRidentity => percent of primer to match during the alignment. Default is 0.85.
linkerIdentity => percent of linker sequence to match during the alignment. Default is 0.55. Only applies to non-primerID/random tag based linker search.
primerIdInLinker => whether the linker adaptor contains a primerID/random tag. Default is FALSE.
primerIdInLinkerIdentity1 => percent of sequence to match before the random tag. Default is 0.75. Only applies to primerID/random tag based linker search and when primerIdInLinker is TRUE.
primerIdInLinkerIdentity2 => percent of sequence to match after the random tag. Default is 0.50. Only applies to primerID/random tag based linker search and when primerIdInLinker is TRUE.
celltype => celltype information associated with the sample
user => name of the user who prepared or processed the sample
pairedEnd => is the data paired end? Default is FALSE.
vectorFile => fasta file containing the vector sequence
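As a rough sketch of how the stated defaults could be filled in, the base R code below adds any metadata column that is missing from a sample sheet and derives ltrBitSequence from the last 7 bases of primerltrsequence. The helper `addDefaults`, the sample names, and the primer sequence are hypothetical; the package's own implementation may differ.

```r
# Defaults as stated in the metadata column descriptions above.
metadata_defaults <- list(
  ltrBitIdentity = 1,
  primerLTRidentity = 0.85,
  linkerIdentity = 0.55,
  primerIdInLinker = FALSE,
  primerIdInLinkerIdentity1 = 0.75,
  primerIdInLinkerIdentity2 = 0.50,
  pairedEnd = FALSE
)

# Hypothetical helper: add each default column only if it is absent.
addDefaults <- function(df, defaults) {
  for (col in setdiff(names(defaults), names(df))) {
    df[[col]] <- defaults[[col]]
  }
  df
}

# Two-row sample sheet with placeholder values; linkerIdentity is preset
# by the user, so it must not be overwritten.
sample_info <- data.frame(
  sampleName = c("sampleA", "sampleB"),
  primerltrsequence = c("GAAAATCTCTAGCA", "GAAAATCTCTAGCA"),
  linkerIdentity = c(0.60, 0.60),
  stringsAsFactors = FALSE
)

sample_info <- addDefaults(sample_info, metadata_defaults)

# Default ltrBitSequence: the last 7 bases of primerltrsequence.
n <- nchar(sample_info$primerltrsequence)
sample_info$ltrBitSequence <- substr(sample_info$primerltrsequence, n - 6, n)
```

Values already present in the sheet, such as linkerIdentity here, are left untouched; only absent columns receive the default.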
Processing Parameter Column Description:
startWithin => upper bound limit of where the alignment should start within the query. Default is 3.
alignRatioThreshold => cutoff for (alignment span/read length). Default is 0.7.
genomicPercentIdentity => cutoff for (1-(misMatches/matches)). Default is 0.98.
clusterSitesWithin => cluster integration sites within a defined window size based on frequency which corrects for any sequencing errors. Default is 5.
keepMultiHits => whether to keep sequences/reads that return multiple best hits, aka ambiguous locations.
processingDate => the date of processing
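To make the three numeric cutoffs concrete, here is a minimal sketch that applies them to a made-up table of alignments. The column names (`qStart`, `alignSpan`, `readLength`, `matches`, `misMatches`) and the data are hypothetical, and the use of inclusive comparisons (`<=`, `>=`) is an assumption rather than something the description states.

```r
# Defaults as stated in the processing parameter descriptions above.
startWithin <- 3
alignRatioThreshold <- 0.7
genomicPercentIdentity <- 0.98

# Hypothetical alignment table; column names and values are made up.
hits <- data.frame(
  qStart     = c(1, 2, 10),
  alignSpan  = c(95, 60, 98),
  readLength = c(100, 100, 100),
  matches    = c(95, 60, 98),
  misMatches = c(1, 0, 5)
)

# (alignment span / read length), compared with alignRatioThreshold.
align_ratio <- hits$alignSpan / hits$readLength

# (1 - (misMatches / matches)), compared with genomicPercentIdentity.
identity <- 1 - (hits$misMatches / hits$matches)

# Keep a hit only if it starts early enough in the query and passes both cutoffs.
keep <- hits$qStart <= startWithin &
  align_ratio >= alignRatioThreshold &
  identity >= genomicPercentIdentity

filtered <- hits[keep, ]
```

Only the first hit survives: the second covers too little of the read, and the third starts too far into the query and has too many mismatches.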
If splitBySector=TRUE, a SimpleList object named by the quadrant/lane information defined in the sample information file; otherwise a data frame.
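The shape of the split result can be approximated in base R with `split()`. Note that this yields a plain list rather than the SimpleList the function returns, and the sample sheet below is a hypothetical placeholder.

```r
# Hypothetical sample sheet with two sectors.
sample_info <- data.frame(
  sector = c("1", "1", "2"),
  sampleName = c("sampleA", "sampleB", "sampleC"),
  stringsAsFactors = FALSE
)

# Base R analogue of splitting by the sector column: one data frame per
# sector, with list names taken from the sector values.
by_sector <- split(sample_info, sample_info$sector)

sector_names <- names(by_sector)
samples_in_first <- by_sector[["1"]]$sampleName
```

Each list element holds the rows for one sector, so per-sector processing can iterate over the list by name.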
read.SeqFolder, findBarcodes, splitByBarcode
library(hiReadsProcessor)
runData <- system.file(file.path("extdata", "FLX_sample_run"),
                       package = "hiReadsProcessor")
read.sampleInfo(file.path(runData, "sampleInfo.xlsx"))