SangerContig-class: SangerContig
In roblanf/sangeranalyseR: sangeranalyseR: a suite of functions for the analysis of Sanger sequence data in R

SangerContig-class

R Documentation

SangerContig

Description

An S4 class containing the forward and reverse SangerRead lists together with the alignment and consensus read results, which correspond to a contig in Sanger sequencing.

Slots

objectResults: This is the object that stores all information of the creation result.
inputSource: The input source of the raw file. It must be "ABIF" or "FASTA". The default value is "ABIF".
processMethod: The method to create a contig from reads. The value is "REGEX" or "CSV". The default value is "REGEX".
ABIF_Directory: If inputSource is "ABIF", this value is the path of the parent directory storing all the reads in ABIF format that you want to analyse. If inputSource is "FASTA", this value must be NULL, which is the default.
FASTA_File: If inputSource is "FASTA", this value must be the path to a valid FASTA file; if inputSource is "ABIF", this value must be NULL, which is the default.
REGEX_SuffixForward: The suffix of the filenames for forward reads in regular expression, i.e. reads that do not need to be reverse-complemented.
REGEX_SuffixReverse: The suffix of the filenames for reverse reads in regular expression, i.e. reads that need to be reverse-complemented.
CSV_NamesConversion: The file path to the CSV file that provides read names, directions, and their contig groups. If processMethod is "CSV", this value must be the path to a valid CSV file; if processMethod is "REGEX", this value must be NULL, which is the default.
contigName: The contig name of all the reads in ABIF_Directory.
geneticCode: A named character vector in the same format as GENETIC_CODE (the default), which represents the standard genetic code. This is the code with which the function will attempt to translate your DNA sequences. You can get an appropriate vector with the getGeneticCode() function.
forwardReadList: The list of SangerRead S4 instances which are all forward reads.
reverseReadList: The list of SangerRead S4 instances which are all reverse reads.
minReadsNum: The minimum number of reads required to make a consensus sequence; it must be 2 or more. The default value is 2.
minReadLength: Reads shorter than this will not be included in the readset. The default 20 means that all reads with length of 20 or more will be included. Note that this is the length of a read after it has been trimmed.
refAminoAcidSeq: An amino acid reference sequence supplied as a string or an AAString object. If your sequences are protein-coding DNA sequences, and you want to have frameshifts automatically detected and corrected, supply a reference amino acid sequence via this argument. If this argument is supplied, the sequences are then kept in frame for the alignment step. Forward sequences are assumed to come from the sense (i.e. coding, or "+") strand. The default value is "".
minFractionCall: Minimum fraction of the sequences required to call a consensus sequence for SangerContig at any given position (see the ConsensusSequence() function from DECIPHER for more information). Defaults to 0.75, implying that 3/4 of all reads must be present in order to call a consensus.
maxFractionLost: Numeric giving the maximum fraction of sequence information that can be lost in the consensus sequence for SangerContig (see the ConsensusSequence() function from DECIPHER for more information). Defaults to 0.5, implying that each consensus base can ignore at most 50 percent of the information at a given position.
acceptStopCodons: The logical value TRUE or FALSE. TRUE (the default): keep all reads, regardless of whether they have stop codons; FALSE: reject reads with stop codons. If FALSE is selected, then the number of stop codons is calculated after attempting to correct frameshift mutations (if applicable).
readingFrame: 1, 2, or 3. Only used if acceptStopCodons == FALSE. This specifies the reading frame that is used to determine stop codons. If you use a refAminoAcidSeq, then the frame should always be 1, since all reads will be shifted to frame 1 during frameshift correction. Otherwise, you should select the appropriate reading frame.
contigSeq: The consensus read of all SangerRead S4 instances, stored as a DNAString object.
alignment: The alignment of all SangerRead S4 instances with the called consensus sequence, stored as a DNAStringSet object. Users can use BrowseSeqs() to view the alignment.
differencesDF: A data frame of the number of pairwise differences between each read and the consensus sequence, as well as the number of bases in each input read that did not contribute to the consensus sequence. It can assist in detecting incorrect reads, or reads with a lot of errors.
distanceMatrix: A distance matrix of genetic distances (corrected with the JC model) between all of the input reads.
dendrogram: A list storing cluster groups in a data frame and a dendrogram object depicting the distanceMatrix. Users can use plot() to see the dendrogram.
indelsDF: If users specified a reference sequence via refAminoAcidSeq, then this will be a data frame describing the number of insertions and deletions that were made to each of the input reads in order to correct frameshift mutations.
stopCodonsDF: If users specified a reference sequence via refAminoAcidSeq, then this will be a data frame describing the number of stop codons in each read.
secondaryPeakDF: A data frame with one row for each column in the alignment that contained more than one secondary peak. The data frame has three columns: the column number of the alignment; the number of secondary peaks in that column; and the bases (with IUPAC ambiguity codes representing secondary peak calls) in that column represented as a string.
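
The REGEX_SuffixForward and REGEX_SuffixReverse slots above are ordinary regular expressions matched against the end of each read name. The following is a minimal sketch in base R of how such patterns could separate forward from reverse reads. The file names are hypothetical, and grepl() is used here only for illustration; it is not necessarily how the package performs the matching internally.

```r
# Hypothetical file names; only the naming pattern matters here.
fileNames <- c("Achl_RBNII384-13_1_F.ab1",
               "Achl_RBNII384-13_2_R.ab1",
               "Achl_RBNII384-13_notes.txt")

# The same suffix patterns used in the Examples section.
REGEX_SuffixForward <- "_[0-9]*_F.ab1$"
REGEX_SuffixReverse <- "_[0-9]*_R.ab1$"

# grepl() returns TRUE where the pattern matches the end of the name.
isForward <- grepl(REGEX_SuffixForward, fileNames)
isReverse <- grepl(REGEX_SuffixReverse, fileNames)

forwardFiles <- fileNames[isForward]
reverseFiles <- fileNames[isReverse]

# Names matching neither pattern are not assigned to either direction.
unmatchedFiles <- fileNames[!(isForward | isReverse)]
```

Checking the patterns against your own file names in this way before building a contig can help catch reads that would not be picked up as either forward or reverse.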

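The readingFrame slot determines which codons are inspected when acceptStopCodons is FALSE. The sketch below, in base R, shows how the choice of frame changes the number of stop codons found in a sequence. The helper countStopCodons() is hypothetical and is not part of sangeranalyseR; it assumes the stop codons of the standard genetic code (TAA, TAG, TGA) and ignores ambiguity codes.

```r
# Hypothetical helper, not part of sangeranalyseR: counts stop codons of the
# standard genetic code in one reading frame of a DNA string.
countStopCodons <- function(dnaSeq, readingFrame = 1) {
  stopifnot(readingFrame %in% 1:3)
  dnaSeq <- toupper(dnaSeq)
  # Last position at which a complete codon can start.
  lastStart <- nchar(dnaSeq) - 2
  if (lastStart < readingFrame) return(0L)
  # Start positions of complete codons in the chosen frame.
  starts <- seq(readingFrame, lastStart, by = 3)
  codons <- substring(dnaSeq, starts, starts + 2)
  sum(codons %in% c("TAA", "TAG", "TGA"))
}

toySeq <- "ATGATGG"                                   # toy sequence
frame1Stops <- countStopCodons(toySeq, readingFrame = 1)  # codons: ATG ATG
frame2Stops <- countStopCodons(toySeq, readingFrame = 2)  # codons: TGA TGG
```

The same read can therefore pass or fail the stop codon check depending on the frame, which is why the frame should be 1 when a refAminoAcidSeq is supplied and chosen deliberately otherwise.
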
Author(s)

Kuan-Hao Chao

Examples

## Simple example
rawDataDir <- system.file("extdata", package = "sangeranalyseR")
parentDir <- file.path(rawDataDir, "Allolobophora_chlorotica", "RBNII")
contigName <- "Achl_RBNII384-13"
REGEX_SuffixForward <- "_[0-9]*_F.ab1$"
REGEX_SuffixReverse <- "_[0-9]*_R.ab1$"
sangerContig <- new("SangerContig",
                     ABIF_Directory        = parentDir,
                     contigName            = contigName,
                     REGEX_SuffixForward   = REGEX_SuffixForward,
                     REGEX_SuffixReverse   = REGEX_SuffixReverse)
                     
## forward / reverse reads match error
## Input From ABIF file format (Regex)
rawDataDir <- system.file("extdata", package = "sangeranalyseR")
parentDir <- file.path(rawDataDir, "Allolobophora_chlorotica", "ACHLO")
contigName <- "Achl_ACHLO006-09"
REGEX_SuffixForward <- "_[0-9]*_F.ab1$"
REGEX_SuffixReverse <- "_[0-9]*_R.ab1$"
sangerContig <- new("SangerContig",
                     inputSource           = "ABIF",
                     processMethod         = "REGEX",
                     ABIF_Directory        = parentDir,
                     contigName            = contigName,
                     REGEX_SuffixForward   = REGEX_SuffixForward,
                     REGEX_SuffixReverse   = REGEX_SuffixReverse,
                     refAminoAcidSeq = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN",
                     TrimmingMethod        = "M1",
                     M1TrimmingCutoff      = 0.0001,
                     baseNumPerRow         = 100,
                     heightPerRow          = 200,
                     signalRatioCutoff     = 0.33,
                     showTrimmed           = TRUE,
                     minReadsNum           = 2,
                     processorsNum         = 2)

## Input From ABIF file format (Csv three column method)
rawDataDir <- system.file("extdata", package = "sangeranalyseR")
parentDir <- file.path(rawDataDir, "Allolobophora_chlorotica", "RBNII")
CSV_NamesConversion <- file.path(rawDataDir, "ab1", "SangerContig", "names_conversion_2.csv")
sangerContig <- new("SangerContig",
                     inputSource           = "ABIF",
                     processMethod         = "CSV",
                     ABIF_Directory        = parentDir,
                     CSV_NamesConversion   = CSV_NamesConversion,
                     contigName            = "Achl_RBNII384-13",
                     refAminoAcidSeq = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN",
                     TrimmingMethod        = "M1",
                     M1TrimmingCutoff      = 0.000001,
                     baseNumPerRow         = 100,
                     heightPerRow          = 200,
                     signalRatioCutoff     = 0.33,
                     showTrimmed           = TRUE,
                     processorsNum         = 2)


## Input From FASTA file format (Regex)
rawDataDir <- system.file("extdata", package = "sangeranalyseR")
fastaFN <- file.path(rawDataDir, "fasta",
                     "SangerContig", "Achl_ACHLO006-09.fa")
contigName <- "Achl_ACHLO006-09"
REGEX_SuffixForwardFa <- "_[0-9]*_F$"
REGEX_SuffixReverseFa <- "_[0-9]*_R$"
sangerContigFa <- new("SangerContig",
                      inputSource           = "FASTA",
                      processMethod         = "REGEX",
                      FASTA_File            = fastaFN,
                      contigName            = contigName,
                      REGEX_SuffixForward   = REGEX_SuffixForwardFa,
                      REGEX_SuffixReverse   = REGEX_SuffixReverseFa,
                      refAminoAcidSeq       = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN",
                      processorsNum         = 2)

## Input From FASTA file format (Csv three column method)
rawDataDir <- system.file("extdata", package = "sangeranalyseR")
fastaFN <- file.path(rawDataDir, "fasta",
                     "SangerContig", "Achl_ACHLO006-09.fa")
CSV_NamesConversion <- file.path(rawDataDir, "fasta", "SangerContig", "names_conversion_1.csv")
sangerContigFa <- new("SangerContig",
                      inputSource           = "FASTA",
                      processMethod         = "CSV",
                      FASTA_File            = fastaFN,
                      CSV_NamesConversion   = CSV_NamesConversion,
                      contigName            = "Achl_ACHLO006-09",
                      refAminoAcidSeq       = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN",
                      processorsNum         = 2)

roblanf/sangeranalyseR documentation built on April 15, 2024, 12:44 a.m.
