FindSynteny: Finds Synteny in a Sequence Database
In DECIPHER: Tools for curating, analyzing, and manipulating biological sequences

Description Usage Arguments Details Value Note Author(s) See Also Examples

Finds syntenic blocks between groups of sequences in a database.

FindSynteny(dbFile,
            tblName = "Seqs",
            identifier = "",
            useFrames = TRUE,
            alphabet = AA_REDUCED[[1]],
            geneticCode = GENETIC_CODE,
            sepCost = 0,
            gapCost = -0.01,
            shiftCost = 0,
            codingCost = 0,
            maxSep = 2000,
            maxGap = 5000,
            minScore = 30,
            dropScore = -100,
            maskRepeats = TRUE,
            allowOverlap = FALSE,
            storage = 0.5,
            processors = 1,
            verbose = TRUE)

`dbFile`	A SQLite connection object or a character string specifying the path to the database file.
`tblName`	Character string specifying the table where the sequences are located.
`identifier`	Optional character string used to narrow the search results to those matching a specific identifier. If "" then all identifiers are selected.
`useFrames`	Logical specifying whether to use 6-frame amino acid translations to help find more distant hits. Searching in the translated (reduced amino acid) `alphabet` is helpful when the genome is largely composed of coding DNA. If `FALSE`, the search is faster but less sensitive to distant homology.
`alphabet`	Character vector of amino acid groupings used to reduce the 20 standard amino acids into smaller groups. Alphabet reduction helps to find more distant homologies between sequences. A non-reduced amino acid alphabet can be used by setting `alphabet` equal to `AA_STANDARD`.
`geneticCode`	Either a character vector giving the genetic code to use in translation, or a list containing one genetic code for each identifier. If a list is provided then it must be named by the corresponding identifiers in the database.
`sepCost`	Cost per nucleotide separation between hits to apply when chaining hits into blocks.
`gapCost`	Cost for gaps between hits to apply when chaining hits into blocks.
`shiftCost`	Cost for shifting between different reading frames when chaining reduced amino acid hits into blocks.
`codingCost`	Cost for switching between coding and non-coding hits when chaining hits into blocks.
`maxSep`	Maximal separation (in nucleotides) between hits in the same block.
`maxGap`	The maximum number of gaps between hits in the same block.
`minScore`	The minimum score required for a chain of hits to become a block. Higher values of `minScore` are less likely to yield false positives.
`dropScore`	The change from maximal score required to stop extending blocks.
`maskRepeats`	Logical specifying whether to “soft” mask repeats when searching for hits.
`allowOverlap`	Logical specifying whether to permit blocks to overlap on the same sequence.
`storage`	Excess gigabytes available to store objects so that they do not need to be recomputed in later steps. This should be a number between zero and a (modest) fraction of the available system memory. Note that more than `storage` gigabytes may be required during computation, but objects beyond this limit will not be stored for later reuse.
`processors`	The number of processors to use, or `NULL` to automatically detect and use all available processors.
`verbose`	Logical indicating whether to display progress.
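
To illustrate what the `alphabet` argument means, the base R sketch below collapses amino acids into groups and shows that two peptides with no exact match can become identical after reduction. The grouping `toy_alphabet` and the helper `reduce_peptide` are hypothetical and chosen only for illustration; they are not the contents of `AA_REDUCED[[1]]` and not DECIPHER's internal code.

```r
# Hypothetical amino acid groupings (NOT the values in AA_REDUCED[[1]]).
# Each string lists the residues that are treated as the same letter.
toy_alphabet <- c("AGPST", "C", "DENQ", "FWY", "HKR", "ILMV")

# Build a lookup from each residue to the index of its group.
residues <- unlist(strsplit(toy_alphabet, ""))
group_of <- rep(seq_along(toy_alphabet), nchar(toy_alphabet))
names(group_of) <- residues

# Hypothetical helper: map a peptide to its reduced-alphabet group indices.
reduce_peptide <- function(peptide) {
  unname(group_of[strsplit(peptide, "")[[1]]])
}

pep1 <- "MKVLDE"
pep2 <- "LRIMNQ" # differs from pep1 at every position

# Compare the peptides before and after alphabet reduction.
exact_match <- identical(pep1, pep2)
reduced_match <- identical(reduce_peptide(pep1), reduce_peptide(pep2))
```

Here `exact_match` is `FALSE` while `reduced_match` is `TRUE`, which is the sense in which a reduced alphabet can reveal more distant homology than exact matching.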

Long nucleotide sequences, such as genomes, are often not collinear or may be composed of many smaller segments (e.g., contigs). `FindSynteny` searches for “hits” between sequences that can be chained into collinear “blocks” of synteny. Hits are defined as k-mer exact nucleotide matches or k-mer matches in a reduced amino acid alphabet (if `useFrames` is `TRUE`). Hits are chained into blocks as long as they (1) are within the same sequence, (2) are within `maxSep` and `maxGap` distance of each other, and (3) help maintain the score above `minScore`. Blocks are extended from their first and last hit until their score drops below `dropScore` from the maximum that was reached. This process results in a set of hits and blocks stored in an object of class “Synteny”.
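
The chaining idea described above can be sketched in base R with a few made-up hits. This is a simplified greedy pass under stated assumptions, not DECIPHER's algorithm: the helper `chain_hits` is hypothetical, the gap is taken to be the difference between the separations in the two sequences, a block's score is taken to be its total matched width, and the cost parameters (`sepCost`, `gapCost`, `shiftCost`, `codingCost`) and `dropScore` extension are ignored.

```r
# Toy hits between one pair of sequences (positions in nucleotides).
# All values are made up for illustration.
hits <- data.frame(
  start1 = c(100, 400, 900, 50000, 50300, 90000),
  start2 = c(200, 520, 1000, 7000, 7310, 300),
  width  = c(20, 15, 25, 18, 22, 12)
)

# Simplified greedy chaining (an assumption, not DECIPHER's algorithm):
# a hit extends the current block when it is collinear with the previous
# hit, separated by at most maxSep in both sequences, and the difference
# between the two separations (the gap) is at most maxGap.
chain_hits <- function(hits, maxSep = 2000, maxGap = 5000, minScore = 30) {
  hits <- hits[order(hits$start1), ]
  block <- integer(nrow(hits))
  block[1] <- 1L
  for (i in seq_len(nrow(hits))[-1]) {
    sep1 <- hits$start1[i] - (hits$start1[i - 1] + hits$width[i - 1])
    sep2 <- hits$start2[i] - (hits$start2[i - 1] + hits$width[i - 1])
    joins <- sep1 >= 0 && sep2 >= 0 &&
      sep1 <= maxSep && sep2 <= maxSep &&
      abs(sep1 - sep2) <= maxGap
    block[i] <- if (joins) block[i - 1] else block[i - 1] + 1L
  }
  # Score each block by its total matched width and keep those >= minScore.
  score <- tapply(hits$width, block, sum)
  keep <- names(score)[score >= minScore]
  hits$block <- block
  hits[as.character(hits$block) %in% keep, ]
}

blocks <- chain_hits(hits)
```

With these toy values the first three hits form one block and the next two form a second, while the isolated last hit is dropped because its score falls below `minScore`.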

An object of class “Synteny”.

FindSynteny is intended to be used on sets of sequences with up to ~100 million nucleotides total per identifier. For this reason, better performance can sometimes be achieved by assigning a unique identifier to each chromosome belonging to a large genome.
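
As a rough illustration of this advice, the base R sketch below checks the total number of nucleotides per identifier and assigns one identifier per chromosome for any genome that exceeds the guideline. The table `seqs`, its column names, and all lengths are hypothetical; only the ~100 million figure comes from the note above.

```r
# Hypothetical sequence table: one row per sequence with its length in
# nucleotides. Names and lengths are invented for illustration.
seqs <- data.frame(
  genome     = c("genomeA", "genomeA", "genomeA", "genomeB"),
  chromosome = c("chr1", "chr2", "chr3", "chr1"),
  length     = c(90e6, 80e6, 60e6, 5e6)
)

limit <- 100e6 # approximate per-identifier guideline from the note above

# Total nucleotides per genome when each genome is a single identifier.
per_genome <- tapply(seqs$length, seqs$genome, sum)
too_large <- names(per_genome)[per_genome > limit]

# Give each chromosome of an oversized genome its own identifier.
seqs$identifier <- ifelse(
  seqs$genome %in% too_large,
  paste(seqs$genome, seqs$chromosome, sep = "_"),
  seqs$genome
)

# Total nucleotides per identifier after the reassignment.
per_identifier <- tapply(seqs$length, seqs$identifier, sum)
```

The resulting `identifier` values would then be supplied when the sequences are imported into the database, for example through the `identifier` argument of `Seqs2DB`.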

Erik Wright eswright@pitt.edu

AlignSynteny, Synteny-class

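library(DECIPHER)
# path to the example SQLite database shipped with DECIPHER
db <- system.file("extdata", "Influenza.sqlite", package="DECIPHER")
synteny <- FindSynteny(db) # find syntenic blocks among all identifiers
synteny # print a summary of hits and blocks
pairs(synteny) # scatterplot matrix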

  |======================================================================| 100%

Time difference of 0.72 secs

         H9N2     H5N1     H2N2     H7N9     H1N1
H9N2   8 seqs 49% hits 34% hits 48% hits 34% hits
H5N1 6 blocks   8 seqs 29% hits 45% hits 39% hits
H2N2 7 blocks 6 blocks   8 seqs 27% hits 35% hits
H7N9 6 blocks 5 blocks 6 blocks   8 seqs 32% hits
H1N1 6 blocks 6 blocks 6 blocks 6 blocks   8 seqs

DECIPHER documentation built on Nov. 8, 2020, 8:30 p.m.
