PairSummaries: Summarize connected pairs in a LinkedPairs object
In npcooley/Heron: A Package of Various Genomics Tools

Takes a LinkedPairs object and gene calls, and returns a data.frame summarizing the linked pairs.

PairSummaries(SyntenyLinks,
              GeneCalls,
              DBPATH,
              PIDs = TRUE,
              IgnoreDefaultStringSet = FALSE,
              Verbose = TRUE,
              GapPenalty = TRUE,
              TerminalPenalty = TRUE,
              Model = "Global",
              Correction = "none")

`SyntenyLinks`	A `LinkedPairs` object, as generated by `NucleotideOverlap`.
`GeneCalls`	A named list of DataFrames or GRanges objects. DataFrames built by `gffToDataFrame` can be used directly; `GRanges` objects may also be used, with limited functionality. Using a `GRanges` object forces all alignments to be nucleotide alignments.
`DBPATH`	A SQLite connection object or a character string specifying the path to the database file. The database is constructed with DECIPHER's `Seqs2DB` function.
`PIDs`	Logical indicating whether to perform pairwise alignments. If `TRUE` (the default), all pairs will be aligned using DECIPHER's `AlignSeqs` or `AlignTranslation` function. This step can be time-consuming, especially for large numbers of pairs.
`IgnoreDefaultStringSet`	Logical indicating alignment type preferences. If `FALSE` (the default), pairs that can be aligned in amino acid space will be aligned as an `AAStringSet`. If `TRUE`, all pairs will be aligned in nucleotide space.
`Verbose`	Logical indicating whether or not to display a progress bar and print the time difference upon completion.
`GapPenalty`	Argument passed to `AlignTranslation`.
`TerminalPenalty`	Argument passed to `AlignTranslation`.
`Model`	A character string specifying a model used to identify pairs that are unlikely to be good orthologs. The default is `"Global"`; two other models, `"Local"` and `"Exact"`, are included and differ slightly in performance. Alternatively, a user-generated model can be used.
`Correction`	Argument passed to `DistanceMatrix`.
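The Examples below name the `GeneCalls` list `1, 2, 3`, mirroring the `identifier` values passed to `Seqs2DB`. The following is a minimal sketch of that preparation step using only base R. It assumes, without confirmation from this page, that list names are expected to match database identifiers. `Identifiers`, `MockCalls`, and the `Start`/`Stop` columns are placeholders for illustration and not the actual structure returned by `gffToDataFrame`.

```r
# Hypothetical identifiers, standing in for those given to Seqs2DB(identifier = ...)
Identifiers <- as.character(1:3)

# Placeholder gene call tables; real ones would come from gffToDataFrame
MockCalls <- lapply(Identifiers, function(id) {
  data.frame(Start = c(1L, 500L),
             Stop = c(400L, 900L),
             stringsAsFactors = FALSE)
})

# Name the list so each element can be tied to one genome
names(MockCalls) <- Identifiers

# Sanity checks before handing the list to downstream functions
stopifnot(is.list(MockCalls),
          !is.null(names(MockCalls)),
          anyDuplicated(names(MockCalls)) == 0L,
          all(names(MockCalls) %in% Identifiers))
```

Checking names up front is cheap and can catch a mismatched or unnamed list before a long-running alignment step begins.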

The LinkedPairs object generated by NucleotideOverlap is a container for raw data describing possible orthologous relationships; the ultimate assignment of orthology is left to the user's discretion. PairSummaries generates a table of relevant statistics for the user to work with as they choose. Aligning all pairs, though time-consuming, allows users to apply a hard threshold to predictions by PID, while the built-in models offer a faster, more concise way to threshold.
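A hard PID threshold of the kind described above could be applied with ordinary data.frame subsetting. This sketch uses a mock table rather than real `PairSummaries` output. The column name `PID`, its 0 to 1 scale, the row names, and the cutoff `MinPID` are all assumptions made for illustration.

```r
# Mock stand-in for a PairSummaries result; the column name and scale are assumed
MockPairs <- data.frame(PID = c(0.92, 0.41, 0.78, 0.15),
                        row.names = c("pairA", "pairB", "pairC", "pairD"))

# User-chosen hard threshold; 0.5 is arbitrary
MinPID <- 0.5

# Keep only pairs whose identity meets the threshold,
# using drop = FALSE so the result stays a data.frame
Retained <- MockPairs[MockPairs$PID >= MinPID, , drop = FALSE]

print(Retained)
```

An appropriate cutoff depends on the organisms and the question at hand, so the value here should not be read as a recommendation.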

A data.frame with rownames indicating orthologous pairs.

Nicholas Cooley npc19@pitt.edu

FindSynteny, Synteny-class

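# Attach SynExtend (which also attaches DECIPHER) so the functions below are available
library(SynExtend)

DBPATH <- system.file("extdata",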
                      "VignetteSeqs.sqlite",
                      package = "SynExtend")

# Alternatively, to build a database using DECIPHER:
# DBPATH <- tempfile()
# FNAs <- c("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/006/740/685/GCA_006740685.1_ASM674068v1/GCA_006740685.1_ASM674068v1_genomic.fna.gz",
#           "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/956/175/GCA_000956175.1_ASM95617v1/GCA_000956175.1_ASM95617v1_genomic.fna.gz",
#           "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/875/775/GCA_000875775.1_ASM87577v1/GCA_000875775.1_ASM87577v1_genomic.fna.gz")
# for (m1 in seq_along(FNAs)) {
#  X <- readDNAStringSet(filepath = FNAs[m1])
#  X <- X[order(width(X),
#               decreasing = TRUE)]
#  
#  Seqs2DB(seqs = X,
#          type = "XStringSet",
#          dbFile = DBPATH,
#          identifier = as.character(m1),
#          verbose = TRUE)
#}

Syn <- FindSynteny(dbFile = DBPATH)

GeneCalls <- vector(mode = "list",
                    length = ncol(Syn))

GeneCalls[[1L]] <- gffToDataFrame(GFF = system.file("extdata",
                                                    "GCA_006740685.1_ASM674068v1_genomic.gff.gz",
                                                    package = "SynExtend"),
                                  Verbose = TRUE)
GeneCalls[[2L]] <- gffToDataFrame(GFF = system.file("extdata",
                                                    "GCA_000956175.1_ASM95617v1_genomic.gff.gz",
                                                    package = "SynExtend"),
                                  Verbose = TRUE)
GeneCalls[[3L]] <- gffToDataFrame(GFF = system.file("extdata",
                                                    "GCA_000875775.1_ASM87577v1_genomic.gff.gz",
                                                    package = "SynExtend"),
                                  Verbose = TRUE)
                                  
# Alternatively:
# GeneCalls <- vector(mode = "list",
#                     length = ncol(Syn))
# GeneCalls[[1L]] <- rtracklayer::import(system.file("extdata",
#                                                    "GCA_006740685.1_ASM674068v1_genomic.gff.gz",
#                                                    package = "SynExtend"))
# GeneCalls[[2L]] <- rtracklayer::import(system.file("extdata",
#                                                    "GCA_000956175.1_ASM95617v1_genomic.gff.gz",
#                                                    package = "SynExtend"))
# GeneCalls[[3L]] <- rtracklayer::import(system.file("extdata",
#                                                    "GCA_000875775.1_ASM87577v1_genomic.gff.gz",
#                                                    package = "SynExtend"))

names(GeneCalls) <- seq_along(GeneCalls)

Links <- NucleotideOverlap(SyntenyObject = Syn,
                           GeneCalls = GeneCalls,
                           LimitIndex = FALSE,
                           Verbose = TRUE)

PredictedPairs <- PairSummaries(SyntenyLinks = Links,
                                GeneCalls = GeneCalls,
                                DBPATH = DBPATH,
                                PIDs = FALSE,
                                Verbose = TRUE,
                                Model = "Global",
                                Correction = "none")