SummarizePairs: Provide summaries of hypothetical orthologs.
In npcooley/SynExtend: Tools for Comparative Genomics

Description

Given a LinkedPairs object and a DECIPHER database, return a data.frame of summarized genomic feature pairs. SummarizePairs collects all linked genomic features in the supplied LinkedPairs-class object and returns descriptions of the alignments of those features.

Usage

SummarizePairs(SynExtendObject,
               DataBase01,
               AlignmentFun = "AlignPairs",
               DefaultTranslationTable = "11",
               KmerSize = 5,
               Verbose = FALSE,
               ShowPlot = FALSE,
               Processors = 1,
               Storage = 2,
               IndexParams = list("K" = 5),
               SearchParams = list("perPatternLimit" = 0),
               SearchScheme = "spike",
               RejectBy = "rank",
               RetainInternal = FALSE,
               ...)
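
The defaults `IndexParams = list("K" = 5)` and `SearchParams = list("perPatternLimit" = 0)` are named lists whose names are matched to arguments of `IndexSeqs` and `SearchIndex`. As a minimal sketch of how a named list is handed to a function with `do.call`, the following uses `toy_index`, a hypothetical stand-in written only for this illustration. It is not `IndexSeqs`, and how `SummarizePairs` combines these lists with its own arguments internally is not shown here.

```r
# Hypothetical stand-in for a function that accepts a K argument.
# This is NOT DECIPHER's IndexSeqs; it only collects overlapping substrings.
toy_index <- function(seqs, K = 3) {
  unlist(lapply(seqs, function(s) {
    n <- nchar(s) - K + 1
    if (n < 1) return(character(0))
    # every overlapping substring of width K
    substring(s, seq_len(n), seq_len(n) + K - 1)
  }))
}

# Same shape as the default in the usage above
IndexParams <- list("K" = 5)

# do.call() matches list names to formal arguments, so this call is
# equivalent to toy_index(seqs = "ACGTACGTAC", K = 5)
res <- do.call(what = toy_index,
               args = c(list(seqs = "ACGTACGTAC"), IndexParams))
```

Every element of the list must carry a name that the target function accepts, otherwise `do.call` fails with an unused argument error.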

Arguments

`SynExtendObject`	An object of class `LinkedPairs-class`.
`DataBase01`	A character string pointing to a SQLite database, or a connection to a `DECIPHER` database.
`AlignmentFun`	Character of length 1; the name of a `DECIPHER` alignment function. Currently only `AlignPairs` is supported.
`DefaultTranslationTable`	Character of length 1; an identifier recognized by `getGeneticCode`, used as the translation table for coding sequences when one is missing from the supplied genecalls.
`KmerSize`	Integer of length 1; the kmer size used to assess kmer distance in nucleotide space between the two partners of a candidate pair.
`Verbose`	Logical of length 1; if `TRUE`, a progress bar and function timing are displayed.
`ShowPlot`	Logical of length 1; if `TRUE` provide some plots describing candidate pairs. Currently not implemented.
`Processors`	Integer of length 1; the number of processors available to `SummarizePairs` for multithreaded applications. If `NULL`, all detectable cores are requested.
`Storage`	Numeric of length 1; a soft limit, in Gb, on the memory allotted to `SummarizePairs` for the storage of sequence data from the supplied database.
`IndexParams`	A named list of arguments to be passed to `IndexSeqs`. Must be compliant with `do.call`'s expectation for its `args` argument.
`SearchParams`	A named list of arguments to be passed to `SearchIndex`. Must be compliant with `do.call`'s expectation for its `args` argument.
`SearchScheme`	Character of length 1. Currently supported values are: "spike", which spikes in a population of background candidates by searching one set of coding sequences against the reverse of another; "standard", which searches coding sequences from one genome against the other in the forward direction only; and "reciprocal", which performs a search strategy similar to Reciprocal Best Hits.
`RejectBy`	Character of length 1. Currently supported values include: "glm" and "lm", which use the eponymous functions to model the data within a set of candidate pairs and reject candidate pairs below a particular False Discovery Rate (FDR), as determined from the set of known negatives generated when the "spike" search scheme is used; "kmeans", which runs a naive kmeans-based routine to cluster candidates within the set and rejects candidates below a user-supplied threshold; and "direct", which ranks all candidate pairs by a user-supplied attribute and drops all candidates below a user-supplied FDR threshold.
`RetainInternal`	Logical of length 1; if `TRUE` internal values used for candidate pair rejection will be attached to the returned object.
`...`	Additional arguments to pass to interior functions. Currently not implemented.
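
The "spike" search scheme and the FDR-based `RejectBy` options rely on known negatives scored alongside real candidates. The sketch below shows one common way such decoys can be used to pick a rank cutoff: sort all pairs by a score and estimate the FDR at each rank as the ratio of decoys to non-decoys seen so far. It is an illustration under stated assumptions, not the routine `SummarizePairs` runs internally; the scores, the `is_decoy` labels, and the 0.05 threshold are all made up for the example.

```r
# Made-up, deterministic scores: real candidates span higher values than decoys,
# with some overlap between 2 and 3
scores   <- c(seq(2, 6, length.out = 200),   # hypothetical real candidates
              seq(-2, 3, length.out = 200))  # hypothetical spiked-in decoys
is_decoy <- rep(c(FALSE, TRUE), each = 200)

# Rank all pairs from best to worst score
o <- order(scores, decreasing = TRUE)
decoy_sorted <- is_decoy[o]

# Estimated FDR at each rank: decoys seen so far / non-decoys seen so far
n_decoy <- cumsum(decoy_sorted)
n_real  <- cumsum(!decoy_sorted)
fdr     <- n_decoy / pmax(n_real, 1)

# Keep everything up to the last rank whose estimated FDR is within the threshold
threshold   <- 0.05  # assumed value, chosen only for illustration
ok          <- which(fdr <= threshold)
cutoff_rank <- if (length(ok) > 0) max(ok) else 0L

keep <- logical(length(scores))
keep[o[seq_len(cutoff_rank)]] <- TRUE
keep <- keep & !is_decoy  # decoys themselves are never retained
```

A stricter threshold moves the cutoff toward the top of the ranking and retains fewer candidates.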

Details

SummarizePairs collects features describing each linked feature pair. These features include:

p1: a character identifier for the candidate pair partner in the supplied query.
p2: a character identifier for the candidate pair partner in the supplied subject.
Consensus: a numeric value calculated by HitConsensus describing whether relative locations of linking hits are in linearly similar positions in both candidate pair partners.
p1featurelength: length of candidate query feature in nucleotides.
p2featurelength: length of candidate subject feature in nucleotides.
blocksize: integer value indicating the number of shared features blocked together.
KDist: numeric value of the Euclidean distance between candidate pair partners in kmer space.
TotalMatch: integer value indicating total nucleotides shared between candidate pair partners in the original searches.
MaxMatch: integer value indicating the largest kmer shared between candidate pairs in the original searches.
UniqueMatches: integer value indicating the number of kmers shared between candidate pair partners.
Local_PID: numeric value of the local percent identity (PID) for the alignment of the candidate pair.
Local_Score: numeric value of the local alignment score for the candidate pair.
Approx_Global_PID: approximate global PID for the alignment of the candidate pair.
Approx_Global_Score: approximate global score for the alignment of the candidate pair.
Block_UID: integer value giving an identifier number to the feature block that the candidate pair is part of.
Delta_Background: the approximate global score of the alignment modified by the background of the sequences.
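
To make the `KDist` entry above concrete, here is a minimal base R sketch of a Euclidean distance between two nucleotide sequences in kmer space. It assumes overlapping kmers over the alphabet A, C, G, T and normalizes counts to frequencies; whether `SummarizePairs` normalizes in the same way is not stated in this page, so treat `kmer_freq` and `kdist` as hypothetical helpers rather than the package's implementation.

```r
# Frequencies of all 4^k possible k-mers in a DNA string (hypothetical helper)
kmer_freq <- function(x, k) {
  stopifnot(nchar(x) >= k)
  alphabet <- c("A", "C", "G", "T")
  # enumerate every possible k-mer so that all vectors share the same coordinates
  all_kmers <- do.call(paste0,
                       expand.grid(rep(list(alphabet), k),
                                   stringsAsFactors = FALSE))
  n <- nchar(x) - k + 1
  observed <- substring(x, seq_len(n), seq_len(n) + k - 1)
  counts <- table(factor(observed, levels = all_kmers))
  # normalizing to frequencies is an assumption of this sketch
  as.numeric(counts) / sum(counts)
}

# Euclidean distance between two sequences in k-mer frequency space
kdist <- function(a, b, k) {
  sqrt(sum((kmer_freq(a, k) - kmer_freq(b, k))^2))
}

d_same <- kdist("ACGTACGT", "ACGTACGT", k = 2)  # identical sequences
d_diff <- kdist("AAAA", "CCCC", k = 2)          # no shared k-mers
```

The sketch uses `k = 2` to keep the vectors small; the `KmerSize` default in the usage above is 5, which gives 4^5 = 1024 coordinates.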

Value

An object of class PairSummaries.

Author(s)

Nicholas Cooley npc19@pitt.edu

Examples

library(RSQLite)
library(SynExtend)

# Locate the example database shipped with SynExtend and work on a temporary copy
DBPATH <- system.file("extdata",
                      "Endosymbionts_v05a.sqlite",
                      package = "SynExtend")
tmp01 <- tempfile()
file.copy(from = DBPATH,
          to = tmp01)

# Load the example LinkedPairs object
data("Endosymbionts_LinkedFeatures", package = "SynExtend")

# Run PrepareSeqs on the temporary database before summarizing
PrepareSeqs(SynExtendObject = Endosymbionts_LinkedFeatures,
            DataBase = tmp01,
            Verbose = TRUE)

# Summarize the linked pairs through an open connection, then close it
DBCONN <- dbConnect(SQLite(), tmp01)
SummarizedPairs <- SummarizePairs(SynExtendObject = Endosymbionts_LinkedFeatures,
                                  DataBase01 = DBCONN,
                                  Verbose = TRUE)
dbDisconnect(DBCONN)
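
Because the summary is described as a data.frame, ordinary data.frame tools can be used to inspect it. The sketch below does this on `toy`, a small hand-made data.frame that borrows a few column names from the Details section; the identifiers, values, and the 0 to 1 scale used for `Approx_Global_PID` are invented for illustration and are not output of `SummarizePairs`.

```r
# Hand-made stand-in for a summary table; all values are invented
toy <- data.frame(
  p1 = c("q_a", "q_b", "q_c", "q_d"),
  p2 = c("s_a", "s_b", "s_c", "s_d"),
  Approx_Global_PID = c(0.91, 0.85, 0.40, 0.77),  # assumed scale
  Block_UID = c(1L, 1L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Number of candidate pairs in each feature block
pairs_per_block <- table(toy$Block_UID)

# Mean approximate global PID within each block
mean_pid_per_block <- tapply(toy$Approx_Global_PID, toy$Block_UID, mean)

# Candidate pairs ordered from highest to lowest approximate global PID
best_first <- toy[order(toy$Approx_Global_PID, decreasing = TRUE), ]
```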

npcooley/SynExtend documentation built on June 8, 2025, 5:24 a.m.