SummarizePairs: Provide summaries of hypothetical orthologs.

SummarizePairs    R Documentation

Provide summaries of hypothetical orthologs.

Description

Given a LinkedPairs object and a DECIPHER database, return a data.frame of summarized genomic feature pairs. SummarizePairs will collect all the linked genomic features in the supplied LinkedPairs-class object and return descriptions of the alignments of those features.

Usage

SummarizePairs(SynExtendObject,
               DataBase01,
               AlignmentFun = "AlignPairs",
               DefaultTranslationTable = "11",
               KmerSize = 5,
               Verbose = FALSE,
               ShowPlot = FALSE,
               Processors = 1,
               Storage = 2,
               IndexParams = list("K" = 5),
               SearchParams = list("perPatternLimit" = 0),
               SearchScheme = "spike",
               RejectBy = "rank",
               RetainInternal = FALSE,
               ...)

Arguments

SynExtendObject

An object of class LinkedPairs-class.

DataBase01

A character string pointing to a SQLite database, or a connection to a DECIPHER database.

AlignmentFun

Character of length 1; the name of a DECIPHER alignment function. Currently only AlignPairs is supported.

DefaultTranslationTable

Character of length 1; an identifier recognized by getGeneticCode, used as the translation table for coding sequences when one is missing from the supplied genecalls.

KmerSize

Integer of length 1; the kmer size used for assessing kmer distance in nucleotide space between two candidate pairs.

Verbose

Logical of length 1; if TRUE, a progress bar and function timing will be displayed.

ShowPlot

Logical of length 1; if TRUE provide some plots describing candidate pairs. Currently not implemented.

Processors

Integer of length 1; the number of processors available to SummarizePairs for multithreaded applications. If NULL, all detectable cores will be requested.

Storage

Numeric of length 1; a soft limit, in Gb, on the memory allotted to SummarizePairs for the storage of sequence data from the supplied database.

IndexParams

A named list of arguments to be passed to IndexSeqs. Must be compliant with do.call's expectation for its args argument.

SearchParams

A named list of arguments to be passed to SearchIndex. Must be compliant with do.call's expectation for its args argument.

SearchScheme

Character of length 1; currently supported values are: "spike", which 'spikes' in a population of background candidates by searching one set of coding sequences against the reverse of another; "standard", which searches coding sequences from one genome against the other in the forward direction only; and "reciprocal", which performs a search strategy similar to Reciprocal Best Hits.

RejectBy

Character of length 1; currently supported values are: "glm" and "lm", which use the eponymous functions to model the data within a set of candidate pairs and reject candidate pairs below a particular False Discovery Rate, as determined from the set of known negatives generated when the "spike" search scheme is used; "kmeans", which runs a naive kmeans-based routine to cluster candidates within the set and rejects candidates below a user-supplied threshold; and "direct", which ranks all candidate pairs by the user-supplied attribute and drops all candidates below a user-supplied FDR threshold.

RetainInternal

Logical of length 1; if TRUE internal values used for candidate pair rejection will be attached to the returned object.

...

Additional arguments to pass to interior functions. Currently not implemented.
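
IndexParams and SearchParams are described above as named lists that must suit do.call's args argument. The following is a minimal base-R sketch of what that requirement means in practice. The function index_stub is a hypothetical stand-in written for this illustration, not part of DECIPHER or SynExtend, and the way the lists are merged here is an assumption rather than a description of SummarizePairs internals.

```r
# Hypothetical stand-in for an indexing function (NOT a DECIPHER function).
index_stub <- function(seqs, K = 8, verbose = TRUE) {
  list(n = length(seqs), K = K, verbose = verbose)
}

# A user-supplied named list, shaped like the IndexParams default.
IndexParams <- list("K" = 5)

# Arguments fixed by the caller are combined with the user's list.
# Every element name must match a formal argument of the target function.
args <- c(list(seqs = c("ATGAAA", "ATGCCC"), verbose = FALSE),
          IndexParams)

# do.call unpacks the named list into an ordinary function call.
res <- do.call(what = index_stub, args = args)
str(res)
```

An element name that does not match any argument of the target function produces an "unused argument" error from do.call, so a named list should be checked against the target function's documentation before it is supplied.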

Details

SummarizePairs collects features describing each linked feature pair. These features include:

  • p1: a character identifier for the candidate pair partner in the supplied query.

  • p2: a character identifier for the candidate pair partner in the supplied subject.

  • Consensus: a numeric value calculated by HitConsensus describing whether relative locations of linking hits are in linearly similar positions in both candidate pair partners.

  • p1featurelength: length of candidate query feature in nucleotides.

  • p2featurelength: length of candidate subject feature in nucleotides.

  • blocksize: integer value indicating the number of shared features blocked together.

  • KDist: numeric value of the Euclidean distance between candidate pairs in kmer space.

  • TotalMatch: integer value indicating total nucleotides shared between candidate pairs in the original searches.

  • MaxMatch: integer value indicating the largest kmer shared between candidate pairs in the original searches.

  • UniqueMatches: integer value indicating the number of kmers shared between candidate pair partners.

  • Local_PID: numeric value of the local PID for the alignment of the candidate pair.

  • Local_Score: numeric value of the local alignment score for the candidate pair.

  • Approx_Global_PID: approximate global PID for the alignment of the candidate pair.

  • Approx_Global_Score: approximate global score for the alignment of the candidate pair.

  • Block_UID: integer value giving an identifier number to the feature block that the candidate pair is part of.

  • Delta_Background: the approximate global score of the alignment modified by the background of the sequences.
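
To make the KDist entry concrete, here is a minimal base-R sketch of a Euclidean distance between two sequences in kmer space. It assumes kmers are counted with overlap and normalized to relative frequencies; the helper names kmer_freqs and kmer_dist and the toy sequences are hypothetical, and SynExtend's own calculation may differ in counting, normalization, or scaling.

```r
# Relative frequencies of all 4^k possible k-mers in one nucleotide string.
kmer_freqs <- function(x, k = 5) {
  bases <- c("A", "C", "G", "T")
  # Enumerate every possible k-mer so that all vectors share the same axes.
  all_kmers <- do.call(paste0,
                       expand.grid(rep(list(bases), k),
                                   stringsAsFactors = FALSE))
  # Extract overlapping k-mers from the sequence.
  starts <- seq_len(nchar(x) - k + 1L)
  observed <- substring(x, starts, starts + k - 1L)
  counts <- table(factor(observed, levels = all_kmers))
  as.numeric(counts) / length(observed)
}

# Euclidean distance between the two frequency vectors.
kmer_dist <- function(a, b, k = 5) {
  sqrt(sum((kmer_freqs(a, k) - kmer_freqs(b, k))^2))
}

# Toy sequences invented for illustration; they differ at one position.
seq1 <- "ATGAAACGCATTAGCACC"
seq2 <- "ATGAAACGTATTAGCACC"

d_same <- kmer_dist(seq1, seq1)
d_diff <- kmer_dist(seq1, seq2)
```

Identical sequences give a distance of zero, and the distance grows as the kmer composition of the two sequences diverges.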

Value

An object of class PairSummaries.
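
The Description states that the result is a data.frame of summarized pairs, so ordinary data.frame operations should apply to it. The sketch below uses a toy data.frame in place of a real result. Its identifiers and values are invented for illustration, only a few of the columns listed in Details are included, and both the 0.5 cutoff and the 0-to-1 scale for Local_PID are assumptions.

```r
# Toy stand-in for a returned summary; all values are invented.
toy <- data.frame(p1 = c("q_a", "q_b", "q_c"),
                  p2 = c("s_a", "s_b", "s_c"),
                  Local_PID = c(0.92, 0.81, 0.34),
                  Block_UID = c(1L, 1L, 2L),
                  stringsAsFactors = FALSE)

# Keep pairs at or above a hypothetical identity cutoff.
kept <- toy[toy$Local_PID >= 0.5, ]

# Count the retained pairs within each feature block.
pairs_per_block <- table(kept$Block_UID)
print(kept)
print(pairs_per_block)
```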

Author(s)

Nicholas Cooley npc19@pitt.edu

See Also

PrepareSeqs, NucleotideOverlap, FindSynteny, LinkedPairs-class

Examples

library(RSQLite)
# locate the example database shipped with SynExtend
DBPATH <- system.file("extdata",
                      "Endosymbionts_v05a.sqlite",
                      package = "SynExtend")
# work on a temporary copy so the packaged database is not modified
tmp01 <- tempfile()
file.copy(from = DBPATH,
          to = tmp01)
# load the example LinkedPairs object
data("Endosymbionts_LinkedFeatures", package = "SynExtend")
# prepare the sequences in the temporary database
PrepareSeqs(SynExtendObject = Endosymbionts_LinkedFeatures,
            DataBase = tmp01,
            Verbose = TRUE)
DBCONN <- dbConnect(SQLite(), tmp01)
SummarizedPairs <- SummarizePairs(SynExtendObject = Endosymbionts_LinkedFeatures,
                                  DataBase01 = DBCONN,
                                  Verbose = TRUE)
# close the connection when finished
dbDisconnect(DBCONN)

npcooley/SynExtend documentation built on June 8, 2025, 5:24 a.m.