View source: R/SummarizePairs.R
SummarizePairs | R Documentation |
Given a LinkedPairs object and a DECIPHER database, return a data.frame of summarized genomic feature pairs. SummarizePairs will collect all the linked genomic features in the supplied LinkedPairs-class object and return descriptions of the alignments of those features.
SummarizePairs(SynExtendObject,
DataBase01,
AlignmentFun = "AlignPairs",
DefaultTranslationTable = "11",
KmerSize = 5,
Verbose = FALSE,
ShowPlot = FALSE,
Processors = 1,
Storage = 2,
IndexParams = list("K" = 5),
SearchParams = list("perPatternLimit" = 0),
SearchScheme = "spike",
RejectBy = "rank",
RetainInternal = FALSE,
...)
SynExtendObject |
An object of class |
DataBase01 |
A character string pointing to a SQLite database, or a connection to a |
AlignmentFun |
Character of length 1; a character string of length one specifying a |
DefaultTranslationTable |
Character of length 1; an identifier that can be recognized by |
KmerSize |
Integer of length 1; specify the k-mer size used to assess the k-mer distance, in nucleotide space, between the two partners of a candidate pair. |
Verbose |
Logical of length 1; if |
ShowPlot |
Logical of length 1; if |
Processors |
Integer of length 1; specify the number of processors available to |
Storage |
Numeric of length 1; a soft limit on the memory allotted to |
IndexParams |
A named list of arguments to be passed to |
SearchParams |
A named list of arguments to be passed to |
SearchScheme |
Character of length 1; currently supported arguments include "spike", which 'spikes' in a population of background candidates by searching one set of coding sequences against the reverse of another; "standard", which only searches coding sequences from one genome against the other in the forward direction; and "reciprocal", which performs a search strategy similar to Reciprocal Best Hits. |
RejectBy |
Character of length 1; currently supported arguments include "glm" and "lm", which use the eponymous functions to model the data within a set of candidate pairs and reject candidate pairs below a particular False Discovery Rate (FDR), as determined from a set of known negatives generated when a "spike" search scheme is used; "kmeans", which runs a naive k-means-based routine to cluster candidates within the set and rejects candidates below a user-supplied threshold; and "direct", which ranks all candidate pairs by the user-supplied attribute and drops all candidates below a user-supplied FDR threshold. |
RetainInternal |
Logical of length 1; if |
... |
Additional arguments to pass to interior functions. Currently not implemented. |
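The "spike" search scheme described above relies on reversed sequences as a source of background candidates. The following is only a minimal base R sketch of that idea, not the package's implementation: the helper name ReverseSeq is hypothetical, and it assumes "the reverse" means a plain character-wise reversal rather than a reverse complement.

```r
# Hypothetical helper (not part of SynExtend): reverse each sequence
# character by character to build a set of decoy "background" sequences.
ReverseSeq <- function(Seqs) {
  Reversed <- vapply(strsplit(Seqs, ""),
                     function(x) paste(rev(x), collapse = ""),
                     character(1))
  unname(Reversed)
}

# Toy coding sequences; real input would come from the prepared database.
Subject <- c("ATGGCC", "ATGTTTAA")
Decoys <- ReverseSeq(Subject)
```

Hits against such decoys cannot reflect true homology in the forward direction, which is why they can serve as known negatives when estimating a False Discovery Rate.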
SummarizePairs collects features describing each linked feature pair. These features include:
p1: a character identifier for the candidate pair partner in the supplied query.
p2: a character identifier for the candidate pair partner in the supplied subject.
Consensus: a numeric value calculated by HitConsensus describing whether relative locations of linking hits are in linearly similar positions in both candidate pair partners.
p1featurelength: length of candidate query feature in nucleotides.
p2featurelength: length of candidate subject feature in nucleotides.
blocksize: integer value indicating the number of shared features blocked together.
KDist: numeric value of the euclidean distance between candidate pairs in kmer space.
TotalMatch: integer value indicating total nucleotides shared between candidate pairs in the original searches.
MaxMatch: integer value indicating the largest kmer shared between candidate pairs in the original searches.
UniqueMatches: integer value indicating the number of kmers shared between candidate pair partners.
Local_PID: numeric value of the local PID for the alignment of the candidate pair.
Local_Score: numeric value of the local alignment score for the candidate pair.
Approx_Global_PID: approximate global PID for the alignment of the candidate pair.
Approx_Global_Score: approximate global score for the alignment of the candidate pair.
Block_UID: integer value giving an identifier number to the feature block that the candidate pair is part of.
Delta_Background: the approximate global score of the alignment modified by the background of the sequences.
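To illustrate what a Euclidean distance in k-mer space (the KDist feature above) can look like, here is a minimal base R sketch. It is not the package's implementation: the helper names CountKmers and KmerDistance are hypothetical, and normalizing counts to frequencies before taking the distance is an assumption made for the example.

```r
# Hypothetical helper: count overlapping k-mers in one nucleotide string.
CountKmers <- function(Seq, K = 5L) {
  stopifnot(nchar(Seq) >= K)
  Starts <- seq_len(nchar(Seq) - K + 1L)
  table(substring(Seq, Starts, Starts + K - 1L))
}

# Hypothetical helper: Euclidean distance between two k-mer frequency
# profiles (frequency normalization is an assumption, see above).
KmerDistance <- function(Seq1, Seq2, K = 5L) {
  C1 <- CountKmers(Seq1, K)
  C2 <- CountKmers(Seq2, K)
  AllKmers <- union(names(C1), names(C2))
  F1 <- setNames(numeric(length(AllKmers)), AllKmers)
  F2 <- F1
  F1[names(C1)] <- as.numeric(C1) / sum(C1)
  F2[names(C2)] <- as.numeric(C2) / sum(C2)
  sqrt(sum((F1 - F2)^2))
}

# Toy comparison of two short sequences with K = 3.
D <- KmerDistance("ATGGCCATTGTA", "ATGGCCATTGTT", K = 3L)
```

Identical sequences give a distance of zero, and sequences sharing no k-mers give the largest distance under this normalization.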
An object of class PairSummaries.
Nicholas Cooley npc19@pitt.edu
PrepareSeqs, NucleotideOverlap, FindSynteny, LinkedPairs-class
library(SynExtend)
library(RSQLite)
# work on a temporary copy of the example database
DBPATH <- system.file("extdata",
                      "Endosymbionts_v05a.sqlite",
                      package = "SynExtend")
tmp01 <- tempfile()
file.copy(from = DBPATH,
          to = tmp01)
# load the example LinkedPairs object
data("Endosymbionts_LinkedFeatures", package = "SynExtend")
# prepare the feature sequences in the database copy
PrepareSeqs(SynExtendObject = Endosymbionts_LinkedFeatures,
            DataBase = tmp01,
            Verbose = TRUE)
# summarize the linked pairs through an open connection, then close it
DBCONN <- dbConnect(SQLite(), tmp01)
SummarizedPairs <- SummarizePairs(SynExtendObject = Endosymbionts_LinkedFeatures,
                                  DataBase01 = DBCONN,
                                  Verbose = TRUE)
dbDisconnect(DBCONN)