PairSummaries: Summarize connected pairs in a LinkedPairs object



Description

Takes a LinkedPairs object and gene calls, and returns a data.frame summarizing the linked pairs.

Usage

PairSummaries(SyntenyLinks,
              GeneCalls,
              DBPATH,
              PIDs = TRUE,
              IgnoreDefaultStringSet = FALSE,
              Verbose = TRUE,
              GapPenalty = TRUE,
              TerminalPenalty = TRUE,
              Model = "Global",
              Correction = "none")

Arguments

SyntenyLinks

A LinkedPairs object, as generated by NucleotideOverlap.

GeneCalls

A named list of gene calls. Each element can be a “DFrame” built by gffToDataFrame, a “GRanges” object imported with rtracklayer::import, or a “Genes” object created by DECIPHER's FindGenes function. “DFrame”s built by gffToDataFrame can be used directly, and “Genes” objects generated by FindGenes work the same way. “GRanges” objects can also be used, but with limited functionality: supplying a “GRanges” object forces IgnoreDefaultStringSet to TRUE, so all alignments are performed in nucleotide space.

DBPATH

A SQLite connection object or a character string specifying the path to the database file, constructed with DECIPHER's Seqs2DB function.

PIDs

Logical indicating whether to perform pairwise alignments. If TRUE (the default), all pairs are aligned using DECIPHER's AlignSeqs or AlignTranslation function. This step can be time-consuming, especially for large numbers of pairs.
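To illustrate roughly what an alignment-based percent identity (PID) measures, the base R sketch below scores two sequences that have already been aligned. The helper AlignedPID and its countGaps switch are hypothetical. PairSummaries obtains its values through DECIPHER, whose gap handling may differ, so this is a conceptual sketch and not the package's computation.

```r
# Hypothetical helper: fraction of identical columns between two
# pre-aligned sequences of equal width ("-" marks a gap).
# countGaps = TRUE  -> gap-vs-letter columns are scored as mismatches
# countGaps = FALSE -> any column containing a gap is ignored
# Columns where both sequences have a gap are always ignored.
AlignedPID <- function(a, b, countGaps = TRUE) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  stopifnot(length(x) == length(y))
  bothGaps <- x == "-" & y == "-"
  anyGap <- x == "-" | y == "-"
  keep <- if (countGaps) !bothGaps else !anyGap
  sum(x[keep] == y[keep]) / sum(keep)
}

AlignedPID("ATG-CA", "ATGGCT")
AlignedPID("ATG-CA", "ATGGCT", countGaps = FALSE)
```

The two calls return 4/6 and 4/5 (0.8). The only difference between them is whether the gapped column counts against the pair, which shows that gap handling alone can shift a PID noticeably.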

IgnoreDefaultStringSet

Logical indicating the preferred alignment type. If FALSE (the default), pairs that can be aligned in amino acid space are aligned as an AAStringSet. If TRUE, all pairs are aligned in nucleotide space.

Verbose

Logical indicating whether to display a progress bar and print the elapsed time upon completion.

GapPenalty

Logical (default TRUE); argument passed to AlignTranslation.

TerminalPenalty

Logical (default TRUE); argument passed to AlignTranslation.

Model

A character string specifying the model used to identify pairs that are unlikely to be good orthologs. The default is "Global". Two other models are also included, "Local" and "Exact", and they differ only slightly in performance. Alternatively, a user-generated model can be supplied.

Correction

A character string passed to DECIPHER's DistanceMatrix function. Currently only "none" (the default) and "Jukes-Cantor" are supported. The correction is applied only to nucleotide alignments.
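For reference, the Jukes-Cantor correction converts an observed proportion of differing nucleotide sites p into an estimated number of substitutions per site: d = -(3/4) ln(1 - (4/3) p). The base R sketch below applies that standard formula to p = 1 - PID. The function name JukesCantor is hypothetical, and DECIPHER's DistanceMatrix may treat gaps and edge cases differently.

```r
# Hypothetical helper: standard Jukes-Cantor correction of a nucleotide
# p-distance (proportion of differing aligned sites, i.e. 1 - PID).
# The formula diverges as p approaches 3/4, so Inf is returned from there on.
JukesCantor <- function(p) {
  stopifnot(all(p >= 0))
  d <- rep(Inf, length(p))
  ok <- p < 0.75
  d[ok] <- -0.75 * log(1 - (4 / 3) * p[ok])
  d
}

pid <- c(0.99, 0.90, 0.75)
JukesCantor(1 - pid)
```

Each corrected distance is larger than the raw p-distance. The gap widens as p grows, because the correction accounts for multiple substitutions at the same site.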

Details

The LinkedPairs object generated by NucleotideOverlap is a container for raw data describing possible orthologous relationships. The final assignment of orthology is left to the user's discretion. PairSummaries generates a clear table of relevant statistics that users can work with as they choose. Aligning all pairs is onerous, but it allows users to apply a hard threshold to predictions by PID. The built-in models offer quicker, more succinct thresholding.
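A hard PID threshold of the kind described above amounts to filtering rows of the returned data.frame. The sketch below assumes PairSummaries was run with PIDs = TRUE and that the result stores identities in a column named PID. That column name, the other columns, the row names, and all values in the toy data.frame are hypothetical placeholders, not output from the package.

```r
# Toy stand-in for a PairSummaries() result (hypothetical columns and values).
Pairs <- data.frame(p1 = c("geneA", "geneB", "geneC"),
                    p2 = c("geneX", "geneY", "geneZ"),
                    PID = c(0.92, 0.41, 0.67),
                    row.names = c("pair1", "pair2", "pair3"))

# Assumed: a PID column is only present when alignments were performed.
stopifnot("PID" %in% colnames(Pairs))

# Keep pairs whose identity meets a user-chosen cutoff; NA rows are dropped.
Threshold <- 0.5
Kept <- Pairs[!is.na(Pairs$PID) & Pairs$PID >= Threshold, , drop = FALSE]
print(Kept)
```

Using drop = FALSE keeps the result a data.frame even if only one row or column survives. The row names, which identify the pairs, are carried through unchanged.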

Value

A data.frame with rownames indicating orthologous pairs.

Author(s)

Nicholas Cooley npc19@pitt.edu

See Also

FindSynteny, Synteny-class

Examples

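# SynExtend provides gffToDataFrame, NucleotideOverlap and PairSummaries;
# DECIPHER provides FindSynteny, Seqs2DB and the alignment functions.
library(SynExtend)
library(DECIPHER)

DBPATH <- system.file("extdata",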
                      "VignetteSeqs.sqlite",
                      package = "SynExtend")

# Alternatively, to build a database using DECIPHER:
# DBPATH <- tempfile()
# FNAs <- c("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/006/740/685/GCA_006740685.1_ASM674068v1/GCA_006740685.1_ASM674068v1_genomic.fna.gz",
#           "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/956/175/GCA_000956175.1_ASM95617v1/GCA_000956175.1_ASM95617v1_genomic.fna.gz",
#           "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/875/775/GCA_000875775.1_ASM87577v1/GCA_000875775.1_ASM87577v1_genomic.fna.gz")
# for (m1 in seq_along(FNAs)) {
#  X <- readDNAStringSet(filepath = FNAs[m1])
#  X <- X[order(width(X),
#               decreasing = TRUE)]
#  
#  Seqs2DB(seqs = X,
#          type = "XStringSet",
#          dbFile = DBPATH,
#          identifier = as.character(m1),
#          verbose = TRUE)
#}

Syn <- FindSynteny(dbFile = DBPATH)

GeneCalls <- vector(mode = "list",
                    length = ncol(Syn))

GeneCalls[[1L]] <- gffToDataFrame(GFF = system.file("extdata",
                                                    "GCA_006740685.1_ASM674068v1_genomic.gff.gz",
                                                    package = "SynExtend"),
                                  Verbose = TRUE)
GeneCalls[[2L]] <- gffToDataFrame(GFF = system.file("extdata",
                                                    "GCA_000956175.1_ASM95617v1_genomic.gff.gz",
                                                    package = "SynExtend"),
                                  Verbose = TRUE)
GeneCalls[[3L]] <- gffToDataFrame(GFF = system.file("extdata",
                                                    "GCA_000875775.1_ASM87577v1_genomic.gff.gz",
                                                    package = "SynExtend"),
                                  Verbose = TRUE)
                                  
# Alternatively:
# GeneCalls <- vector(mode = "list",
#                     length = ncol(Syn))
# GeneCalls[[1L]] <- rtracklayer::import(system.file("extdata",
#                                                    "GCA_006740685.1_ASM674068v1_genomic.gff.gz",
#                                                    package = "SynExtend"))
# GeneCalls[[2L]] <- rtracklayer::import(system.file("extdata",
#                                                    "GCA_000956175.1_ASM95617v1_genomic.gff.gz",
#                                                    package = "SynExtend"))
# GeneCalls[[3L]] <- rtracklayer::import(system.file("extdata",
#                                                    "GCA_000875775.1_ASM87577v1_genomic.gff.gz",
#                                                    package = "SynExtend"))

names(GeneCalls) <- seq(length(GeneCalls))

Links <- NucleotideOverlap(SyntenyObject = Syn,
                           GeneCalls = GeneCalls,
                           LimitIndex = FALSE,
                           Verbose = TRUE)

PredictedPairs <- PairSummaries(SyntenyLinks = Links,
                                GeneCalls = GeneCalls,
                                DBPATH = DBPATH,
                                PIDs = FALSE,
                                Verbose = TRUE,
                                Model = "Global",
                                Correction = "none")

SynExtend documentation built on Nov. 8, 2020, 7:50 p.m.