PairSummaries: Summarize connected pairs in a LinkedPairs object


View source: R/PairSummaries.R

Description

Takes in a LinkedPairs object and gene calls, and returns a data.frame summarizing the linked pairs.

Usage

PairSummaries(SyntenyLinks,
              GeneCalls,
              DBPATH,
              PIDs = TRUE,
              IgnoreDefaultStringSet = FALSE,
              Verbose = TRUE,
              GapPenalty = TRUE,
              TerminalPenalty = TRUE,
              Model = "Global",
              Correction = "none")

Arguments

SyntenyLinks

A LinkedPairs object.

GeneCalls

A list of named DataFrames or GRanges objects. DataFrames built by "gffToDataFrame" can be used directly, while "GRanges" objects may also be used with limited functionality. Using a "GRanges" object will force all alignments to be nucleotide alignments.

DBPATH

A SQLite connection object or a character string specifying the path to the database file. The database is constructed with DECIPHER's Seqs2DB function.

PIDs

Logical indicating whether to perform pairwise alignments. If TRUE (the default), all pairs will be aligned using DECIPHER's AlignSeqs or AlignTranslation function. This step can be time-consuming, especially for large numbers of pairs.

IgnoreDefaultStringSet

Logical indicating alignment type preferences. If FALSE (the default), pairs that can be aligned in amino acid space will be aligned as an AAStringSet. If TRUE, all pairs will be aligned in nucleotide space.

Verbose

Logical indicating whether or not to display a progress bar and print the time difference upon completion.

GapPenalty

Argument passed to AlignTranslation.

TerminalPenalty

Argument passed to AlignTranslation.

Model

A character string specifying a model to use to identify pairs that are unlikely to be good orthologs. By default this is "Global", but two other models are included, "Local" and "Exact", which have minor differences in performance. Alternatively, a user-generated model can be used.

Correction

Argument passed to DistanceMatrix.

Details

The LinkedPairs object generated by NucleotideOverlap is a container for raw data that describes possible orthologous relationships; however, the ultimate assignment of orthology is left to the user's discretion. PairSummaries generates a clear table of relevant statistics for the user to work with as they choose. Aligning all pairs, though time-consuming, allows users to apply a hard threshold to predictions by PID, while the built-in models allow quicker, more succinct thresholding.
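The hard thresholding described above could be sketched as follows. This is a minimal illustration on a mock table, not output from PairSummaries: the column name `PID`, the row names, and the cutoff `MinPID` are all assumptions made for the example, and the real result should be inspected (for instance with `colnames()`) before adapting it.

```r
# Mock stand-in for a PairSummaries() result. The column name `PID` and the
# row-name format are assumptions for illustration only.
MockPairs <- data.frame(PID = c(0.95, 0.42, 0.78, 0.15),
                        row.names = c("pair_a", "pair_b", "pair_c", "pair_d"))

# Hypothetical hard threshold chosen by the user.
MinPID <- 0.5

# Keep only the rows whose PID is at or above the threshold;
# drop = FALSE keeps the result a data.frame even with a single column.
RetainedPairs <- MockPairs[MockPairs$PID >= MinPID, , drop = FALSE]

print(RetainedPairs)
```

Because the filter is an ordinary row subset, the row names that identify each pair are carried through to the retained table.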

Value

A data.frame with rownames indicating orthologous pairs.

Author(s)

Nicholas Cooley npc19@pitt.edu

See Also

FindSynteny, Synteny-class

Examples

DBPATH <- system.file("extdata",
                      "VignetteSeqs.sqlite",
                      package = "SynExtend")

# Alternatively, to build a database using DECIPHER:
# DBPATH <- tempfile()
# FNAs <- c("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/006/740/685/GCA_006740685.1_ASM674068v1/GCA_006740685.1_ASM674068v1_genomic.fna.gz",
#           "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/956/175/GCA_000956175.1_ASM95617v1/GCA_000956175.1_ASM95617v1_genomic.fna.gz",
#           "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/875/775/GCA_000875775.1_ASM87577v1/GCA_000875775.1_ASM87577v1_genomic.fna.gz")
# for (m1 in seq_along(FNAs)) {
#  X <- readDNAStringSet(filepath = FNAs[m1])
#  X <- X[order(width(X),
#               decreasing = TRUE)]
#  
#  Seqs2DB(seqs = X,
#          type = "XStringSet",
#          dbFile = DBPATH,
#          identifier = as.character(m1),
#          verbose = TRUE)
#}

Syn <- FindSynteny(dbFile = DBPATH)

GeneCalls <- vector(mode = "list",
                    length = ncol(Syn))

GeneCalls[[1L]] <- gffToDataFrame(GFF = system.file("extdata",
                                                    "GCA_006740685.1_ASM674068v1_genomic.gff.gz",
                                                    package = "SynExtend"),
                                  Verbose = TRUE)
GeneCalls[[2L]] <- gffToDataFrame(GFF = system.file("extdata",
                                                    "GCA_000956175.1_ASM95617v1_genomic.gff.gz",
                                                    package = "SynExtend"),
                                  Verbose = TRUE)
GeneCalls[[3L]] <- gffToDataFrame(GFF = system.file("extdata",
                                                    "GCA_000875775.1_ASM87577v1_genomic.gff.gz",
                                                    package = "SynExtend"),
                                  Verbose = TRUE)
                                  
# Alternatively:
# GeneCalls <- vector(mode = "list",
#                     length = ncol(Syn))
# GeneCalls[[1L]] <- rtracklayer::import(system.file("extdata",
#                                                    "GCA_006740685.1_ASM674068v1_genomic.gff.gz",
#                                                    package = "SynExtend"))
# GeneCalls[[2L]] <- rtracklayer::import(system.file("extdata",
#                                                    "GCA_000956175.1_ASM95617v1_genomic.gff.gz",
#                                                    package = "SynExtend"))
# GeneCalls[[3L]] <- rtracklayer::import(system.file("extdata",
#                                                    "GCA_000875775.1_ASM87577v1_genomic.gff.gz",
#                                                    package = "SynExtend"))

names(GeneCalls) <- seq(length(GeneCalls))

Links <- NucleotideOverlap(SyntenyObject = Syn,
                           GeneCalls = GeneCalls,
                           LimitIndex = FALSE,
                           Verbose = TRUE)

PredictedPairs <- PairSummaries(SyntenyLinks = Links,
                                GeneCalls = GeneCalls,
                                DBPATH = DBPATH,
                                PIDs = FALSE,
                                Verbose = TRUE,
                                Model = "Global",
                                Correction = "none")

npcooley/Heron documentation built on April 4, 2020, 10:24 p.m.