Other vignettes:

Introduction

Synder traces orthology independent of sequence similarity. This allows unique mappings of sequences even when they are members of large families.

This work is funded by the National Science Foundation grant NSF-IOS 1546858.

The ty1 dataset includes synteny maps built between S. cerivisiae and 8 other yeast species. These maps were built using MUMmer4, with the command:

nucmer                                       \
    --maxmatch                               \
    -p sc_Saccharomyces_bayanus_masked       \
    -o ../data/masked/scerevisiae.masked.fna \
    ../data/masked/Saccharomyces_bayanus.masked.fna 
require(synder)
require(knitr)
require(readr)
require(ape)
require(data.tree)
data(ty1)
ty1$synmaps$Saccharomyces_pastorianus = NULL
ty1$synmaps$Saccharomyces_bayanus = NULL
specs <- names(ty1$synmaps)
remap <- function(spec){
   as_synmap(
       ty1$synmaps[[spec]],
       seqinfo_a="Saccharomyces_cerevisiae.tab",
       seqinfo_b=paste0(spec, ".tab")
   ) 
}
ty1$synmaps <- lapply(specs, remap)
tree <- as.Node(read.tree("yeast.tree"))
print(tree)

This yeast phylogeny was taken from [@boynton2014ecology].

knitr::kable(data_frame(
    Species=names(ty1$glens),
    NumberOfScaffolds=sapply(ty1$glens, nrow)
))

Saccharomyces cerevisiae is well-assembled, but many of the others are not.

We will see how this impacts our results.

Map density

syn <- list()
syn$c500 <- lapply(ty1$synmaps, subset, (qstop - qstart + 1) > 500)
syn$c400 <- lapply(ty1$synmaps, subset, (qstop - qstart + 1) > 400)
syn$c300 <- lapply(ty1$synmaps, subset, (qstop - qstart + 1) > 300)
syn$c200 <- lapply(ty1$synmaps, subset, (qstop - qstart + 1) > 200)
syn$c100 <- lapply(ty1$synmaps, subset, (qstop - qstart + 1) > 100)
syn$c0   <- lapply(ty1$synmaps, subset, (qstop - qstart + 1) > 0)
results <- list()
results$c500 <- lapply(syn$c500, function(x) search(x, ty1$gff))
results$c400 <- lapply(syn$c400, function(x) search(x, ty1$gff))
results$c300 <- lapply(syn$c300, function(x) search(x, ty1$gff))
results$c200 <- lapply(syn$c200, function(x) search(x, ty1$gff))
results$c100 <- lapply(syn$c100, function(x) search(x, ty1$gff))
results$c0   <- lapply(syn$c0  , function(x) search(x, ty1$gff))

Now we want to infer orthology independent of sequence similarity. So I will subtract the query intervals from the synteny maps.



arendsee/synder documentation built on May 10, 2019, 1:26 p.m.