pick: Pick bins and seqs by name or position

View source: R/pick.R

pickR Documentation

Pick bins and seqs by name or position

Description

Pick which bins and seqs to show and in what order. Uses dplyr::select()-like syntax, which means unquoted genome names, positional arguments and selection helpers, such as tidyselect::starts_with() are supported. Renaming is not supported.

Usage

pick(x, ...)

pick_seqs(x, ..., .bins = everything())

pick_seqs_within(x, ..., .bins = everything())

pick_by_tree(x, tree, infer_bin_id = .data$label)

Arguments

x

gggenomes object

...

bins/seqs to pick, select-like expression.

.bins

scope for positional arguments, select-like expression, enclose multiple arguments with c()!

tree

a phylogenetic tree in ggtree::ggtree or ape::ape-package-"phylo" format.

infer_bin_id

an expression to extract bin_ids from the tree data.

Details

Use the dots to select bins or sequences (depending on function suffix), and the .bins argument to set the scope for positional arguments. For example, pick_seqs(1) will pick the first sequence from the first bin, while pick_seqs(1, .bins=3) will pick the first sequence from the third bin.

Value

gggenomes object with selected bins and seqs.

gggenomes object with selected seqs.

gggenomes object with selected seqs.

gggenomes object with seqs selected by tree order.

Functions

  • pick(): pick bins by bin_id, positional argument (start at top) or select-helper.

  • pick_seqs(): pick individual seqs seq_id, positional argument (start at top left) or select-helper.

  • pick_seqs_within(): pick individual seqs but only modify bins containing those seqs, keep rest as is.

  • pick_by_tree(): align bins with the leaves in a given phylogenetic tree.

Examples

s0 <- tibble::tibble(
  bin_id = c("A", "B", "B", "B", "C", "C", "C"),
  seq_id = c("a1", "b1", "b2", "b3", "c1", "c2", "c3"),
  length = c(1e4, 6e3, 2e3, 1e3, 3e3, 3e3, 3e3)
)

p <- gggenomes(seqs = s0) + geom_seq(aes(color = bin_id), size = 3) +
  geom_bin_label() + geom_seq_label() +
  expand_limits(color = c("A", "B", "C"))
p

# remove
p %>% pick(-B)

# select and reorder, by ID and position
p %>% pick(C, 1)

# use helper function
p %>% pick(starts_with("B"))

# pick just some seqs
p %>% pick_seqs(1, c3)

# pick with .bin scope
p %>% pick_seqs(3:1, .bins = C)

# change seqs in some bins, but keep rest as is
p %>% pick_seqs_within(3:1, .bins = B)

# same w/o scope, unaffected bins remain as is
p %>% pick_seqs_within(b3, b2, b1)

# Align sequences with and plot next to a phylogenetic tree
library(patchwork) # arrange multiple plots
library(ggtree) # plot phylogenetic trees

# load and plot a phylogenetic tree
emale_mcp_tree <- read.tree(ex("emales/emales-MCP.nwk"))
t <- ggtree(emale_mcp_tree) + geom_tiplab(align = TRUE, size = 3) +
  xlim(0, 0.05) # make room for labels

p <- gggenomes(seqs = emale_seqs, genes = emale_genes) +
  geom_seq() + geom_seq() + geom_bin_label()

# plot next to each other, but with
# different order in tree and genomes
t + p + plot_layout(widths = c(1, 5))

# reorder genomes to match tree order
# with a warning caused by mismatch in y-scale expansions
t + p %>% pick_by_tree(t) + plot_layout(widths = c(1, 5))

# extra genomes are dropped with a notification
emale_seqs_more <- emale_seqs
emale_seqs_more[7, ] <- emale_seqs_more[6, ]
emale_seqs_more$seq_id[7] <- "One more genome"
p <- gggenomes(seqs = emale_seqs_more, genes = emale_genes) +
  geom_seq() + geom_seq() + geom_bin_label()
t + p %>% pick_by_tree(t) + plot_layout(widths = c(1, 5))

try({
  # no shared ids will cause an error
  p <- gggenomes(seqs = tibble::tibble(seq_id = "foo", length = 1)) +
    geom_seq() + geom_seq() + geom_bin_label()
  t + p %>% pick_by_tree(t) + plot_layout(widths = c(1, 5))

  # extra leafs in tree will cause an error
  emale_seqs_fewer <- slice_head(emale_seqs, n = 4)
  p <- gggenomes(seqs = emale_seqs_fewer, genes = emale_genes) +
    geom_seq() + geom_seq() + geom_bin_label()
  t + p %>% pick_by_tree(t) + plot_layout(widths = c(1, 5))
})


gggenomes documentation built on Sept. 11, 2024, 8:55 p.m.