fill_introns: Fill in introns in an alignment

Description

The alignment must be produced by aligning sequences to a reference in which introns are masked by 'n's; this leaves gaps at the intron positions in the other sequences. This function replaces those gaps with 'n's and removes the reference sequence.
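
The idea described above can be illustrated with a minimal sketch. This is not the package's implementation: `fill_introns_sketch` and the toy alignment are hypothetical, and the sketch assumes that intron columns are exactly those where a reference sequence has an 'n' and that reference sequences are identified by row name. It ignores the `outgroup` and `trim_outgroup` arguments.

```r
library(ape)

# Hypothetical helper for illustration only; not baitfindR's actual code.
fill_introns_sketch <- function(alignment, ref_pattern) {
  aln <- as.character(alignment)  # character matrix, one row per sequence
  ref_rows <- grep(ref_pattern, rownames(aln))
  stopifnot(length(ref_rows) > 0)

  # Assumption: a column is an intron if any reference has 'n' there
  intron_cols <- colSums(aln[ref_rows, , drop = FALSE] == "n") > 0

  # Drop the reference sequence(s)
  out <- aln[-ref_rows, , drop = FALSE]

  # Replace gaps with 'n' in intron columns only
  sub <- out[, intron_cols, drop = FALSE]
  sub[sub == "-"] <- "n"
  out[, intron_cols] <- sub

  as.DNAbin(out)
}

# Toy alignment: 'ref' carries a 2 bp masked intron at columns 3-4
toy <- as.DNAbin(rbind(
  ref = c("a", "c", "n", "n", "g", "t"),
  s1  = c("a", "c", "-", "-", "g", "t"),
  s2  = c("a", "-", "-", "-", "g", "t")
))

filled <- fill_introns_sketch(toy, ref_pattern = "ref")
as.character(filled)
```

In this sketch the gap at column 2 of `s2` is left untouched because the reference has no 'n' there; only gaps inside the masked intron are filled.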

Usage

fill_introns(alignment, ref_pattern, outgroup = NULL, trim_outgroup = FALSE)

Arguments

alignment

Matrix of class DNAbin

ref_pattern

Pattern used for matching with grep to identify reference sequences.

outgroup

Character vector; names of outgroup sequences.

trim_outgroup

Logical; should the outgroups be trimmed from the alignment?
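
Because `ref_pattern` is matched with grep, it behaves as a regular expression rather than an exact name. The short sketch below uses made-up sequence names and assumes the pattern is applied to the sequence names of the alignment; it shows why anchoring the pattern can matter when other names contain the same substring.

```r
# Hypothetical sequence names, for illustration only
seq_names <- c("ref", "No305", "ref_2", "preferred")

# An unanchored pattern matches any name containing "ref"
loose <- grep("ref", seq_names, value = TRUE)

# Anchoring with ^ and $ matches the name "ref" exactly
exact <- grep("^ref$", seq_names, value = TRUE)

print(loose)
print(exact)
```

Here the unanchored pattern also picks up "ref_2" and "preferred", while the anchored pattern selects only "ref".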

Value

Matrix of class DNAbin

Examples

library(ape)
data(woodmouse)

# Make reference sequence with a 50bp intron (a string of 'n's)
# in the middle.
woodmouse_ref <- as.character(woodmouse[1,])
woodmouse_ref <- c(woodmouse_ref[1:400],
  rep("n", 50),
  woodmouse_ref[401:length(woodmouse_ref)])
woodmouse_ref <- as.DNAbin(woodmouse_ref)

# Align with other sequences that don't include the intron.
# (need to convert to list first)
woodmouse_ref <- as.list(woodmouse_ref)
names(woodmouse_ref) <- "ref"
woodmouse <- as.list(woodmouse)
woodmouse_with_introns <- ips::mafft(
  c(woodmouse, woodmouse_ref),
  exec = "/usr/bin/mafft")

# Image of the alignment shows that 'ref' has 'n's at positions 401-450,
# while the other sequences have gaps ('-') there.
image(woodmouse_with_introns)

# Fill-in introns
woodmouse_masked <- fill_introns(
  woodmouse_with_introns,
  ref_pattern = "ref"
)

# After filling-in, the reference sequence is gone and
# all introns are 'n's.
image(woodmouse_masked)

joelnitta/baitfindR documentation built on May 7, 2020, 6:21 p.m.