PairwiseAlignments-io: Write a PairwiseAlignments object to a file

Description

The writePairwiseAlignments function writes a PairwiseAlignments object to a file. Only the "pair" format is supported at the moment.

Usage

writePairwiseAlignments(x, file="", Matrix=NA, block.width=50)

Arguments

x

A PairwiseAlignments object, typically returned by the pairwiseAlignment function.

file

A connection, or a character string naming the file to print to. If "" (the default), writePairwiseAlignments prints to the standard output connection (aka the console) unless redirected by sink. If it is "|cmd", the output is piped to the command given by cmd, by opening a pipe connection.

Matrix

A single string containing the name of the substitution matrix (e.g. "BLOSUM50") used for the alignment. See the substitutionMatrix argument of the pairwiseAlignment function for the details. See ?substitution.matrices for a list of predefined substitution matrices available in the Biostrings package.

block.width

A single integer specifying the maximum number of sequence letters (including the "-" letter, which represents gaps) per line.
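A minimal sketch of how the file and block.width arguments might be used together. It assumes Biostrings is installed and provides pairwiseAlignment as documented on this page; the temporary file name and the variable names are illustrative only.

```r
library(Biostrings)

pattern <- DNAString("CGTACGTAACGTTCGT")
subject <- DNAString("CGTCGTCGTCCGTAA")
pa <- pairwiseAlignment(pattern, subject)

# Write to a temporary file instead of the console,
# with at most 10 sequence letters (gaps included) per line.
out_file <- tempfile(fileext = ".pair")
writePairwiseAlignments(pa, file = out_file, block.width = 10)

# Read the file back to inspect what was written.
pair_lines <- readLines(out_file)
head(pair_lines, 12)
```

Passing a file name rather than "" keeps the console clean and makes the output easy to hand to other tools that read the "pair" format.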

Details

The "pair" format is one of the numerous pairwise sequence alignment formats supported by the EMBOSS software. See http://emboss.sourceforge.net/docs/themes/AlignFormats.html for a brief (and rather informal) description of this format.

Note

This brief description of the "pair" format suggests that it is best suited for global pairwise alignments, because, in that case, the original pattern and subject sequences can be inferred (by just removing the gaps).

The "pair" format can also be used for non-global pairwise alignments (i.e. for global-local, local-global, and local pairwise alignments), but in that case the original pattern and subject sequences cannot be inferred. This is because the alignment written to the file doesn't necessarily span the entire pattern (if type(x) is "local-global" or "local") or the entire subject (if type(x) is "global-local" or "local").

As a consequence, the writePairwiseAlignments function can be used on a PairwiseAlignments object x containing non-global alignments (i.e. with type(x) != "global"), but with the following two caveats:

  1. The type of the alignments (type(x)) is not written to the file.

  2. The original pattern and subject sequences cannot be inferred. Furthermore, there is no way to infer their lengths (because we don't know whether they were trimmed or not).
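The difference between the global and non-global cases can be illustrated in base R, using the aligned subject rows from the example output on this page. This is only a sketch: strip_gaps is a hypothetical helper, not a Biostrings function, and it assumes the rows of a multi-block alignment have already been concatenated.

```r
# Hypothetical helper: drop the "-" gap letters from an aligned row.
strip_gaps <- function(aligned) gsub("-", "", aligned, fixed = TRUE)

# The original subject used in example A.
original_subject <- "CGTCGTCGTCCGTAA"

# Aligned subject row written for the global alignment (pa1).
global_row <- "CGT-CGT--CGTCCGTAA"

# Aligned subject row written for the global-local alignment (pa2).
global_local_row <- "CGT-CGT--CGTCCGT"

# Global: removing the gaps recovers the original subject.
identical(strip_gaps(global_row), original_subject)

# Global-local: only the aligned part of the subject is recovered,
# so neither the original sequence nor its length can be inferred.
identical(strip_gaps(global_local_row), original_subject)
nchar(strip_gaps(global_local_row))
```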

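Also note that the pairwiseAlignment function interprets the gapOpening and gapExtension arguments differently from most other alignment tools. As a consequence, the values of the Gap_penalty and Extend_penalty fields written to the file are not the same as the values that were passed to the gapOpening and gapExtension arguments. They are related as follows:

  Gap_penalty = gapOpening + gapExtension

  Extend_penalty = gapExtension

For instance, the default gapOpening=10 and gapExtension=4 are written as Gap_penalty: 14.0 and Extend_penalty: 4.0.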
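The Gap_penalty and Extend_penalty values in the example output on this page are consistent with Gap_penalty = gapOpening + gapExtension and Extend_penalty = gapExtension, with the sign of the arguments dropped. A minimal base R sketch under that assumption follows; pair_format_penalties is a hypothetical helper, not part of Biostrings.

```r
# Hypothetical helper: map pairwiseAlignment() gap arguments to the
# values that appear in the "pair" file header. Taking abs() is an
# assumption inferred from the example output, where gapOpening=-2
# and gapExtension=-1 are reported as 3.0 and 1.0.
pair_format_penalties <- function(gapOpening, gapExtension) {
  gapOpening <- abs(gapOpening)
  gapExtension <- abs(gapExtension)
  c(Gap_penalty = gapOpening + gapExtension,
    Extend_penalty = gapExtension)
}

pair_format_penalties(10, 4)     # values used by pa1, pa2 and pa4
pair_format_penalties(-2, -1)    # arguments used for pa3
pair_format_penalties(9.5, 0.5)  # arguments used for pa5
```

Comparing these results with the Gap_penalty and Extend_penalty header lines in the example output is a quick way to check the relationship for a given Biostrings version.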

Author(s)

H. Pagès

References

http://emboss.sourceforge.net/docs/themes/AlignFormats.html

See Also

Examples

library(Biostrings)

## ---------------------------------------------------------------------
## A. WITH ONE PAIR
## ---------------------------------------------------------------------
pattern <- DNAString("CGTACGTAACGTTCGT")
subject <- DNAString("CGTCGTCGTCCGTAA")
pa1 <- pairwiseAlignment(pattern, subject)
pa1
writePairwiseAlignments(pa1)
writePairwiseAlignments(pa1, block.width=10)
## The 2 bottom-right numbers (16 and 15) are the lengths of
## the original pattern and subject, respectively.

pa2 <- pairwiseAlignment(pattern, subject, type="global-local")
pa2  # score is different!
writePairwiseAlignments(pa2)
## By just looking at the file, we can't tell the length of the
## original subject! Could be 13, could be more...

pattern <- DNAString("TCAACTTAACTT")
subject <- DNAString("GGGCAACAACGGG")
pa3 <- pairwiseAlignment(pattern, subject, type="global-local",
                         gapOpening=-2, gapExtension=-1)
writePairwiseAlignments(pa3)

## ---------------------------------------------------------------------
## B. WITH MORE THAN ONE PAIR (AND NAMED PATTERNS)
## ---------------------------------------------------------------------
pattern <- DNAStringSet(c(myp1="ACCA", myp2="ACGCA", myp3="ACGGCA"))
pa4 <- pairwiseAlignment(pattern, subject)
pa4
writePairwiseAlignments(pa4)

## ---------------------------------------------------------------------
## C. REPRODUCING THE ALIGNMENT SHOWN AT
##    http://emboss.sourceforge.net/docs/themes/alnformats/align.pair
## ---------------------------------------------------------------------
pattern <- c("TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT",
             "GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG",
             "SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE")
subject <- c("TSPASIRPPAGPSSRRPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGT",
             "CTTSTSTRHRGRSGWRASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTT",
             "GPPAWAGDRSHE")
pattern <- unlist(AAStringSet(pattern))
subject <- unlist(AAStringSet(subject))
pattern  # original pattern
subject  # original subject
data(BLOSUM62)
pa5 <- pairwiseAlignment(pattern, subject,
                         substitutionMatrix=BLOSUM62,
                         gapOpening=9.5, gapExtension=0.5)
pa5
writePairwiseAlignments(pa5, Matrix="BLOSUM62")

Example output

Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] CGTACGTAACGTTCGT 
subject: [1] CGT-CGT--CGTCCGT 
score: -32.11822 
########################################
# Program: Biostrings (version 2.44.2), a Bioconductor package
# Rundate: Fri Jan 12 18:58:29 2018
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: P1
# 2: S1
# Matrix: NA
# Gap_penalty: 14.0
# Extend_penalty: 4.0
#
# Length: 18
# Identity:      12/18 (66.7%)
# Similarity:    NA/18 (NA%)
# Gaps:           5/18 (27.8%)
# Score: -32.11822
#
#
#=======================================

P1                 1 CGTACGTAACGTTCGT--     16
                     ||| |||  ||| |||  
S1                 1 CGT-CGT--CGTCCGTAA     15


#---------------------------------------
#---------------------------------------
########################################
# Program: Biostrings (version 2.44.2), a Bioconductor package
# Rundate: Fri Jan 12 18:58:29 2018
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: P1
# 2: S1
# Matrix: NA
# Gap_penalty: 14.0
# Extend_penalty: 4.0
#
# Length: 18
# Identity:      12/18 (66.7%)
# Similarity:    NA/18 (NA%)
# Gaps:           5/18 (27.8%)
# Score: -32.11822
#
#
#=======================================

P1                 1 CGTACGTAAC     10
                     ||| |||  |
S1                 1 CGT-CGT--C      7

P1                11 GTTCGT--     16
                     || |||  
S1                 8 GTCCGTAA     15


#---------------------------------------
#---------------------------------------
Global-Local PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] CGTACGTAACGTTCGT 
subject: [1] CGT-CGT--CGTCCGT 
score: -14.11821 
########################################
# Program: Biostrings (version 2.44.2), a Bioconductor package
# Rundate: Fri Jan 12 18:58:30 2018
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: P1
# 2: S1
# Matrix: NA
# Gap_penalty: 14.0
# Extend_penalty: 4.0
#
# Length: 16
# Identity:      12/16 (75.0%)
# Similarity:    NA/16 (NA%)
# Gaps:           3/16 (18.8%)
# Score: -14.11821
#
#
#=======================================

P1                 1 CGTACGTAACGTTCGT     16
                     ||| |||  ||| |||
S1                 1 CGT-CGT--CGTCCGT     13


#---------------------------------------
#---------------------------------------
########################################
# Program: Biostrings (version 2.44.2), a Bioconductor package
# Rundate: Fri Jan 12 18:58:30 2018
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: P1
# 2: S1
# Matrix: NA
# Gap_penalty: 3.0
# Extend_penalty: 1.0
#
# Length: 12
# Identity:       7/12 (58.3%)
# Similarity:    NA/12 (NA%)
# Gaps:           5/12 (41.7%)
# Score: 2.872293
#
#
#=======================================

P1                 1 TCAACTTAACTT     12
                      ||||  |||  
S1                 4 -CAAC--AAC--     10


#---------------------------------------
#---------------------------------------
Global PairwiseAlignmentsSingleSubject (1 of 3)
pattern: [1] ACCA 
subject: [5] AACA 
score: -55.95402 
########################################
# Program: Biostrings (version 2.44.2), a Bioconductor package
# Rundate: Fri Jan 12 18:58:31 2018
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: myp1
# 2: S1
# Matrix: NA
# Gap_penalty: 14.0
# Extend_penalty: 4.0
#
# Length: 13
# Identity:       3/13 (23.1%)
# Similarity:    NA/13 (NA%)
# Gaps:           9/13 (69.2%)
# Score: -55.95402
#
#
#=======================================

myp1               1 ----ACCA-----      4
                         | ||     
S1                 1 GGGCAACAACGGG     13


#=======================================
#
# Aligned_sequences: 2
# 1: myp2
# 2: S1
# Matrix: NA
# Gap_penalty: 14.0
# Extend_penalty: 4.0
#
# Length: 13
# Identity:       3/13 (23.1%)
# Similarity:    NA/13 (NA%)
# Gaps:           8/13 (61.5%)
# Score: -47.8533
#
#
#=======================================

myp2               1 --------ACGCA      5
                             |||  
S1                 1 GGGCAACAACGGG     13


#=======================================
#
# Aligned_sequences: 2
# 1: myp3
# 2: S1
# Matrix: NA
# Gap_penalty: 14.0
# Extend_penalty: 4.0
#
# Length: 14
# Identity:       4/14 (28.6%)
# Similarity:    NA/14 (NA%)
# Gaps:           9/14 (64.3%)
# Score: -53.97226
#
#
#=======================================

myp3               1 --------ACGGCA      6
                             ||||  
S1                 1 GGGCAACAACGGG-     13


#---------------------------------------
#---------------------------------------
  131-letter "AAString" instance
seq: TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTG...SRSAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
  112-letter "AAString" instance
seq: TSPASIRPPAGPSSRRPSPPGPRRPTGRPCCSAAPR...SRSAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPT...SAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE 
subject: [1] TSPASIRPPAGPSSR---------RPSPPGPRRPT...SAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE 
score: 591.5 
########################################
# Program: Biostrings (version 2.44.2), a Bioconductor package
# Rundate: Fri Jan 12 18:58:31 2018
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: P1
# 2: S1
# Matrix: BLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:     112/131 (85.5%)
# Similarity:    NA/131 (NA%)
# Gaps:          19/131 (14.5%)
# Score: 591.5
#
#
#=======================================

P1                 1 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT     50
                     |||||||||||||||         ||||||||||||||||||||||||||
S1                 1 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT     41

P1                51 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG    100
                     ||||||||||||||||||||||||          ||||||||||||||||
S1                42 GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG     81

P1               101 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    131
                     |||||||||||||||||||||||||||||||
S1                82 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE    112


#---------------------------------------
#---------------------------------------

Biostrings documentation built on Nov. 8, 2020, 11:12 p.m.