Reads an MSA from a file.
read.msa(filename, format = c(guess.format.msa(filename), "FASTA")[1],
  alphabet = NULL, features = NULL, do.4d = FALSE, ordered = (do.4d ==
  FALSE && is.null(features)), tuple.size = (if (do.4d) 3 else NULL),
  do.cats = NULL, refseq = NULL, offset = 0, seqnames = NULL,
  discard.seqnames = NULL, pointer.only = FALSE)
filename |
The name of the input file containing an alignment. |
format |
Input file format: one of "FASTA", "MAF", "SS", "PHYLIP", or "MPM". Must be correctly specified. |
alphabet |
The alphabet of non-missing-data characters in the alignment. Determined automatically from the alignment if not given. |
features |
An object of type |
do.4d |
Logical. If |
ordered |
Logical. If |
tuple.size |
Integer. If given, and if pointer.only is |
do.cats |
Character vector if features is provided; integer vector if cats.cycle is provided. If given, only the types of features named here will be represented in the (unordered) return alignment. |
refseq |
Character string specifying a FASTA format file with a reference sequence. If given, the reference sequence will be "filled in" wherever it is missing from the alignment. |
offset |
An integer giving offset of reference sequence from beginning of chromosome. Not used for MAF or SS format. |
seqnames |
A character vector. If provided, discard any sequence in the msa that is not named here. This is only implemented efficiently for MAF input files, but in this case, the reference sequence must be named. |
discard.seqnames |
A character vector. If provided, discard the sequences named here. This is only implemented efficiently for MAF input files, but in this case, the reference sequence must NOT be discarded. |
pointer.only |
If |
An MSA object.
If the input is in "MAF" format and features is specified, the resulting alignment will be stripped of gaps in the reference (1st) sequence.
Melissa J. Hubisz and Adam Siepel
exampleArchive <- system.file("extdata", "examples.zip", package="rphast")
files <- c("ENr334-100k.maf", "ENr334-100k.fa", "gencode.ENr334-100k.gff")
unzip(exampleArchive, files)
# Read a fasta file, ENr334-100k.fa
# this file represents a 4-way alignment of the encode region
# ENr334 starting from hg18 chr6 position 41405894
idx.offset <- 41405894
m1 <- read.msa("ENr334-100k.fa", offset=idx.offset)
m1
# Now read in only a subset represented in a feature file
f <- read.feat("gencode.ENr334-100k.gff")
f$seqname <- "hg18" # need to tweak source name to match name in alignment
m1 <- read.msa("ENr334-100k.fa", features=f, offset=idx.offset)
# Can also subset on certain features
do.cats <- c("CDS", "5'flank", "3'flank")
m1 <- read.msa("ENr334-100k.fa", features=f, offset=idx.offset,
do.cats=do.cats)
# Can read MAFs similarly, but don't need offset because
# MAF file is annotated with coordinates
m2 <- read.msa("ENr334-100k.maf", features=f, do.cats=do.cats)
# Also, note that when features is given and the file is
# in MAF format, the first sequence is automatically
# stripped of gaps
ncol.msa(m1)
ncol.msa(m2)
ncol.msa(m1, "hg18")
unlink(files) # clean up