read_fasta | R Documentation |
Given a standard FASTA-formatted file, read_fasta will read in the contents of the file and create a three-column data frame with columns for the sequence id, the sequence itself, and any comments found in the header line for each sequence.
read_fasta(file, degap = TRUE)
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector) containing DNA sequences in the standard FASTA format. There are no checks to determine whether the data are DNA or amino acid sequences. Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed.
Files starting with |
degap |
Logical value (default = TRUE). If TRUE, gap characters, indicated by "." or "-", are removed from the sequences. |
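To illustrate what gap removal means for the degap argument, here is a minimal base R sketch. The helper name strip_gaps is hypothetical and is not part of the package; it only shows the kind of transformation described above, assuming gaps are exactly the "." and "-" characters.

```r
# Hypothetical helper (not the package's implementation): remove the
# "." and "-" gap characters from each sequence in a character vector.
strip_gaps <- function(sequences) {
  gsub("[.-]", "", sequences)
}

aligned <- c("AT-GC..ATGC", "--TACG-TACG")
degapped <- strip_gaps(aligned)
print(degapped)
```

With degap = FALSE, the gap characters would presumably be left in place, which is useful when the file holds an alignment whose column positions matter.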
A data frame object with three columns. The id column will contain the non-space characters following the > in the header line of each sequence; the sequence column will contain the sequence; and the comment column will contain any text found after the first whitespace character on the header line.
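The split between id and comment described above can be sketched in base R as follows. The function split_header is a hypothetical illustration, not the code used by read_fasta(); in particular, returning an empty string when a header has no comment is an assumption made for this sketch.

```r
# Hypothetical helper (not the package's implementation): split FASTA
# header lines into an id and a comment at the first whitespace character.
split_header <- function(header) {
  header <- sub("^>", "", header)             # drop the leading ">"
  id <- sub("[[:space:]].*$", "", header)     # text before first whitespace
  comment <- ifelse(
    grepl("[[:space:]]", header),
    sub("^[^[:space:]]+[[:space:]]+", "", header),  # text after it
    ""                                        # assumption: no comment -> ""
  )
  data.frame(id = id, comment = comment, stringsAsFactors = FALSE)
}

headers <- c(
  ">seqA",
  ">seqD B.ceresus UW85",
  ">seq4\tE. coli K12\tBacteria;Proteobacteria;"
)
header_df <- split_header(headers)
print(header_df)
```

Note that a tab counts as whitespace here, so a tab-delimited header is split at its first tab and any later tabs remain inside the comment.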
The sequences in the FASTA file can have line breaks within them, and read_fasta() will join those separate lines into a single sequence.
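As a rough sketch of how wrapped sequence lines can be joined, the base R snippet below groups each non-header line with the most recent header and pastes the pieces together. This is an illustration under simple assumptions (every record starts with a ">" line and has at least one sequence line), not the approach read_fasta() necessarily takes.

```r
# Hypothetical sketch (not the package's implementation): join sequence
# lines that belong to the same FASTA record.
fasta_lines <- c(
  ">seqE B.ceresus UW123",
  "TCCGATGC",
  "TCCGATGC",
  ">seqA",
  "ATGC"
)

is_header <- grepl("^>", fasta_lines)
record <- cumsum(is_header)  # record number for every line

# Collapse the sequence lines of each record into one string.
joined <- unname(vapply(
  split(fasta_lines[!is_header], record[!is_header]),
  paste,
  character(1),
  collapse = ""
))
print(joined)
```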
# Write a small FASTA file to a temporary location
temp <- tempfile()
write(">seqA\nATGCATGC\n>seqB\nTACGTACG", file = temp)
write(">seqC\nTCCGATGC", file = temp, append = TRUE)

# Header with a space-separated comment
write(">seqD B.ceresus UW85\nTCCGATGC", file = temp, append = TRUE)

# Headers with tab-separated comments
write(">seq4\tE. coli K12\tBacteria;Proteobacteria;\nTCCGATGC",
  file = temp,
  append = TRUE
)
write(">seq_4\tSalmonella LT2\tBacteria;Proteobacteria;\nTCCGATGC",
  file = temp, append = TRUE
)

# Sequence split across two lines
write(">seqE B.ceresus UW123\nTCCGATGC\nTCCGATGC",
  file = temp,
  append = TRUE
)

# Read the file into a data frame
sequence_df <- read_fasta(temp)