read.fasta: Read in sequences in FASTA format
In peplib: Peptide Library Analysis Methods


Read in sequences in FASTA format

read.fasta(file, header = FALSE, sep = "", quote = "\"", dec = ".", fill = FALSE, alphabet = aabet)

`file`	the name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an _absolute_ path, the file name is _relative_ to the current working directory, `getwd()`. Tilde-expansion is performed where supported. As from R 2.10.0 this can be a compressed file (see `file`). Alternatively, `file` can be a readable text-mode `connection` (which will be opened for reading if necessary, and if so `close`d (and hence destroyed) at the end of the function call). (If `stdin()` is used, the prompts for lines may be somewhat confusing. Terminate input with a blank line or an EOF signal, `Ctrl-D` on Unix and `Ctrl-Z` on Windows. Any pushback on `stdin()` will be cleared before return.) `file` can also be a complete URL.
`header`	a logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: `header` is set to `TRUE` if and only if the first row contains one fewer field than the number of columns.
`sep`	the field separator character. Values on each line of the file are separated by this character. If `sep = ""` (the default for `read.table`) the separator is `white space`, that is one or more spaces, tabs, newlines or carriage returns.
`quote`	the set of quoting characters. To disable quoting altogether, use `quote = ""`. See `scan` for the behavior on quotes embedded in quotes. Quoting is only considered for columns read as character, which is all of them unless `colClasses` is specified.
`dec`	the character used in the file for decimal points.
`fill`	logical. If `TRUE` then in case the rows have unequal length, blank fields are implicitly added.
`alphabet`	The alphabet to use for the sequences. The default alphabet contains the canonical 20 amino acids, as well as B, Z, X, and `-`, where X is an unspecified residue and `-` is a gap.
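As a rough illustration of how these arguments fit together, the sketch below writes a small FASTA file to a temporary location and passes its path as `file`, leaving every other argument at the default shown in the usage line. The `fasta_path` name and the file contents are assumptions made for this example, and the call is guarded so that it only runs when peplib is installed.

```r
# Write a tiny, already-aligned FASTA file to a temporary path
# (the file name and contents are made up for this sketch).
fasta_path <- tempfile(fileext = ".fasta")
writeLines(
  c(">Sequence 1, from mouse", "FTRP",
    ">Sequence 2b, from humans", "FPYT",
    ">Unknown origin", "FPRW"),
  fasta_path
)

# Only call peplib when it is available; all other arguments keep the
# defaults from the usage line above.
if (requireNamespace("peplib", quietly = TRUE)) {
  seqs <- peplib::read.fasta(fasta_path)
  print(class(seqs))
  print(dim(seqs))
}
```

Because `file` may also be a connection or a URL, the same call should accept those in place of `fasta_path`.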

See the details for read.table for more information about reading the file itself. Information about the FASTA format may be found elsewhere, but basically each sequence starts with a definition/name line that begins with a '>' character. For example:
----------------------
>Sequence 1, from mouse
FTRP
>Sequence 2b, from humans
FPYT
>Unknown origin
FPRW
----------------------
Each sequence should be the same length, so `-` should be used to pad shorter sequences. Use an alignment algorithm, such as Clustal, to align your sequences before reading them. The ClustalW2 algorithm is available from the European Bioinformatics Institute's website.
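The record layout and the equal-length requirement can be sketched in base R. This is not peplib's implementation: `parse_fasta_lines` and `pad_with_gaps` are hypothetical helpers written for this illustration, and the second sequence has been shortened to `FPY` so that the padding is visible. Right-padding is only a placeholder here; a real alignment tool places gaps where they belong.

```r
# Hypothetical helper (not part of peplib): turn FASTA text lines into a
# character vector of sequences named by their '>' header lines.
parse_fasta_lines <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                  # drop blank lines
  is_header <- startsWith(lines, ">")
  if (length(lines) == 0 || !is_header[1]) {
    stop("FASTA input must start with a '>' header line")
  }
  record_id <- cumsum(is_header)                 # record each line belongs to
  headers <- sub("^>\\s*", "", lines[is_header])
  # Join the residue lines of each record into one string.
  groups <- split(
    lines[!is_header],
    factor(record_id[!is_header], levels = seq_along(headers))
  )
  residues <- vapply(groups, paste, character(1), collapse = "")
  setNames(unname(residues), headers)
}

# Hypothetical helper: right-pad every sequence with the gap symbol so
# that all sequences have the same length.
pad_with_gaps <- function(seqs, gap = "-") {
  width <- max(nchar(seqs))
  padding <- strrep(gap, width - nchar(seqs))
  setNames(paste0(seqs, padding), names(seqs))
}

fasta_lines <- c(
  ">Sequence 1, from mouse", "FTRP",
  ">Sequence 2b, from humans", "FPY",            # shortened for this sketch
  ">Unknown origin", "FPRW"
)
parsed <- parse_fasta_lines(fasta_lines)
padded <- pad_with_gaps(parsed)
print(padded)
```

After padding, every element of `padded` has four characters, which is the shape the function expects from an aligned input file.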

An object of class Sequences. This is a small extension of the matrix class, and as expected, each row of the matrix corresponds to a single sequence. The sequences are always represented as integers. The rownames of the matrix are the original string/character representations of the sequences.
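To make the integer representation concrete, here is a minimal base R sketch of the idea. It builds a plain integer matrix, not a `Sequences` object. The `demo_alphabet` vector and its ordering are assumptions for illustration only (the ordering of peplib's `aabet` may differ), and `encode_sequences` is a hypothetical helper.

```r
# Assumed alphabet for illustration: the 20 canonical amino acids plus
# B, Z, X and the gap symbol. peplib's own `aabet` may be ordered differently.
demo_alphabet <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                   "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V",
                   "B", "Z", "X", "-")

# Hypothetical helper: one row per sequence, one column per position,
# each cell holding the residue's index in the alphabet.
encode_sequences <- function(seqs, alphabet = demo_alphabet) {
  stopifnot(length(unique(nchar(seqs))) == 1)    # sequences must be aligned
  residues <- unlist(strsplit(seqs, "", fixed = TRUE))
  codes <- matrix(match(residues, alphabet),
                  nrow = length(seqs), byrow = TRUE)
  if (anyNA(codes)) stop("sequence contains a symbol outside the alphabet")
  rownames(codes) <- seqs                        # keep the original strings
  codes
}

codes <- encode_sequences(c("FTRP", "FPYT", "FPRW"))
print(codes)

# The original string can be recovered from a row of integer codes.
decoded <- paste(demo_alphabet[codes["FTRP", ]], collapse = "")
print(decoded)
```

Storing the original strings as row names keeps the readable form available, while the integer cells are convenient for numeric work on the sequences.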

Andrew White

Sequences, read.sequences

peplib documentation built on May 29, 2017, 10:52 p.m.