FaFile-class: Manipulate indexed fasta files.

FaFileR Documentation

Manipulate indexed fasta files.

Description

Use FaFile() to create a reference to an indexed fasta file. The reference remains open across calls to methods, avoiding costly index re-loading.

FaFileList() provides a convenient way of managing a list of FaFile instances.

Usage


## Constructors

FaFile(file, index=sprintf("%s.fai", file),
             gzindex=sprintf("%s.gzi", file))
FaFileList(...)

## Opening / closing

## S3 method for class 'FaFile'
open(con, ...)
## S3 method for class 'FaFile'
close(con, ...)

## accessors; also path(), index()

## S4 method for signature 'FaFile'
gzindex(object, asNA=TRUE)
## S4 method for signature 'FaFileList'
gzindex(object, asNA=TRUE)
## S4 method for signature 'FaFile'
isOpen(con, rw="")

## actions

## S4 method for signature 'FaFile'
indexFa(file, ...)

## S4 method for signature 'FaFile'
scanFaIndex(file, ...)
## S4 method for signature 'FaFileList'
scanFaIndex(file, ..., as=c("GRangesList", "GRanges"))

## S4 method for signature 'FaFile'
seqinfo(x)

## S4 method for signature 'FaFile'
countFa(file, ...)

## S4 method for signature 'FaFile,GRanges'
scanFa(file, param, ...,
    as=c("DNAStringSet", "RNAStringSet", "AAStringSet"))
## S4 method for signature 'FaFile,IntegerRangesList'
scanFa(file, param, ...,
    as=c("DNAStringSet", "RNAStringSet", "AAStringSet"))
## S4 method for signature 'FaFile,missing'
scanFa(file, param, ...,
    as=c("DNAStringSet", "RNAStringSet", "AAStringSet"))

## S4 method for signature 'FaFile'
getSeq(x, param, ...)
## S4 method for signature 'FaFileList'
getSeq(x, param, ...)

Arguments

con, object, x

An instance of FaFile or (for gzindex and getSeq) FaFileList.

file, index, gzindex

A character(1) vector of the fasta or fasta index or bgzip index file path (for FaFile), or an instance of class FaFile or FaFileList (for scanFaIndex, getSeq).

asNA

logical indicating if missing output should be NA or character()

param

An optional GRanges or IntegerRangesList instance to select reads (and sub-sequences) for input. See Methods, below.

...

Additional arguments.

  • For FaFileList, this can either be a single character vector of paths to BAM files, or several instances of FaFile objects.

  • For scanFa,FaFile,missing-method this can include arguemnts to readDNAStringSet / readRNAStringSet / readAAStringSet when param is ‘missing’.

rw

Mode of file; ignored.

as

A character(1) vector indicating the type of object to return.

  • For scanFaIndex, default GRangesList, with index information from each file is returned as an element of the list.

  • For scanFa, default DNAStringSet.

GRangesList, index information is collapsed across files into the unique index elements.

Objects from the Class

Objects are created by calls of the form FaFile().

Fields

The FaFile class inherits fields from the RsamtoolsFile class.

Functions and methods

FaFileList inherits methods from RsamtoolsFileList and SimpleList.

Opening / closing:

open.FaFile

Opens the (local or remote) path and index files. Returns a FaFile instance.

close.FaFile

Closes the FaFile con; returning (invisibly) the updated FaFile. The instance may be re-opened with open.FaFile.

Accessors:

path

Returns a character(1) vector of the fasta path name.

index

Returns a character(1) vector of fasta index name (minus the '.fai' extension).

Methods:

indexFa

Visit the path in path(file) and create an index file (with the extension ‘.fai’).

scanFaIndex

Read the sequence names and and widths of recorded in an indexed fasta file, returning the information as a GRanges object.

seqinfo

Consult the index file for defined sequences (seqlevels()) and lengths (seqlengths()).

countFa

Return the number of records in the fasta file.

scanFa

Return the sequences indicated by param as a DNAStringSet instance. seqnames(param) selects the sequences to return; start(param) and end{param} define the (1-based) region of the sequence to return. Values of end(param) greater than the width of the sequence cause an error; use seqlengths(FaFile(file)) to discover sequence widths. When param is missing, all records are selected. When length(param)==0 no records are selected.

getSeq

Returns the sequences indicated by param from the indexed fasta file(s) of file.

For the FaFile method, the return type is a DNAStringSet. The getSeq,FaFile and scanFa,FaFile,GRanges methods differ in that getSeq will reverse complement sequences selected from the minus strand.

For the FaFileList method, the param argument must be a GRangesList of the same length as file, creating a one-to-one mapping between the ith element of file and the ith element of param; the return type is a SimpleList of DNAStringSet instances, with elements of the list in the same order as the input elements.

show

Compactly display the object.

Author(s)

Martin Morgan

Examples


fl <- system.file("extdata", "ce2dict1.fa", package="Rsamtools",
                  mustWork=TRUE)
fa <- open(FaFile(fl))                   # open
countFa(fa)
(idx <- scanFaIndex(fa))
(dna <- scanFa(fa, param=idx[1:2]))
ranges(idx) <- narrow(ranges(idx), -10)  # last 10 nucleotides
(dna <- scanFa(fa, param=idx[1:2]))


Bioconductor/Rsamtools documentation built on Nov. 30, 2024, 8:15 a.m.