scanFa: Operations on indexed 'fasta' files.

Description

Scan indexed fasta (or compressed fasta) files and their indices.

Usage

indexFa(file, ...)
## S4 method for signature 'character'
indexFa(file, ...)

scanFaIndex(file, ...)
## S4 method for signature 'character'
scanFaIndex(file, ...)

countFa(file, ...)
## S4 method for signature 'character'
countFa(file, ...)

scanFa(file, param, ...,
    as=c("DNAStringSet", "RNAStringSet", "AAStringSet"))
## S4 method for signature 'character,GRanges'
scanFa(file, param, ...,
    as=c("DNAStringSet", "RNAStringSet", "AAStringSet"))
## S4 method for signature 'character,IntegerRangesList'
scanFa(file, param, ...,
    as=c("DNAStringSet", "RNAStringSet", "AAStringSet"))
## S4 method for signature 'character,missing'
scanFa(file, param, ...,
    as=c("DNAStringSet", "RNAStringSet", "AAStringSet"))

Arguments

file

A character(1) vector containing the fasta file path.

param

An optional GRanges or IntegerRangesList instance to select records (and sub-sequences) for input.

as

A character(1) vector indicating the type of object to return; default DNAStringSet.

...

Additional arguments, passed to readDNAStringSet / readRNAStringSet / readAAStringSet when param is ‘missing’.

Value

indexFa visits the path in file and creates an index file at the same location but with extension ‘.fai’.
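
As a minimal sketch of this behaviour, the following copies the example file shipped with Rsamtools into a temporary directory (so that the installed package is left untouched) and indexes the copy. The file name ce2dict1_copy.fa is an arbitrary choice made for this illustration.

```r
library(Rsamtools)

## Example fasta file shipped with Rsamtools
src <- system.file("extdata", "ce2dict1.fa", package="Rsamtools",
                   mustWork=TRUE)

## Work on a copy in a temporary directory (hypothetical file name)
fa_copy <- file.path(tempdir(), "ce2dict1_copy.fa")
file.copy(src, fa_copy, overwrite=TRUE)

## Build the index; it is written next to the fasta file
indexFa(fa_copy)
fai <- paste0(fa_copy, ".fai")
file.exists(fai)
```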

scanFaIndex reads the sequence names and widths recorded in an indexed fasta file, returning the information as a GRanges object.

countFa returns the number of records in the fasta file.
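
The two functions can be cross-checked against each other: the index should hold one range per record. The sketch below uses the example file shipped with Rsamtools; the variable names are illustrative only.

```r
library(Rsamtools)

fa <- system.file("extdata", "ce2dict1.fa", package="Rsamtools",
                  mustWork=TRUE)

idx <- scanFaIndex(fa)   # GRanges: one range per record, spanning its full width
n <- countFa(fa)         # number of records in the file

n == length(idx)
as.character(seqnames(idx))   # record names
width(idx)                    # record widths
```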

scanFa returns the sequences indicated by param as a DNAStringSet, RNAStringSet, or AAStringSet instance. seqnames(param) selects the sequences to return; start(param) and end(param) define the (1-based) region of the sequence to return. Values of end(param) greater than the width of the sequence are set to the width of the sequence. When param is missing, all records are selected. When param is GRanges(), no records are selected.
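
The three cases for param can be sketched as follows, again using the example file shipped with Rsamtools. The coordinates 5 to 10 are an arbitrary in-bounds choice for illustration, and the variable names are not part of the package.

```r
library(Rsamtools)

fa <- system.file("extdata", "ce2dict1.fa", package="Rsamtools",
                  mustWork=TRUE)

## Bases 5 to 10 (1-based, inclusive) of the record named "pattern01"
param <- GRanges("pattern01", IRanges(start=5, end=10))
piece <- scanFa(fa, param)
width(piece)

## An empty GRanges selects no records
none <- scanFa(fa, GRanges())
length(none)

## A missing param selects every record
all_records <- scanFa(fa)
length(all_records)
```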

Author(s)

Martin Morgan <mtmorgan@fhcrc.org>.

References

http://samtools.sourceforge.net/ provides information on samtools.

Examples

fa <- system.file("extdata", "ce2dict1.fa", package="Rsamtools",
                  mustWork=TRUE)
countFa(fa)
(idx <- scanFaIndex(fa))
(dna <- scanFa(fa, idx[1:2]))
ranges(idx) <- narrow(ranges(idx), -10)  # last 10 nucleotides
(dna <- scanFa(fa, idx[1:2]))

Example output

[1] 5
GRanges object with 5 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
  [1] pattern01   [1, 18]      *
  [2] pattern02   [1, 25]      *
  [3] pattern03   [1, 24]      *
  [4] pattern04   [1, 24]      *
  [5] pattern05   [1, 25]      *
  -------
  seqinfo: 5 sequences from an unspecified genome
  A DNAStringSet instance of length 2
    width seq                                               names               
[1]    18 GCGAAACTAGGAGAGGCT                                pattern01
[2]    25 CTGTTAGCTAATTTTAAAAATAAAT                         pattern02
  A DNAStringSet instance of length 2
    width seq                                               names               
[1]    10 AGGAGAGGCT                                        pattern01
[2]    10 AAAAATAAAT                                        pattern02

Rsamtools documentation built on Nov. 8, 2020, 8:11 p.m.