bedtools_getfasta: bedtools_getfasta

View source: R/getfasta.R

bedtools_getfasta    R Documentation



Query sequence from a FASTA file given a set of ranges, including compound regions like transcripts and junction reads. This assumes the sequence is DNA.


    bedtools_getfasta(cmd = "--help")
    R_bedtools_getfasta(fi, bed, s = FALSE, split = FALSE)
    do_bedtools_getfasta(fi, bed, s = FALSE, split = FALSE)



cmd
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing.


fi
Path to a FASTA file, or an XStringSet object.


bed
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or a ranged data structure, such as a GRanges, as the query. Use "stdin" for input from another process (presumably while running via Rscript). For streaming from a subprocess, prefix the command string with "<", e.g., "<grep foo file.bed". Any streamed data is assumed to be in BED format.


s
Force strandedness. If the feature occupies the antisense strand, the sequence will be reverse complemented.


split
Given BED12 or BAM input, extract and concatenate the sequences from the blocks (e.g., exons).


As with all commands, there are three interfaces to the getfasta command:


bedtools_getfasta
Parses the bedtools command line and compiles it to the equivalent R code.


R_bedtools_getfasta
Accepts R arguments corresponding to the command line arguments and compiles the equivalent R code.


do_bedtools_getfasta
Evaluates the result of R_bedtools_getfasta. Recommended only for demonstration and testing. It is best to integrate the compiled code into an R script, after studying it.
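
As a rough sketch of how the three interfaces relate, the snippet below calls each one on the same query. It assumes that HelloRanges is installed and that its unit-test data directory (the one used in the examples on this page) contains t.fa and blocks.bed; the variable names are illustrative only.

```r
library(HelloRanges)

# Assumption: the test files ship with the installed package.
old <- setwd(system.file("unitTests", "data", "getfasta", package = "HelloRanges"))

# 1. Compile from a bedtools-style command line.
code_cli <- bedtools_getfasta("--fi t.fa -bed blocks.bed")

# 2. Compile from R arguments.
code_r <- R_bedtools_getfasta(fi = "t.fa", bed = "blocks.bed")

# Both results are unevaluated code that can be inspected before running.
print(code_cli)

# 3. Compile and evaluate in one step (demonstration and testing only).
seqs <- do_bedtools_getfasta(fi = "t.fa", bed = "blocks.bed")

setwd(old)
```

Printing the compiled code first and then pasting it into a script follows the recommendation above; the do_ variant is convenient mainly for quick checks.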

The recommended way to retrieve reference sequence is through a BSgenome package, either custom or provided by Bioconductor. Call getSeq to query specific regions of the BSgenome object. If the sequence must come from a file, consider converting it to 2bit or FA (razip) format, which allows indexed access using import and its which argument.

If a plain FASTA file must be read directly, the whole file has to be loaded with readDNAStringSet, and regions are then extracted using x[gr], where gr is a GRanges or GRangesList.
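
The two file-based routes can be compared with a minimal sketch like the one below. It assumes the Biostrings, GenomicRanges and rtracklayer packages are available; the toy reference, the temporary file paths and the variable names are all hypothetical and exist only for illustration.

```r
library(Biostrings)
library(GenomicRanges)
library(rtracklayer)

# Hypothetical toy reference, written to a temporary FASTA file.
ref <- DNAStringSet(c(chr1 = "ACGTACGTACGT", chr2 = "TTTTGGGGCCCC"))
fa_path <- tempfile(fileext = ".fa")
writeXStringSet(ref, fa_path)

# Query: positions 3 to 6 on chr1.
gr <- GRanges("chr1", IRanges(start = 3, end = 6))

# Plain FASTA: the whole file is read before any region is extracted.
x <- readDNAStringSet(fa_path)
from_fasta <- x[gr]

# 2bit: convert once, then read only the requested region via 'which'.
twobit_path <- tempfile(fileext = ".2bit")
export(ref, TwoBitFile(twobit_path))
from_2bit <- import(TwoBitFile(twobit_path), which = gr)
```

For a file this small the difference is negligible; the indexed route matters when the reference is genome-sized and only a few regions are needed.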


A language object containing the compiled R code, evaluating to a DNAStringSet object.


Michael Lawrence


See Also

getSeq, the primary sequence query interface.


## Not run: 
setwd(system.file("unitTests", "data", "getfasta", package="HelloRanges"))

## End(Not run)
    ## simple query
    bedtools_getfasta("--fi t.fa -bed blocks.bed")
    ## get spliced transcript/read sequence
    bedtools_getfasta("--fi t.fa -bed blocks.bed -split")

lawremi/HelloRanges documentation built on April 20, 2022, 5:40 p.m.