faFromPACds: Extract sequences from a PACdataset.

faFromPACds R Documentation

Extract sequences from a PACdataset.

Description

faFromPACds extracts sequences from a PACdataset: the regions flanking each PAC, the PAC ranges, the genomic regions containing PACs, or the genes.

Usage

faFromPACds(
  PACds,
  bsgenome,
  what = "updn",
  fapre = NULL,
  byGrp = NULL,
  up = -300,
  dn = 100,
  chrCheck = TRUE
)

Arguments

PACds

a PACdataset.

bsgenome

A BSgenome or FaFile object storing the chromosome sequences, or the name of a FASTA file.

what

Which sequences to extract. The value can be updn, pac, region, or gene.

  • updn: use up and dn to define the upstream and downstream region around the PAC coordinate.

  • pac: output the PAC range given by PACds@anno$UPA_start to UPA_end.

  • region: output the sequence of the genomic region containing the PAC, given by PACds@anno$ftr_start to ftr_end.

  • gene: output the gene sequence given by PACds@anno$gene_start to gene_end.

fapre

A prefix for the output file(s). If fapre=NULL, a stringSet is returned instead of writing files; this is only valid when byGrp=NULL.

byGrp

How to split the output sequences into separate FASTA files. The value can be NULL, 'ftr', c('ftr','strand'), or a list such as list(ftr=c('3UTR','5UTR'), strand=c('+','-')).
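
The grouping forms accepted by byGrp can be illustrated on a toy table. The sketch below is base R only and does not call movAPA; split_by_grp and the anno values are hypothetical. It assumes that a character vector means "split by these columns" and that a list means "keep only the listed values, then split". That reading is inferred from the argument description and the examples, not confirmed against the implementation.

```r
# Toy annotation table standing in for PACds@anno (values are made up).
anno <- data.frame(
  ftr    = c("3UTR", "3UTR", "5UTR", "intron", "3UTR"),
  strand = c("+", "-", "+", "+", "+"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of movAPA): keep the rows that match every
# filter in `grp`, then split the row indices by the grouping columns.
split_by_grp <- function(anno, grp) {
  if (is.character(grp)) {
    cols <- grp                        # e.g. 'ftr' or c('ftr', 'strand')
    keep <- rep(TRUE, nrow(anno))      # no filtering, only splitting
  } else {
    cols <- names(grp)                 # e.g. list(ftr = '3UTR', strand = '+')
    keep <- Reduce(`&`, Map(function(col, vals) anno[[col]] %in% vals,
                            cols, grp))
  }
  idx <- which(keep)
  split(idx, interaction(anno[idx, cols, drop = FALSE], drop = TRUE))
}

# One group per observed ftr/strand combination among 3UTR and 5UTR rows.
groups <- split_by_grp(anno, list(ftr = c("3UTR", "5UTR"),
                                  strand = c("+", "-")))
names(groups)
```

Under this assumed reading, each element of groups would correspond to one output FASTA file.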

up

Parameter for what='updn', specifying the upstream boundary relative to the PAC (e.g., -300).

dn

Parameter for what='updn', specifying the downstream boundary relative to the PAC, which is position 0. For example, up=-300, dn=100 extracts 401 nt: upstream positions 1..300 (or -300..-1), the PAC at position 301 (or 0), and downstream positions 302..401 (or 1..100). With up=0, dn=0, only the nucleotide at the PAC position is output.
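
The window arithmetic can be checked with a few lines of base R. This is a minimal sketch, not movAPA code: updn_window is a hypothetical helper, and it assumes that on the '-' strand the upstream side lies at larger genomic coordinates.

```r
# Hypothetical helper (not part of movAPA): genomic window implied by up/dn.
# Assumption: on the '-' strand, upstream lies at larger coordinates.
updn_window <- function(coord, strand = "+", up = -300, dn = 100) {
  stopifnot(up <= 0, dn >= 0, strand %in% c("+", "-"))
  if (strand == "+") {
    start <- coord + up
    end   <- coord + dn
  } else {
    start <- coord - dn
    end   <- coord - up
  }
  list(start = start, end = end,
       width = end - start + 1,   # number of nucleotides in the window
       pac_index = -up + 1)       # 1-based position of the PAC, read 5' to 3'
}

w <- updn_window(coord = 1000, strand = "+", up = -300, dn = 100)
w$width      # 401
w$pac_index  # 301
```

With up=0 and dn=0 the same helper gives a width of 1, matching the single-nucleotide case.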

chrCheck

If TRUE, every chromosome in PACds must be present in bsgenome. If FALSE, rows of PACds whose chromosome is not found in bsgenome are ignored.
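
Before calling the function, the chromosome names can be compared by hand. The sketch below uses made-up name vectors and base R only. In practice the first vector would come from the PAC annotation (assumed here to be a chr column in PACds@anno) and the second from the sequence names of the genome object or FASTA file.

```r
# Made-up chromosome names for illustration.
pac_chr    <- c("Chr1", "Chr1", "Chr2", "ChrM", "scaffold_7")
genome_chr <- c("Chr1", "Chr2", "Chr3", "ChrM")

# Chromosomes present in the PAC table but missing from the genome.
missing_chr <- setdiff(unique(pac_chr), genome_chr)

# Rows that remain if the inconsistent ones are skipped.
keep <- pac_chr %in% genome_chr

missing_chr
sum(keep)
```

A non-empty missing_chr often points to a naming mismatch (for example "1" versus "Chr1") rather than truly absent sequences.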

Details

This function can export the sequences surrounding PACs, the sequences of the genomic regions in which PACs are located, and gene sequences. When exporting sequences of regions, genes, etc., only one sequence is exported per region.

Value

File names (when fapre is given) or a stringSet (when fapre=NULL). With up=-300 and dn=100, each output sequence is 401 nt long and the PAC is at position 301.

See Also

Other APA signal functions: annotateByPAS(), getVarGrams(), kcount(), plotATCGforFAfile(), plotSeqLogo()

Examples

library("BSgenome.Athaliana.TAIR.TAIR9")
bsgenome <- Athaliana
pacds <- makeExamplePACds()
## Get sequences of PAC ranges.
faFromPACds(pacds, bsgenome, what='pac', fapre='pac')

## bsgenome can also be a FASTA file name; with fapre=NULL a stringSet is returned.
fapath <- 'Arab_TAIR9_chr_all.fas'
faFromPACds(pacds, bsgenome=fapath, what='updn', fapre=NULL,
            byGrp=NULL, up=-300, dn=100, chrCheck=TRUE)

## Get upstream 300 nt and downstream 100 nt sequences around PACs.
faFromPACds(pacds, bsgenome, what='updn', fapre='updn', up=-300, dn=100)
## Get PAC sequences and output by different genomic regions, e.g.,
## 3UTR PACs, intron PACs...
faFromPACds(pacds, bsgenome, what='updn', fapre='updn',
            up=-300, dn=100, byGrp='ftr')
faFromPACds(pacds, bsgenome, what='updn', fapre='updn',
            up=-300, dn=100,
            byGrp=list(ftr='3UTR'))
faFromPACds(pacds, bsgenome, what='updn', fapre='updn', up=-300, dn=100,
            byGrp=list(ftr='3UTR', strand=c('+','-')))
## Get sequences of the genomic regions containing PACs, grouped by ftr.
faFromPACds(pacds, bsgenome, what='region', fapre='region', byGrp='ftr')

BMILAB/movAPA documentation built on Jan. 3, 2024, 11:09 p.m.