MEPmod: Predict MicroExons in a Plant Genome

View source: R/MEPmod.R

MEPmodR Documentation

Predict MicroExons in a Plant Genome

Description

MEPmod searches microexon-tags in 45 conserved microexon clusters in plants using gapped Position Weight Matrix (PWM). The only input file is plant genomic sequences or a plant genome. The sequences on both plus and minus strands will be scanned.

Usage

MEPmod(
  genome,
  min.score = "80%",
  clusters = seq_len(45),
  include.intronLoss = TRUE,
  span = 20000,
  min.intron = 20,
  max.intron = 10000,
  cores = 4
)

Arguments

genome

the path(s) to the fasta file(s) or a 'DNAStringSet' object. Any contig sequence < 10 kb will be excluded.

min.score

a character string containing a percentage specifying the minimum score of each exon block (e.g. "80%"). This parameter will pass to matchPWM from Biostrings package.

clusters

an integer vector between 1 to 45 specifying microexon-tag clusters to search (default: all the 45 clusters)

include.intronLoss

TRUE or FALSE. If TRUE, the microexons with any side of flanking intron loss will also be returned.

span

the maximum spanning region of the microexon-tag (default: 20 kb).

min.intron

minimum intron size

max.intron

maximum intron size

cores

number of cores to use. This parameter will pass to mclapply as *mc.cores* from parallel package.

Value

A GRanges of microexons with 8 metadata columns:

  • NT, 108 bp DNA sequence of microexon-tag (72 bp for cluster 1 and 2).

  • AA, 36 aa protein sequence translated from NT (24 aa for cluster 1 and 2).

  • region, the span region of microexon-tag.

  • block.starts, the genomic coordinates of block start positions (exon, intron, exon, ...).

  • block.sizes, block sizes (exon, intron, exon, ...).

  • score, match score of the microexon-tag. Large score indicate high confidence.

  • isMicroExon, if the microexon has any flanking intron loss, FALSE; otherwise, TRUE.

  • cluster, which cluster this microexon-tag belongs to.

Examples

suppressPackageStartupMessages(library(BSgenome))
if (!requireNamespace("BSgenome.Athaliana.TAIR.TAIR9", quietly = TRUE))
    BiocManager::install("BSgenome.Athaliana.TAIR.TAIR9")
library(BSgenome.Athaliana.TAIR.TAIR9)

Chr1<-BSgenome.Athaliana.TAIR.TAIR9[["Chr1"]]
Chr1<-DNAStringSet(Chr1)
names(Chr1)<-'Chr1'
x<-subseq(Chr1, 1, 1e6)
x

mep<-MEPmod(genome=x)

mep
# GRanges object with 2 ranges and 8 metadata columns:
#       seqnames        ranges strand |                      NT                      AA
#          <Rle>     <IRanges>  <Rle> |          <DNAStringSet>           <AAStringSet>
#   [1]     Chr1 456757-456761      + | TTTGATGCTA...ACTGTCTGAC FDARTAWSQC...AFGAVESLSD
#   [2]     Chr1 601547-601560      - | AAAGTTGAAT...ACATTCAGAT KVEFKDNEWK...RLDCLLKHSD
#              region             block.starts   block.sizes     score isMicroExon   cluster
#           <IRanges>            <IntegerList> <IntegerList> <numeric>   <logical> <integer>
#   [1] 456595-456893 456595,456647,456757,...  52,110,5,...    0.9800        TRUE         8
#   [2] 601327-601700 601653,601561,601547,...  48,92,14,...    0.9241        TRUE        35
#   -------
#   seqinfo: 1 sequence from an unspecified genome; no seqlengths


yuhuihui2011/MEPmodeler documentation built on Oct. 12, 2023, 2:19 p.m.