Find all Open Reading Frames (ORFs) on the simple input sequences
in ONLY 5'- 3' direction (+), but within all three possible reading frames.
Do not use findORFs for mapping to full chromosomes,
For each sequence of the input vector
IRanges with START and
STOP positions (inclusive) will be returned as
IRangesList. Returned coordinates are relative to the
1 2 3 4 5 6 7
(DNAStringSet or character vector) - DNA/RNA sequences to search
for Open Reading Frames. Can be both uppercase or lowercase. Easiest call to
get seqs if you want only regions from a fasta/fasta index pair is:
seqs = ORFik:::txSeqsFromFa(grl, faFile), where grl is a GRanges/List of
search regions and faFile is a
(character vector) Possible START codons to search for.
(character vector) Possible STOP codons to search for.
(logical) Default TRUE. Keep only the longest ORF per
unique stopcodon: (seqname, strand, stopcodon) combination, Note: Not longest
per transcript! You can also use function
(integer) Default is 0. Which is START + STOP = 6 bp. Minimum length of ORF, without counting 3bps for START and STOP codons. For example minimumLength = 8 will result in size of ORFs to be at least START + 8*3 (bp) + STOP = 30 bases. Use this param to restrict search.
If you want antisence strand too, do:
pos <- findORFs(seqs)
#negative strands (DNAStringSet only if character)
neg <- findORFs(reverseComplement(DNAStringSet(seqs)))
relist(c(GRanges(pos, strand = "+"), GRanges(neg, strand = "-")),
skeleton = merge(pos, neg))
(IRangesList) of ORFs locations by START and STOP sites grouped by input sequences. In a list of sequences, only the indices of the sequences that had ORFs will be returned, e.g. 3 sequences where only 1 and 3 has ORFs, will return size 2 IRangesList with names c("1", "3"). If there are a total of 0 ORFs, an empty IRangesList will be returned.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## Simple examples findORFs("ATGTAA") findORFs("ATGTTAA") # not in frame anymore findORFs("ATGATGTAA") # only longest of two above findORFs("ATGATGTAA", longestORF = FALSE) # two ORFs findORFs(c("ATGTAA", "ATGATGTAA")) ## Get DNA sequences from ORFs seq <- DNAStringSet(c("ATGTAA", "AAA", "ATGATGTAA")) names(seq) <- c("tx1", "tx2", "tx3") orfs <- findORFs(seq, longestORF = FALSE) # you can get sequences like this: gr <- unlist(orfs, use.names = TRUE) gr <- GRanges(seqnames = names(seq)[as.integer(names(gr))], ranges(gr), strand = "+") # Give them some proper names: names(gr) <- paste0("ORF_", seq.int(length(gr)), "_", seqnames(gr)) orf_seqs <- getSeq(seq, gr) orf_seqs # Convert to DNA DNAStringSet and Save as .fasta # writeXStringSet(orf_seqs, "orfs.fasta") ## Reading from file and find ORFs
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.