genRandBioSeqs: Generate Random Biological Sequences


View source: R/utils.R

Description

Generate biological sequences with uniform random distribution of alphabet characters.

Usage

genRandBioSeqs(seqType = c("DNA", "RNA", "AA"), numSequences, seqLength,
  biostring = TRUE, seed)

Arguments

seqType

defines the type of sequence as DNA, RNA or AA and thereby the underlying alphabet. Default="DNA"

numSequences

single numeric value specifying the number of sequences to generate.

seqLength

either a single numeric value, used as the length of every sequence, or a numeric vector of length 'numSequences' giving the length of each sequence to be generated.

biostring

if TRUE, the sequences are generated in XStringSet format, otherwise as a BioVector derived class. Default=TRUE

seed

when present, the random number generator is seeded with the value passed in this parameter, which makes the generated sequences reproducible.

Details

The function generates a set of sequences with a uniform distribution of alphabet characters and returns it as an XStringSet or a BioVector, depending on the parameter 'biostring'.
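
To illustrate what a uniform distribution of alphabet characters means, the following is a minimal base R sketch and not the package's implementation. The helper name sketchRandSeqs and the four-letter DNA alphabet are assumptions made for this illustration; the alphabets actually used by kebabs may differ (the example output for "AA" on this page contains the letter U, for instance).

```r
# Hypothetical helper, not part of kebabs: every character is drawn
# independently and with equal probability from the given alphabet.
sketchRandSeqs <- function(alphabet, numSequences, seqLength, seed = NULL) {
  # seed the random number generator only when a seed is supplied
  if (!is.null(seed)) set.seed(seed)
  # a single length is recycled to one length per sequence
  lens <- rep_len(as.integer(seqLength), numSequences)
  vapply(lens, function(n) {
    paste(sample(alphabet, n, replace = TRUE), collapse = "")
  }, character(1))
}

# assumed alphabet for this sketch
dnaAlphabet <- c("A", "C", "G", "T")

seqs <- sketchRandSeqs(dnaAlphabet, numSequences = 5, seqLength = 20, seed = 42)
nchar(seqs)
```

The sketch returns a plain character vector; genRandBioSeqs instead wraps the result in an XStringSet or BioVector object.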

Value

When the parameter 'biostring' is set to TRUE, the function returns an XStringSet derived class, otherwise a BioVector derived class.
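
As a hedged illustration of the two return formats and of the 'seed' parameter, the following sketch assumes that the kebabs package is installed. The class names follow the argument descriptions above; that two calls with the same seed produce identical sequences is an expectation derived from the description of 'seed', not a documented guarantee.

```r
library(kebabs)

# XStringSet format (biostring = TRUE) and BioVector format (biostring = FALSE)
xs <- genRandBioSeqs("DNA", 10, 50, biostring = TRUE, seed = 123)
bv <- genRandBioSeqs("DNA", 10, 50, biostring = FALSE, seed = 123)

is(xs, "XStringSet")
is(bv, "BioVector")

# expectation: the same seed reproduces the same sequences
xs2 <- genRandBioSeqs("DNA", 10, 50, biostring = TRUE, seed = 123)
identical(as.character(xs), as.character(xs2))
```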

Author(s)

Johannes Palme <kebabs@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/kebabs

J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576. DOI: 10.1093/bioinformatics/btv176.

Examples

## generate a set of AA sequences of fixed length as AAStringSet
aaseqs <- genRandBioSeqs("AA", 100, 1000, biostring=TRUE)

## show AA sequence set
aaseqs

## Not run: 
## generate a set of "DNA" sequences as DNAStringSet with uniformly
## distributed lengths between 1500 and 3500 bases
seqLength <- runif(100, min=1500, max=3500)
dnaseqs <- genRandBioSeqs("DNA", 100, seqLength, biostring=TRUE)

## show DNA sequence set
dnaseqs

## End(Not run)

Example output

  A AAStringSet instance of length 100
      width seq
  [1]  1000 SPAPTACFSELWYWLTHHQTLWSQQHYERECPH...QWLDQPVHRNLFWENIUKLWWVLWUUAUEKKY
  [2]  1000 VDAECVQTPNKYSYMFHPUNDIHVPRNYAYIIN...DYUHTSWGIPGTIFDFYQHDIWLMWRUMYFQH
  [3]  1000 EIUUGILYYVIQLIVIGPQVDQITWYADTUQER...HGACUQTCMTLTAASVKLQNLFWWSGFSIKIV
  [4]  1000 YKDYKAEIURAWNYPFSUMIPVUFWKGSKRGWE...DKDTGUNFNHHYACIWAHFRMYVIMVNIMPID
  [5]  1000 MFGNPULKQKLSUVSSQTCVQUMDUKWAKGVAQ...GFAHWSTNYUHWVHIEMNENHARLUVSQPGQW
  ...   ... ...
 [96]  1000 LLNCCWLGWYMYTVUAITSPGFRHGVKHFVPAE...HUEISGFUIFMKHYTSRVQTWWGSGKTYIVHW
 [97]  1000 CVNKHDVNHEYKULVMRFRSRNPEDLAHEMGNE...QMDAEVFKEPHMYUKVVTTIPHRIFIVYKAHC
 [98]  1000 IPHVYTSWSEVCIHYQCPFQMDCILWQQUYATL...ITCTCHFYLHKNCGPCIKFSTVCWIGKKKYHQ
 [99]  1000 MAHFYGFRWPLGFRIGPLPTITEPAYSHFPWDY...PTHAAYCYULPCIPSCHPUGGLHVNGPYWUVT
[100]  1000 NFCAFRGHIIDSWCAARQKKESCLSDCALHIFA...WRUWLVWQEQWKHUTYMIMPNCLQVUCDQWGT
  A DNAStringSet instance of length 100
      width seq
  [1]  2085 TCCGGGCAGGCCTGCCGTGTGGCGATCAAGTGG...AAAAATCAAATCTATATTACTTTACTTGGCAG
  [2]  2714 CAACCAGCGGTAACTTAAGAAAGTCGACATGCT...CGTCCTCTAATTTCTTAATGGACTCTGACACA
  [3]  1512 CCCGTTGGGCCTAAGAGTGGGCAAGGAACAGAT...CCTCCAGGGTCCAAATAGTTCTGTGGGGTCCA
  [4]  2826 GGGCCAGGTTGCTGGCTAGTCTCCGACACTGTA...CGAGACGGACCCATGTAGGCGAGTTGTAATTT
  [5]  2709 GGTTAACAACCGAGAATTACCGTGGGATGACCG...TAATCATTCGCATCGGAGACGGTAACTGTCCG
  ...   ... ...
 [96]  3360 CATCCACTGCATATCACCTTCTCAAAAGCCTCA...TTAGCTCGGCACGTGTTTGCTGTTTAGTGCTA
 [97]  2399 CAATAACGATCTAACGAGCCTGGGGAGCCCATT...GCGCGGGCTCATGGCTAAGATTCGCGAGGTTT
 [98]  2137 GGCCGCTGAACGGGCCTATTTGCCGCCGGAATG...CAACCTCAACACATCCTAGGTCAAACGACCGT
 [99]  3152 GGTCTGTGGCTTCAGCCAAGGCTTGGAACAGCA...CCTGACTTACATTGACTCCCATAAAAGTCTCC
[100]  2308 CGGACAACTTGGGGGACTGACTGCGGCACTTTA...ATCAACGGTTCTGGCAGGGGAGTCATCGTGGA

kebabs documentation built on Nov. 8, 2020, 7:38 p.m.