blatSeqs: Align sequences using BLAT.


View source: R/hiReadsProcessor.R

Description

Align a batch of sequences using standalone BLAT or the gfServer/gfClient protocol against an indexed reference genome. Depending on the parameters provided, the function either aligns a batch of files to a reference genome using gfClient, or takes sequences from the query and subject parameters and aligns them using standalone BLAT. If standaloneBlat=FALSE and a gfServer has not been launched beforehand, this function starts one using startgfServer and stops it using stopgfServer upon successful execution.
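The branching described above can be sketched in a few lines of base R. This is only an illustration of the documented behaviour, not the package's implementation: `planBlatRun` and its `serverRunning` argument are hypothetical names introduced here.

```r
# Hypothetical helper, not part of hiReadsProcessor: list the steps that the
# description implies for a given combination of settings.
planBlatRun <- function(standaloneBlat = TRUE, serverRunning = FALSE) {
  if (standaloneBlat) {
    # Sequences come straight from the query and subject arguments.
    return("run standalone BLAT on query and subject")
  }
  steps <- "align query files with gfClient"
  if (!serverRunning) {
    # No gfServer found: start one first and stop it afterwards.
    steps <- c("startgfServer", steps, "stopgfServer")
  }
  steps
}

# gfClient mode with no server launched beforehand.
planBlatRun(standaloneBlat = FALSE, serverRunning = FALSE)
```

When a server is already running at the given host and port, the sketch returns only the gfClient step, mirroring the statement that the function manages the server only if it had to start one.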

Usage

blatSeqs(query = NULL, subject = NULL, standaloneBlat = TRUE,
  port = 5560, host = "localhost", parallel = TRUE, numServers = 1L,
  gzipResults = TRUE, blatParameters = c(minIdentity = 90, minScore = 10,
  stepSize = 5, tileSize = 10, repMatch = 112312, dots = 50, maxDnaHits = 10,
  q = "dna", t = "dna", out = "psl"))

Arguments

query

a DNAStringSet object, a character vector of filename(s), or a path/pattern of fasta files to BLAT. Default is NULL.

subject

a DNAStringSet object, a character vector, or a path to an indexed genome (nib or 2bit) to serve as a reference or target to the query. Default is NULL. If the subject is a path to a nib or 2bit file, then standaloneBlat will not work!

standaloneBlat

use standalone BLAT as opposed to the gfServer/gfClient protocol. Default is TRUE.

port

the port number the gfServer was started with. Required if standaloneBlat=FALSE. Default is 5560.

host

name of the machine running gfServer. Default is 'localhost' and only used when standaloneBlat=FALSE.

parallel

use a parallel backend to perform the calculation with BiocParallel. Defaults to TRUE. If no parallel backend is registered, then a serial version is run using SerialParam.

numServers

launch >1 gfServer and load balance jobs? This only applies when parallel=TRUE and standaloneBlat=FALSE. Enable this option only if the machine has a lot of RAM! Option ignored if launched gfServer is found at specified host and port. Default is 1.

gzipResults

gzip the output files? Default is TRUE.

blatParameters

a character vector of options to be passed to gfClient/BLAT command except for 'nohead' option. Default: c(minIdentity=90, minScore=10, stepSize=5, tileSize=10, repMatch=112312, dots=50, maxDnaHits=10, q="dna", t="dna", out="psl"). Be sure to only pass parameters accepted by either BLAT or gfClient. For example, if repMatch or stepSize parameters are specified when using gfClient, then the function will simply ignore them! The defaults are configured to align a 19bp sequence with 90% identity.
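As a rough illustration of how such a named vector might map onto command-line options, the base R sketch below renders the defaults in BLAT's `-name=value` form after dropping the two parameters that the text says are ignored with gfClient. `toFlags` is a hypothetical helper introduced here, and the package may assemble its command differently.

```r
# Defaults as listed above. Mixing numbers and strings inside c() coerces
# every element to character, hence "a character vector of options".
blatParameters <- c(minIdentity = 90, minScore = 10, stepSize = 5,
                    tileSize = 10, repMatch = 112312, dots = 50,
                    maxDnaHits = 10, q = "dna", t = "dna", out = "psl")
is.character(blatParameters)

# Hypothetical helper, not part of hiReadsProcessor: render each element
# as -name=value and join the flags with spaces.
toFlags <- function(params) {
  paste0("-", names(params), "=", params, collapse = " ")
}

# Drop the parameters described above as ignored when using gfClient.
ignoredByGfClient <- c("repMatch", "stepSize")
gfClientParams <- blatParameters[!names(blatParameters) %in% ignoredByGfClient]
flags <- toFlags(gfClientParams)
flags
```

Filtering explicitly before the call makes it visible which settings actually take effect in gfClient mode, rather than relying on them being silently ignored.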

Value

a character vector of psl filenames. Each input file is split by the number of parallel workers, with the read number denoting the cut. Files are cut into smaller pieces for ease of reading and writing within a single R session.
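To make the idea of cutting by worker count concrete, here is a small base R sketch that divides a set of reads into consecutive chunks and labels each chunk by the read number where it starts. The read names are synthetic, and the chunking rule and labels are assumptions for illustration; the package's actual splitting and file-naming scheme may differ.

```r
# Synthetic read names, for illustration only.
reads <- sprintf("read%02d", 1:10)
workers <- 3

# Assumed chunking rule: equal-sized blocks of consecutive reads.
chunkSize <- ceiling(length(reads) / workers)
chunkId <- ceiling(seq_along(reads) / chunkSize)
chunks <- split(reads, chunkId)

# Label each chunk by the index of its first read, i.e. where the cut falls.
cutStart <- vapply(split(seq_along(reads), chunkId), min, integer(1))
names(chunks) <- cutStart

# Number of reads per chunk, keyed by starting read number.
lengths(chunks)
```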

See Also

pairwiseAlignSeqs, vpairwiseAlignSeqs, startgfServer, stopgfServer, read.psl, splitSeqsToFiles, read.blast8

Examples


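# dnaSeqs and subjectSeqs are assumed to be existing DNAStringSet objects.
# Standalone BLAT: align query sequences against subject sequences, blast8 output.
blatSeqs(dnaSeqs, subjectSeqs, blatParameters=c(minIdentity=90, minScore=10, 
tileSize=10, dots=10, q="dna", t="dna", out="blast8"))
# gfServer/gfClient: align a DNAStringSet against an indexed 2bit genome.
blatSeqs(dnaSeqs, "/usr/local/genomeIndex/hg18.2bit", standaloneBlat=FALSE)
# Same, with the query read from a single fasta file.
blatSeqs("mySeqs.fa", "/usr/local/genomeIndex/hg18.2bit", standaloneBlat=FALSE)
# Same, with the query given as a pattern matching several fasta files.
blatSeqs("my.*.fa", "/usr/local/genomeIndex/hg18.2bit", standaloneBlat=FALSE)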

malnirav/hiReadsProcessor documentation built on Sept. 17, 2017, 10:56 a.m.