findIntegrations: Find the integration sites and add results to SampleInfo...


View source: R/hiReadsProcessor.R

Description

Given a SampleInfo object, the function finds integration sites for each sample using their respective settings and adds the results back to the object. This is an all-in-one function which aligns reads, finds the best hit per read per sample, clusters sites, and assigns ISU IDs. It calls blatSeqs, read.psl, getIntegrationSites, clusterSites, otuSites. There must be linkered reads within the sampleInfo object in order to use this function with the default parameters. If you are planning on BLATing non-linkered reads, then change seqType to one of the accepted options for the 'feature' parameter of extractSeqs, except for '!' based features.

Usage

findIntegrations(sampleInfo, seqType = NULL, genomeIndices = NULL,
  samplenames = NULL, parallel = TRUE, autoOptimize = FALSE,
  doSonic = FALSE, doISU = FALSE, ...)

Arguments

sampleInfo

sample information SimpleList object returned by findLinkers, which holds decoded, primed, LTRed, and Linkered sequences for samples per sector/quadrant along with metadata.

seqType

which type of sequence to align when finding integration sites. Default is NULL, in which case the type is determined automatically from the restriction enzyme or isolation method used. If the restriction enzyme is Fragmentase, MuA, Sonication, or Sheared, then this parameter is set to genomicLinkered, else it is genomic. Any one of the following options is valid: genomic, genomicLinkered, decoded, primed, LTRed, linkered.

genomeIndices

a named character vector mapping each freeze to the full or relative path of its BLAT-indexed genome (.nib or .2bit files). For example: c("hg18"="/usr/local/blatSuite34/hg18.2bit", "mm8"="/usr/local/blatSuite34/mm8.2bit"). Be sure to supply an index for every freeze present in the sampleInfo object. Default is NULL.

samplenames

a vector of sample names to process. Default is NULL, which processes all samples from the sampleInfo object.

parallel

use a parallel backend to perform the calculation with BiocParallel. Defaults to TRUE. If no parallel backend is registered, then a serial version is run using SerialParam.

autoOptimize

if aligner='BLAT', should the blatParameters be automatically optimized based on the reads? Default is FALSE. When TRUE, the following parameters are adjusted within the supplied blatParameters vector: stepSize, tileSize, minScore, minIdentity. This parameter is useful when aligning reads of various lengths to the genome. Optimization is done using only read lengths. In beta phase!

doSonic

calculate integration site abundance using breakpoints. See getSonicAbund for more details. Default is FALSE.

doISU

calculate integration site unit for multihits. See isuSites for more details. Default is FALSE.

...

additional parameters to be passed to blatSeqs.
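
The genomeIndices argument described above needs one entry per freeze present in sampleInfo. As a minimal sketch in base R, the check below compares a set of freezes against the names of the index vector before any alignment is attempted. The sample_freezes vector is a hypothetical stand-in for the freezes recorded in a real sampleInfo object, and the paths are the illustrative ones from the argument description, not files that necessarily exist on your system.

```r
# Hypothetical freezes, standing in for those recorded in a sampleInfo object.
sample_freezes <- c("hg18", "mm8", "hg18")

# Named character vector: names are freezes, values are paths to BLAT indices.
# Paths are illustrative only.
genomeIndices <- c(
  "hg18" = "/usr/local/blatSuite34/hg18.2bit",
  "mm8"  = "/usr/local/blatSuite34/mm8.2bit"
)

# Freezes that have no corresponding index.
missing_freezes <- setdiff(unique(sample_freezes), names(genomeIndices))

if (length(missing_freezes) > 0) {
  stop("No genome index supplied for: ",
       paste(missing_freezes, collapse = ", "))
}
```

Running a check like this up front turns a missing index into an immediate, readable error rather than a failure partway through processing.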

Value

a SimpleList object similar to the sampleInfo parameter supplied, with new data added under each sector and sample. New data attributes include psl and sites. The psl attribute holds the genomic hits per read along with QC information. The sites attribute holds the condensed integration sites, where genomic hits have been clustered by the Position column and cherry-picked so that each site passes all the QC steps.

Note

If parallel=TRUE, then be sure to have a parallel backend registered before running the function. One can use any of the following: MulticoreParam, SnowParam.
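
As a rough sketch of registering a backend with BiocParallel before calling the function: the worker count of 2 and the platform-based choice between the two backends are assumptions made for illustration, not recommendations from the package. The block is guarded so that it does nothing when BiocParallel is not installed.

```r
# Register a BiocParallel backend ahead of findIntegrations(parallel = TRUE).
# Assumption: 2 workers; adjust to the machine at hand.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  library(BiocParallel)

  # MulticoreParam forks workers and is not supported on Windows;
  # SnowParam starts separate R processes and works on all platforms.
  backend <- if (.Platform$OS.type == "windows") {
    SnowParam(workers = 2)
  } else {
    MulticoreParam(workers = 2)
  }

  # Make this backend the default returned by bpparam().
  register(backend)
  registered_class <- class(bpparam())[1]
} else {
  # BiocParallel not installed: nothing is registered.
  registered_class <- NA_character_
}
```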

See Also

findPrimers, findLTRs, findLinkers, startgfServer, read.psl, blatSeqs, blatListedSet, pslToRangedObject, clusterSites, isuSites, crossOverCheck, getIntegrationSites, getSonicAbund, annotateSites

Examples


load(file.path(system.file("data", package = "hiReadsProcessor"),
               "FLX_seqProps.RData"))
findIntegrations(seqProps,
                 genomeIndices = c("hg18" = "/usr/local/genomeIndexes/hg18.noRandom.2bit"),
                 numServers = 2)

hiReadsProcessor documentation built on Nov. 8, 2020, 5:43 p.m.