findIntegrations: Find the integration sites and add results to SampleInfo...
In hiReadsProcessor: Functions to process LM-PCR reads from 454/Illumina data


Given a SampleInfo object, the function finds integration sites for each sample using that sample's own settings and adds the results back to the object. This is an all-in-one function that aligns reads, finds the best hit per read per sample, clusters sites, and assigns ISU IDs. It calls blatSeqs, read.psl, getIntegrationSites, clusterSites, and otuSites. With the default parameters, the sampleInfo object must contain linkered reads. To BLAT non-linkered reads instead, set seqType to one of the accepted options of the 'feature' parameter of extractSeqs, excluding the '!'-based features.


findIntegrations(sampleInfo, seqType = NULL, genomeIndices = NULL,
  samplenames = NULL, parallel = TRUE, autoOptimize = FALSE,
  doSonic = FALSE, doISU = FALSE, ...)

`sampleInfo`	sample information SimpleList object output by `findLinkers`, which holds decoded, primed, LTRed, and linkered sequences for samples per sector/quadrant along with metadata.
`seqType`	which type of sequence to align and find integration sites from. Default is NULL, in which case it is determined automatically from the restriction enzyme or isolation method used: if the restriction enzyme is Fragmentase, MuA, Sonication, or Sheared, seqType is set to genomicLinkered; otherwise it is set to genomic. Valid options are: genomic, genomicLinkered, decoded, primed, LTRed, linkered.
`genomeIndices`	a named character vector mapping each genome freeze to the full or relative path of its BLAT-indexed genome (.nib or .2bit file). For example: c("hg18"="/usr/local/blatSuite34/hg18.2bit", "mm8"="/usr/local/blatSuite34/mm8.2bit"). Supply one index for every freeze present in the sampleInfo object. Default is NULL.
`samplenames`	a vector of samplenames to process. Default is NULL, which processes all samples from sampleInfo object.
`parallel`	use a parallel backend to perform the calculation with `BiocParallel`. Defaults to TRUE. If no parallel backend is registered, a serial version is run using `SerialParam`.
`autoOptimize`	if aligner='BLAT', whether the blatParameters should be automatically optimized based on the reads. Default is FALSE. When TRUE, the following parameters are adjusted within the supplied blatParameters vector: stepSize, tileSize, minScore, minIdentity. This is useful when aligning reads of various lengths to the genome. Optimization is based on read lengths only. This feature is in beta.
`doSonic`	calculate integration site abundance from breakpoints. See `getSonicAbund` for details. Default is FALSE.
`doISU`	calculate integration site units (ISUs) for multihits. See `isuSites` for details. Default is FALSE.
`...`	additional parameters to be passed to `blatSeqs`.
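The seqType default rule and the "one index per freeze" requirement above can be illustrated with two small base R helpers. This is a minimal sketch, assuming exact, case-sensitive enzyme names as listed in the documentation; `default_seq_type` and `missing_indices` are hypothetical helpers written for illustration, not functions exported by hiReadsProcessor, and the package's internal checks may differ.

```r
# Hypothetical helper: mirrors the documented default for `seqType`.
# Assumes exact, case-sensitive matching of the enzyme names.
default_seq_type <- function(restriction_enzyme) {
  sheared <- c("Fragmentase", "MuA", "Sonication", "Sheared")
  ifelse(restriction_enzyme %in% sheared, "genomicLinkered", "genomic")
}

# Hypothetical helper: list freezes used by the samples that have no
# entry in the named `genomeIndices` vector.
missing_indices <- function(freezes, genomeIndices) {
  setdiff(unique(freezes), names(genomeIndices))
}

# Example paths taken from the genomeIndices description
genomeIndices <- c("hg18" = "/usr/local/blatSuite34/hg18.2bit",
                   "mm8"  = "/usr/local/blatSuite34/mm8.2bit")

default_seq_type(c("MseI", "Sonication", "MuA"))
missing_indices(c("hg18", "mm8", "hg19"), genomeIndices)
```

A non-empty result from `missing_indices` would indicate that genomeIndices needs another entry before calling findIntegrations.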

A SimpleList object similar to the supplied sampleInfo, with new data added under each sector and sample. The new attributes are psl and sites. The psl attribute holds the genomic hits per read along with QC information. The sites attribute holds the condensed integration sites, obtained by clustering genomic hits on the Position column and keeping only sites that pass all QC steps.
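The idea of condensing per-read hits into sites can be illustrated on toy data in base R. This is a deliberately simplified sketch, not the package's method: the column names are illustrative (not the PSL format), "best hit" is assumed to mean the highest score, and reads are grouped only when chr, strand, and Position match exactly. It does not reproduce clusterSites or the QC filters applied by getIntegrationSites.

```r
# Toy per-read alignment hits; columns are illustrative, not PSL fields.
hits <- data.frame(
  qName    = c("r1", "r1", "r2", "r3", "r4"),
  chr      = c("chr1", "chr5", "chr1", "chr1", "chr2"),
  strand   = c("+", "-", "+", "+", "-"),
  Position = c(1000, 2000, 1002, 1000, 500),
  score    = c(95, 60, 90, 88, 99),
  stringsAsFactors = FALSE
)

# Keep the highest-scoring hit for each read (on ties, the first row wins).
hits <- hits[order(hits$qName, -hits$score), ]
best <- hits[!duplicated(hits$qName), ]

# Collapse reads sharing chr/strand/Position into one site and count
# the supporting reads.
sites <- aggregate(qName ~ chr + strand + Position, data = best, FUN = length)
names(sites)[names(sites) == "qName"] <- "reads"
sites
```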

If parallel=TRUE, be sure to register a parallel backend before running the function. Any of the following can be used: MulticoreParam, SnowParam.
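Registering a backend before calling findIntegrations(parallel = TRUE) might look like the sketch below. It assumes BiocParallel is installed (and records "none" if it is not); the choice of SnowParam on Windows, where forking is unavailable, and the worker count of 2 are illustrative assumptions, not package requirements.

```r
# Register a BiocParallel backend; fall back gracefully if the package
# is not installed.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  workers <- 2  # illustrative; size this to your machine
  param <- if (.Platform$OS.type == "windows") {
    BiocParallel::SnowParam(workers = workers)      # no forking on Windows
  } else {
    BiocParallel::MulticoreParam(workers = workers) # fork-based workers
  }
  BiocParallel::register(param)
  # bpparam() returns the currently registered default backend
  backend <- class(BiocParallel::bpparam())[1]
} else {
  backend <- "none"
}
backend
```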

findPrimers, findLTRs, findLinkers, startgfServer, read.psl, blatSeqs, blatListedSet, pslToRangedObject, clusterSites, isuSites, crossOverCheck, getIntegrationSites, getSonicAbund, annotateSites

 

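# Requires BLAT and a local hg18 .2bit index at the path given below.
# Load the example SampleInfo object (seqProps) shipped with the package
load(file.path(system.file("data", package = "hiReadsProcessor"),
               "FLX_seqProps.RData"))
# Align reads and add psl/sites to each sample; numServers is passed on to blatSeqs
seqProps <- findIntegrations(seqProps,
  genomeIndices = c("hg18" = "/usr/local/genomeIndexes/hg18.noRandom.2bit"),
  numServers = 2)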