Description
Given a group of discrete factors (i.e. position ids) and integer values, the function tries to correct/cluster the integer values based on their frequency in a defined windowsize.
Arguments
posID
a vector of groupings for the value parameter (e.g. chromosome and strand). Required if psl.rd is not supplied.
value
a vector of integers that need to be corrected or clustered (e.g. positions). Required if psl.rd is not supplied.
grouping
an additional grouping vector, of the same length as posID or psl.rd, by which to pool the rows (e.g. sample names). Default is NULL.
psl.rd 
a GRanges object returned from 
weight
a numeric vector of weights to use when calculating the frequency of each value by posID (and by grouping, if specified). Default is NULL.
windowSize
size of the window within which values are corrected or clustered. Default is 5.
byQuartile
flag denoting whether the quartile-based technique should be used. See Note for details. Default is TRUE.
quartile
if byQuartile=TRUE, the quartile that serves as the frequency threshold. Default is 0.70.
parallel 
use parallel backend to perform calculation with

sonicAbund 
calculate breakpoint abundance using
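As a rough illustration of how the weight and grouping arguments enter the frequency calculation, the base R sketch below tallies each value within each posID (and grouping, if given). It is a hypothetical helper, tabulateFreq, not part of this package, and it assumes that a weighted frequency is simply the sum of the weights of rows sharing the same grouping, posID and value.

```r
# Hypothetical helper (not part of the package): frequency of each value
# within each posID/grouping combination, optionally weighted.
tabulateFreq <- function(posID, value, grouping = NULL, weight = NULL) {
  # Unweighted counting is the same as giving every row a weight of 1
  if (is.null(weight)) weight <- rep(1, length(value))
  # Without a grouping vector, pool all rows into a single group
  if (is.null(grouping)) grouping <- rep("all", length(value))
  df <- data.frame(grouping = grouping, posID = posID, value = value,
                   weight = weight, stringsAsFactors = FALSE)
  # Assumption: weighted frequency = sum of the weights of identical rows
  out <- aggregate(weight ~ grouping + posID + value, data = df, FUN = sum)
  names(out)[names(out) == "weight"] <- "freq"
  # Sort for readability and drop the shuffled row names
  out <- out[order(out$grouping, out$posID, out$value), ]
  rownames(out) <- NULL
  out
}
```

For example, tabulateFreq(c("chr1", "chr1", "chr1"), c(1000, 1000, 5832)) gives a frequency of 2 for position 1000 and 1 for position 5832 on chr1; passing weight = c(0.5, 2) for two identical rows yields a single row with frequency 2.5.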

Value
a data frame showing the clustered values and their frequencies alongside the original input. If psl.rd is supplied, a GRanges object is returned instead, with three new columns appended at the end: clusteredPosition, clonecount, and clusterTopHit (a representative for each cluster, chosen as the best-scoring hit).
Note
The algorithm for clustering when byQuartile=TRUE is as follows: for all values in each grouping, compute the frequency distribution and test whether each value's frequency is >= the quartile threshold. For each value below the threshold, test whether it lies within windowSize of a value that passed the threshold. If there is a match, merge it into that higher-frequency value; otherwise leave it as is. This approach is only useful if the distribution is wide and multimodal. When byQuartile=FALSE, within each group the values inside the defined window are merged with the next most frequently occurring value; if frequencies are tied, the lowest value is used to represent the cluster. When psl.rd is supplied, multihits are ignored and only unique sites are clustered; all multihits are tagged as a good 'clusterTopHit'.
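The clustering rules can be approximated in base R for the values of a single posID/grouping combination. This is a simplified, hypothetical sketch (clusterValuesSketch is not part of the package). It assumes the threshold is computed with R's quantile() on the value frequencies, that a low-frequency value merges into the most frequent passing value within the window (ties going to the lowest value), and that byQuartile=FALSE can be modelled as a greedy pass from the most to the least frequent value. The package's actual implementation may differ.

```r
# Hypothetical sketch (not the package's code): cluster the integer values
# of ONE posID/grouping combination following the rules in the Note.
clusterValuesSketch <- function(value, windowSize = 5, byQuartile = TRUE,
                                quartile = 0.70) {
  tab  <- table(value)
  vals <- as.numeric(names(tab))  # distinct values, ascending
  freq <- as.vector(tab)          # occurrences of each distinct value
  rep_of <- vals                  # cluster representative per distinct value

  if (byQuartile) {
    # Assumption: threshold = quantile of the frequency distribution
    threshold <- quantile(freq, probs = quartile, names = FALSE)
    passed <- freq >= threshold
    for (i in which(!passed)) {
      # Passing values within windowSize of this low-frequency value
      near <- which(passed & abs(vals - vals[i]) <= windowSize)
      if (length(near) > 0) {
        # Assumption: merge into the most frequent neighbour (ties -> lowest)
        rep_of[i] <- vals[near[order(-freq[near], vals[near])][1]]
      }
    }
  } else {
    # Greedy pass: most frequent value first (ties -> lowest value); each
    # unassigned value becomes a cluster centre and absorbs the unassigned
    # values within windowSize of it.
    assigned <- rep(FALSE, length(vals))
    for (i in order(-freq, vals)) {
      if (assigned[i]) next
      members <- which(!assigned & abs(vals - vals[i]) <= windowSize)
      rep_of[members] <- vals[i]
      assigned[members] <- TRUE
    }
  }
  # Map every input value back to its cluster representative
  data.frame(value = value, clusteredValue = rep_of[match(value, vals)])
}
```

With x <- c(100, 100, 100, 102, 103, 200, 200, 250) and windowSize = 5, both modes map 102 and 103 onto 100 and leave 250 on its own. For the tied pair c(10, 12), byQuartile = FALSE merges 12 into the lower value 10, whereas byQuartile = TRUE keeps both, because each frequency reaches the threshold.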
Author(s)
Nirav Malani
See Also
findIntegrations, getIntegrationSites, otuSites, isuSites, crossOverCheck, pslToRangedObject, getSonicAbund
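Examples
library(hiReadsProcessor)  # package providing these functions and the psl example data
# Cluster a small vector of positions pooled by posID and grouping
.clusterSites(posID=c('chr1','chr1','chr1','chr2+','chr15',
'chr16','chr11'), value=c(rep(1000,2),5832,1000,12324,65738,928042),
grouping=c('a','a','a','b','b','b','c'))
# Cluster integration sites derived from a random subset of the psl example data
data(psl)
psl <- psl[sample(nrow(psl),100),]
psl.rd <- getIntegrationSites(pslToRangedObject(psl))
# Use the part of each read name before the last '-' as the sample grouping
psl.rd$grouping <- sub("(.+)-.+","\\1",psl.rd$qName)
.clusterSites(grouping=psl.rd$grouping, psl.rd=psl.rd)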
