Cores of Recurrent Events
Description
Given a collection of intervals s_1,...,s_N, find K intervals c_1,...,c_K that approximately minimize Sum_i Prod_k (1 - E(s_i,c_k)), where E(s_i,c_k) is a geometric measure of association between s_i and c_k. Perform permutation tests to estimate the significance of the findings.
Usage
CORE(dataIn, keep = NULL, startcol = "start", endcol = "end",
chromcol = "chrom", weightcol = "weight", maxmark = 1, minscore = 0,
pow = 1, assoc = c("I", "J", "P"), nshuffle = 0, boundaries = NULL,
seedme = sample(1e+08, 1), shufflemethod = c("SIMPLE", "RESCALE"),
tiny = 1, distrib = c("vanilla", "Rparallel", "Grid"), njobs = 1, qmem = NA)

Arguments
dataIn 
A matrix, a data frame or an object of class "CORE". If 
keep 
A character vector. If 
startcol 
A character string. If 
endcol 
A character string. If 
chromcol 
A character string. If 
weightcol 
A character string. If 
maxmark 
An integer for the maximal number of cores to be computed. The actual number
of cores to be computed is the smaller of 
minscore 
A single numeric value for the minimal allowed score of the cores to be reported. 
pow 
A single numeric value of at least 1 for the power parameter used in computing the association measure between the cores and the input intervals (see Details). 
assoc 
A character specifying the type of association measure to be used (see Details). 
nshuffle 
An integer specifying the number of randomizations to be performed for estimating significance. 
boundaries 
A matrix or a data frame that must have three columns whose names are given by

seedme 
An integer specifying the random number generator seed (see Details). 
shufflemethod 
A character string specifying the event randomization method used for estimation of significance. If "SIMPLE" (default), each event is placed at random with equal probability for any position where it can fit within chromosome boundaries. If "RESCALE", each event is placed at random in a randomly chosen chromosome, and the event length is multiplied by the length ratio of the new to the original chromosome. 
tiny 
A single numeric value specifying the weight below which events are removed from the input event set. 
distrib 
A character string specifying the method of distributed computing used for
estimation of significance. If "vanilla" (default), no distributed computing
is performed. If "Rparallel", parallel computation with the local machine
is performed using functions from CRAN core package parallel, with
the number of worker processes being the smaller number of 
njobs 
If distributed computing is used for estimation of significance, a single integer specifying the desired number of worker processes. 
qmem 
A character string that can customize grid engine 
Details
The three measures of association specified by assoc are defined as
follows (|x| denotes the length of an interval x). For "I" (inclusion),
E(s_i,c_k) = (|c_k|/|s_i|)^pow if c_k is contained in s_i and 0 otherwise.
For "J" (Jaccard), E(s_i,c_k) = J(s_i,c_k)^pow, where J is the Jaccard index.
For "P" (piercing), E(s_i,c_k) = 1 if c_k is contained in s_i and 0 otherwise.
In all cases the left (right) boundary of an optimal c_k is one of the left
(right) boundaries in the set of input interval events. In addition, there
are no event interval boundaries in the interior of an optimal c_k in case "P".
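The three measures and the minimized sum can be illustrated in a few lines of base R. This is only a sketch under stated assumptions, not the package's implementation: intervals are represented as c(start, end), the interval length is taken to be end - start (the package may count positions differently), and the helper names interval_length, contains, assoc_I, assoc_J, assoc_P and objective are hypothetical.

```r
# Hypothetical helpers; intervals are numeric vectors c(start, end).
# Assumption: interval length is end - start.
interval_length <- function(x) x[2] - x[1]

# TRUE if interval `inner` lies entirely within interval `outer`.
contains <- function(outer, inner) outer[1] <= inner[1] && inner[2] <= outer[2]

# "I" (inclusion): length ratio raised to pow if the core is inside the event.
assoc_I <- function(s, core, pow = 1) {
  if (contains(s, core)) (interval_length(core) / interval_length(s))^pow else 0
}

# "J" (Jaccard): overlap length over union length, raised to pow.
# Assumes at least one of the two intervals has positive length.
assoc_J <- function(s, core, pow = 1) {
  overlap <- max(0, min(s[2], core[2]) - max(s[1], core[1]))
  union_len <- interval_length(s) + interval_length(core) - overlap
  (overlap / union_len)^pow
}

# "P" (piercing): 1 if the core is inside the event, 0 otherwise.
assoc_P <- function(s, core) if (contains(s, core)) 1 else 0

# Sum over events of the product over cores of (1 - E(s_i, c_k)).
objective <- function(events, cores, assoc_fun) {
  sum(vapply(events, function(s) {
    prod(vapply(cores, function(core) 1 - assoc_fun(s, core), numeric(1)))
  }, numeric(1)))
}

events <- list(c(0, 10), c(2, 6), c(20, 30))
cores <- list(c(2, 6))
obj_I <- objective(events, cores, assoc_I)
obj_J <- objective(events, cores, assoc_J)
obj_P <- objective(events, cores, assoc_P)
```

With a single core c(2, 6), the event c(2, 6) is fully explained under all three measures, the event c(0, 10) is partly explained under "I" and "J" and fully explained under "P", and the event c(20, 30) is not explained at all.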
The boundaries argument is used for assessing statistical significance
of the solution. If boundaries is not specified, the chromosome
boundaries for each chromosome are taken to be the leftmost left and the
rightmost right boundaries of all events in the chromosome.
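The default boundary rule can be sketched in base R as follows. The events data frame and the name default_boundaries are hypothetical and used only for illustration; the column names follow the function defaults.

```r
# Hypothetical input events with the default column names.
events <- data.frame(
  chrom = c(1, 1, 2, 2),
  start = c(100, 50, 10, 40),
  end   = c(200, 120, 30, 90)
)

# Leftmost start and rightmost end of all events, per chromosome.
left  <- tapply(events$start, events$chrom, min)
right <- tapply(events$end, events$chrom, max)

default_boundaries <- data.frame(
  chrom = as.numeric(names(left)),
  start = as.vector(left),
  end   = as.vector(right)
)
```

Here chromosome 1 gets boundaries 50 and 200, and chromosome 2 gets boundaries 10 and 90.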
If significance of the findings is estimated, the random number generator
stream, and hence the resulting estimate, depends only on seedme and is
independent of the parallelization option chosen.
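The two randomization methods described under shufflemethod can be sketched in base R as below. This is an illustration under several assumptions that the text above does not confirm: positions are integers, an event length is end - start, "SIMPLE" keeps the event in its own chromosome, and "RESCALE" picks the target chromosome uniformly and then places the rescaled event uniformly within it. The function names shuffle_simple and shuffle_rescale and the example boundaries are hypothetical.

```r
# Hypothetical chromosome boundaries with the documented column names.
boundaries <- data.frame(chrom = c(1, 2), start = c(0, 0), end = c(100, 200))

# "SIMPLE" (assumed to stay within the event's own chromosome):
# choose uniformly among all start positions where the event still fits.
shuffle_simple <- function(start, end, chrom, boundaries) {
  b <- boundaries[boundaries$chrom == chrom, ]
  len <- end - start
  n_positions <- b$end - b$start - len + 1
  new_start <- b$start + sample.int(n_positions, 1) - 1
  c(chrom = chrom, start = new_start, end = new_start + len)
}

# "RESCALE": choose a chromosome at random (assumed uniform), multiply the
# event length by the ratio of new to original chromosome length, then
# place the rescaled event at random where it fits (assumed uniform).
shuffle_rescale <- function(start, end, chrom, boundaries) {
  origin <- boundaries[boundaries$chrom == chrom, ]
  target <- boundaries[sample.int(nrow(boundaries), 1), ]
  ratio <- (target$end - target$start) / (origin$end - origin$start)
  len <- round((end - start) * ratio)
  n_positions <- target$end - target$start - len + 1
  new_start <- target$start + sample.int(n_positions, 1) - 1
  c(chrom = target$chrom, start = new_start, end = new_start + len)
}

set.seed(123)
simple_draw  <- shuffle_simple(10, 30, 1, boundaries)
rescale_draw <- shuffle_rescale(10, 30, 1, boundaries)
```

In this sketch an event of length 20 on chromosome 1 keeps length 20 if it lands on chromosome 1 and becomes length 40 if it lands on chromosome 2, which is twice as long.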
Value
An object of class "CORE" with the following items.
input 
A matrix with four columns called "chrom", "start", "end" and "weight", specifying the input interval events. 
call 
A character string specifying the function call. 
coreTable 
A matrix with columns named "start", "end" and "score", for start and end positions and CORE scores of the cores found by the algorithm. 
seedme 
If significance estimate was performed, the random number generator seed. 
assoc 
One of "I", "J" or "P", indicating the geometric measure of association used. 
shufflemethod 
One of "SIMPLE" or "RESCALE", indicating the randomization method used. 
p 
A numeric vector of the length equal to the row dimension of 
simscores 
A matrix with the row dimension equal to that of 
minscore 
A single numeric value for the minimal score of the reported cores. 
maxmark 
A single numeric value for the requested maximal number of cores to be computed. 
tiny 
A single numeric value for the weight below which events were removed from the input set. 
pow 
A single numeric value for the power used in computing the association measures. 
boundaries 
A matrix with three columns named "chrom", "start" and "end", indicating chromosome numbers and boundary positions used for estimation of significance. 
Author(s)
Alex Krasnitz, Guoli Sun
Examples
#Compute 3 cores and perform no randomization
#(meaningless for estimate of significance).
data(testInputCORE)
data(testInputBoundaries)
myCOREobj<-CORE(dataIn=testInputCORE,maxmark=3,nshuffle=0,
boundaries=testInputBoundaries,seedme=123)
## Not run:
#Extend this computation to a much larger number of randomizations,
#using 2 cores of a host computer.
newCOREobj<-CORE(dataIn=myCOREobj,keep=c("maxmark","seedme","boundaries"),
nshuffle=20,distrib="Rparallel",njobs=2)
#When using "Grid", make sure you have write permission to the current
#work space.
newCOREobj<-CORE(dataIn=myCOREobj,keep=c("maxmark","seedme","boundaries"),
nshuffle=20,distrib="Grid",njobs=2)
## End(Not run)
