Description Usage Arguments Details Value Author(s) References Examples
The function implements the scan statistics method of Kabluchko and Spodarev (2009), Theorem 3.1.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | scanning(pos, freq, file, tuningUnits, alpha = 0.1, coarsening = 1,
minscans=0, maxscans = 0, sumscan = FALSE, perSNP = TRUE,
colname , n, threshold, collect=!old.def, old.def=FALSE,
max.intervals = length(alpha) * 100000,
max.basepair.distance = 50000, exclude.negative.at.boundary = TRUE,
maximum = TRUE, mean.freq, sd.freq, mean.n)
scan.statistics(file, tuningUnits, alpha=c(0.05, 0.01), repet=1000,
coarsening = 1,
minscans=0, maxscans=0, sumscan = FALSE, perSNP=TRUE,
colname, n, return.simu = FALSE,
debug = FALSE, formula = FALSE,
old.def=FALSE,
max.intervals = length(alpha) * 100000,
max.basepair.distance = 50000,
exclude.negative.at.boundary = TRUE,
pos, freq)
|
pos, freq |
alternatively to the |
file |
filename or list.
The rda file must contain the variables If the extension of the filename is ‘bed’, the behaviour of the programme is different, see the details |
tuningUnits |
real number. The value 0 codes the case
of Theorem 3.1 in Kabluchko and Spordarev (2009).
A positive value codes the case of Theorem 2.1 (which is very much
preferred). The case of Theorem 3.2 does not suit, hence is not
coded. Note that first, the frequencies are standardized.
Then |
alpha |
level(s) of testing. The levels should decrease. |
coarsening |
integer. If the value is larger than 1 then the
data are first windower'ed by |
repet |
The number of simulation to determine the
threshold(s) for testing in |
minscans,maxscans |
integers. The minimunm and maximum length of the window, respectively. If non-positive the window sizes are not restricted from below or above, respectively. |
sumscan |
logical.
If |
perSNP |
logical. If |
colname |
the column of the data frame that gives the relative
frequencies. The default name (i.e., if missing) is "HeterAB".
Alternatively In case the extension of the filename equals ‘bed’, the behaviour is different, see Details. |
n |
The number of individuals, the data are based on. Usually that number is determined automatically, but might be given for safety explicitely |
return.simu |
logical. to do |
debug |
logical or 2.
If not |
threshold |
A value around 0.8 seems to be appropriate for Christian's data whereas values around 18 are appropriate for Amanda's data. |
collect |
|
old.def |
logical. If Further, if Finally, if |
max.intervals |
[only if As the number of intervals is determined dynamically, the total number of significant intervals cannot be determined in advance. To economise a lot of copying, an upper threshold is given by the user. 100000 for each level should be large enough. If not, please contact the author. |
max.basepair.distance |
[only if |
exclude.negative.at.boundary |
logical. If |
maximum |
logical. MISSING DOC |
mean.freq |
If given, |
sd.freq |
If given, |
mean.n |
If given, |
formula |
if |
The ideas for the code are taken from Kabluchko and Spordarev (2009) although the values are not calculated from the respective theorems. Instead, values are obtained by simulation in a procedure similar to Bootstrapping.
In case the file is a bed-file, the following differences to the standard behaviour appears:
colname
must be of the form
c(pos=, freq=, n=)
with default value
c(pos=3, freq=4, n=5)
the sign of the frequency is changed
it is not checked whether the frequencies * n equals an integer number
scanning
returns invisibly a list that contains always
the input data
the number of intervals showing a total sum
larger than the given threshold
.
corresponding to alpha, if not given explicitely
the maximum value reached scanning over all windows
if collect=TRUE
then the list also contains
matrix of three rows containing information of all the
(overlapping) intervals where the sums exceeds the thresholds
.
Each interval is given by a column. First row: left end point of
the interval. Second row: right end point of the interval.
Third interval: maximum number of threshold that are passed.
the sums that correspond to the maxima in
areas
list of matrices.
For each threshold
, all the overlapping
intervals are joined that overlap, so that non-overlapping
intervals are finally obtained.
whether the null hypothesis is rejected
at the lowest alpha
level.
scan.statistics
returns invisibly a list containing all elements of
scanning
for collect=TRUE
.
Additionally, it contains
the maxima of repet
simulated data if
formula=FALSE
Martin Schlather, schlather@math.uni-mannheim.de
Kabluchko, Z. and Spodarev, E. (2009) Scan statistics of Levy noises and marked empirical processes. Adv. Appl. Probab. 41, 13-37.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
if (interactive()) {
n <- 30
loci <- 9000
positions <- 25000:15000000
} else {
n <- 3
loci <- 900
positions <- 2500:1500000
}
pos <- sort(sample(positions, loci))
freq <- rpois(loci, lambda=0.3) / n
alpha <- c(0.1, 0.05, 0.01)
s <- scan.statistics(n=n, pos=pos, freq=freq, repet=100,
tuningUnits=0.65, alpha=alpha)
str(s)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.