autoSquash | R Documentation |
autoSquash squashes data by calling squashData once for each count (N), removing the need to repeatedly squash the same data set.
autoSquash(
  data,
  keep_pts = c(100, 75, 50, 25),
  cut_offs = c(500, 1000, 10000, 1e+05, 5e+05, 1e+06, 5e+06),
  num_super_pts = c(50, 75, 150, 500, 750, 1000, 2000, 5000)
)
data |
A data frame (typically from |
keep_pts |
A vector of whole numbers for the number of points to leave unsquashed for each count (N). See the 'Details' section. |
cut_offs |
A vector of whole numbers for the cutoff values of unsquashed data used to determine how many "super points" to end up with after squashing each count (N). See the 'Details' section. |
num_super_pts |
A vector of whole numbers for the number of
"super points" to end up with after squashing each count (N). Length
must be 1 more than length of |
See squashData for details on squashing a given count (N).
The elements in keep_pts determine how many points are left unsquashed for each count (N). The first element in keep_pts is used for the smallest N (usually 1). Each successive element is used for each successive N. Once the last element is reached, it is used for all other N.
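The recycling rule for keep_pts can be illustrated with a minimal sketch. The helper keep_for_rank below is hypothetical and not part of the package; it assumes that "successive N" means the next distinct count in increasing order, so rank 1 is the smallest N in the data.

```r
# Hypothetical helper (not part of the package): returns the keep_pts
# element that would apply to the count with the given rank, where
# rank 1 is the smallest distinct N, rank 2 the next smallest, and so on.
keep_for_rank <- function(rank, keep_pts) {
  # Ranks beyond the end of keep_pts reuse the last element.
  keep_pts[pmin(rank, length(keep_pts))]
}

keep_pts <- c(100, 75, 50, 25)

# Points left unsquashed for the six smallest distinct counts
kept <- keep_for_rank(1:6, keep_pts)
print(kept)
```

With the default keep_pts, the fourth and every later count would share the final value of 25 under this reading.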
For counts that are squashed, cut_offs and num_super_pts determine how the points are squashed. For instance, by default, if a given N contains fewer than 500 points to be squashed, then those points are squashed to 50 "super points".
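One way to picture the lookup from cut_offs to num_super_pts is with base R's findInterval. This is only a sketch, not the package's own code: the helper super_pts_for is hypothetical, and the treatment of a point total exactly equal to a cutoff (assigned here to the next, larger bin) is an assumption that merely agrees with the "fewer than 500" example above.

```r
# Hypothetical helper (not part of the package): maps the number of points
# to be squashed for a given N to a number of "super points".
super_pts_for <- function(n_points, cut_offs, num_super_pts) {
  # num_super_pts needs one more element than cut_offs (one per interval).
  stopifnot(length(num_super_pts) == length(cut_offs) + 1)
  # findInterval() counts how many cutoffs are <= n_points, so adding 1
  # gives the index of the interval that n_points falls into.
  num_super_pts[findInterval(n_points, cut_offs) + 1]
}

cut_offs      <- c(500, 1000, 10000, 1e+05, 5e+05, 1e+06, 5e+06)
num_super_pts <- c(50, 75, 150, 500, 750, 1000, 2000, 5000)

targets <- super_pts_for(c(499, 500, 6e+06), cut_offs, num_super_pts)
print(targets)
```

The length check mirrors the documented requirement that num_super_pts be one element longer than the vector of cutoffs.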
A data frame with column names N, E, and weight containing the reduced data set.
DuMouchel W, Pregibon D (2001). "Empirical Bayes Screening for Multi-item Associations." In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '01, pp. 67-76. ACM, New York, NY, USA. ISBN 1-58113-391-X.
processRaw for data preparation and squashData for squashing individual counts
data.table::setDTthreads(2) #only needed for CRAN checks
data(caers)
proc <- processRaw(caers)
table(proc$N)
squash1 <- autoSquash(proc)
ftable(squash1[, c("N", "weight")])
## Not run: squash2 <- autoSquash(proc, keep_pts = c(50, 5))
## Not run: ftable(squash2[, c("N", "weight")])
## Not run:
squash3 <- autoSquash(proc, keep_pts = 100,
                      cut_offs = c(250, 500),
                      num_super_pts = c(20, 60, 125))
## End(Not run)
## Not run: ftable(squash3[, c("N", "weight")])