Like pick, but allows specifying multiple factors (columns) at the same time, trying hard to return the desired result. Want 2 species from the same 3 strata during the same 4 years? Use mpick. Want just one of those? Use pick.
X | A data.table.
p | A named vector of integers. Names are columns in X; values are the number of levels of each column to select.
weight | Logical, default FALSE. Same as the weight argument of pick.
limit | Time limit for searching, in seconds.
screen | Logical. If TRUE (default), factor levels that definitely cannot satisfy the full suite of conditions in p are screened out before the random search begins.
dt | Logical. If TRUE, returns a data.table; if FALSE (default), returns an index into X.
This problem may ultimately be better suited to a real optimization algorithm. For now, the function relies on arbitrary guess-and-check. It does not "forget" failed guesses: only specific combinations would be worth forgetting, and for large data sets the probability of happening upon the same combination twice is very low. This makes it a very brute-force approach, the only exception being the checking done when screen=TRUE.
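To make the guess-and-check idea concrete, here is a minimal sketch in base R. It is a hypothetical illustration, not the actual mpick source: the function name guess_check, its arguments, and the assumption that X is a plain data.frame (not a data.table) are ours. Each pass draws p[[col]] random levels from every column named in p, then checks whether every combination of those levels is observed in X. Failed guesses are simply discarded, and the loop gives up once limit seconds have passed.

```r
# Hypothetical sketch of random guess-and-check; NOT mpick's actual code.
# Assumes X is a plain data.frame and p is a named vector of level counts.
guess_check <- function(X, p, limit = 10) {
  cols <- names(p)
  # Candidate levels for each column named in p
  levs <- lapply(X[cols], unique)
  # Observed combinations of those columns, collapsed to string keys
  obs <- unique(X[cols])
  obs_key <- do.call(paste, c(obs, sep = "\r"))
  t0 <- Sys.time()
  while (as.numeric(difftime(Sys.time(), t0, units = "secs")) < limit) {
    # Guess: draw p[[col]] distinct levels at random from each column
    guess <- lapply(cols, function(col) {
      lv <- levs[[col]]
      lv[sample.int(length(lv), p[[col]])]
    })
    names(guess) <- cols
    # Check: every combination of the guessed levels must occur in X
    combos <- expand.grid(guess, stringsAsFactors = FALSE)
    combo_key <- do.call(paste, c(combos, sep = "\r"))
    if (all(combo_key %in% obs_key)) return(guess)
    # A failed guess is not remembered; the next pass starts from scratch
  }
  NULL  # time limit reached without finding a valid combination
}

# Toy data (hypothetical): species "c" is never observed in 2002
toy <- data.frame(
  spp  = c("a", "a", "b", "b", "c"),
  year = c(2001, 2002, 2001, 2002, 2001),
  stringsAsFactors = FALSE
)
set.seed(1)
res <- guess_check(toy, p = c(spp = 2, year = 2), limit = 5)
print(res)
```

Because species c is missing from 2002, the only valid answer in this toy data set is species a and b together with both years.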
It is highly recommended that limit be set to allow for a couple of minutes of searching (e.g., limit=120). Of course, the appropriate value depends on the size of X and the details of p.
screen is very effective when many levels of the columns named in p can be ruled out based on their overall scarcity. Consider the example of 2 spp, 3 strata, and 4 years: if a given level of spp does not occur at least 3*4 = 12 times in the data set, it can be ruled out. Because very rare species make up the majority of unique spp in trawl data, this screening can be outstandingly effective.
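The screening rule can be sketched the same way. The helper screen_levels below is hypothetical (our name, again assuming a plain data.frame) and is not taken from the package. It keeps a level of one column only if that level occurs in at least prod(p[other columns]) distinct combinations of the other columns, e.g. 3*4 = 12 stratum-years for p = c(spp=2, stratum=3, year=4). Counting distinct combinations rather than raw rows is slightly stricter than the row count described above, but it is still a safe necessary condition: a species present in fewer than 12 stratum-years cannot fill all 12 cells of any 3-by-4 selection.

```r
# Hypothetical screening sketch; NOT mpick's actual code.
# Returns the levels of column `col` that could still satisfy p.
screen_levels <- function(X, p, col) {
  others <- setdiff(names(p), col)
  need <- prod(p[others])  # e.g. 3 strata * 4 years = 12
  # Distinct combinations of the other factors, counted per level of col
  combos <- unique(X[c(col, others)])
  counts <- table(combos[[col]])
  names(counts)[counts >= need]
}

# Toy data (hypothetical): "common" occurs in all 12 stratum-years,
# "rare" in only 2
grid <- expand.grid(stratum = 1:3, year = 2001:2004)
toy <- rbind(
  data.frame(spp = "common", grid),
  data.frame(spp = "rare", stratum = c(1, 2), year = c(2001, 2001))
)
kept <- screen_levels(toy, p = c(spp = 2, stratum = 3, year = 4), col = "spp")
print(kept)
```

Only "common" survives the screen, so a random search would never waste a guess on "rare".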
Be aware that it is easy to accidentally ask a lot of this function, so don't be surprised when it doesn't give you an answer quickly, or at all. For example, asking for 10 spp, 5 strata, and 5 years might seem modest for a data set observed over 30 years in 100 strata with 800 spp. However, this is a big ask: the same 10 species found together in the same 5 strata in each of the same 5 years. If the average stratum has about 30 species, you're requesting that a third of the local species pool consist of the same species in 25 separate stratum-years (5*5). If a stratum is small or if species are cosmopolitan, you might get a good result; but that would be lucky.
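A quick back-of-the-envelope calculation shows why. The sketch below uses the figures from the paragraph above (800 spp, 100 strata, 30 years, about 30 species per stratum) and counts every way of choosing 10 species, 5 strata, and 5 years. That count is only a rough measure of how many distinct guesses an unscreened random search could face; it is our illustration, not a statement about mpick's internals.

```r
# Figures taken from the example above
n_spp <- 10; n_strata <- 5; n_years <- 5

cells <- n_strata * n_years  # stratum-years that must all hold the same species
share <- n_spp / 30          # fraction of a ~30-species stratum being requested

# log10 of the number of ways to choose the levels (lchoose avoids overflow)
log10_space <- (lchoose(800, n_spp) +
                lchoose(100, n_strata) +
                lchoose(30, n_years)) / log(10)

print(c(cells = cells, share = share, log10_space = log10_space))
```

log10_space comes out at roughly 35.5, i.e. on the order of 10^35 candidate combinations before any screening.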
If dt=TRUE, a data.table that is a subset of X; otherwise (the default), an index of the rows of X that make up that subset.
This function is still experimental. See http://stackoverflow.com/q/33714985/2343633 for possible updates (but this was not a popular question).
# simple and fast example
set.seed(1337)
mpick(clean.ebs, p=c(spp=2, year=1), weight=TRUE, screen=TRUE, dt=TRUE)
# More complex example:
# suppose we want 5 spp that are found
# in the same 5 strata in at least 1 year,
# but we also want to allow for +/- 2 years
# on either side of that shared year.
# First we get the 5-5-1 subset index,
# then we search for the chosen spp and strata,
# also allowing the additional years.
## Not run:
set.seed(1337)
ind <- mpick(clean.ebs, p=c(spp=5, stratum=5, year=1), weight=TRUE, limit=60)
logic <- expression(
  spp %in% spp[ind]
  & stratum %in% stratum[ind]
  & as.integer(year) %in% (as.integer(unique(year[ind])) + (-2:2))
)
clean.ebs[eval(logic)]
## End(Not run)