Description Usage Arguments Details Value Author(s) See Also Examples
Generates aggregated (underdispersed) or overdispersed samples of values from any given distance matrix (class matrix).
1 2 | run_sampler(x, n, alpha, n_start = 1, return_start = FALSE,
starting = NULL)
|
x |
|
n |
A positive integer number indicating the sample size. |
alpha |
Number indicating the strength of aggregation (if negative) or overdispersion (if positive). When alpha = 0 sample is random. |
n_start |
Number of initial selected points. Default is one starting point. |
return_start |
if |
starting |
Character vector indicating the starting point. If not provided random starting value(s) is(are) selected. |
Given a distance matrix (x
), run_sampler
resample n
sample units with an attraction (positive) or repulsive (negative)
effect determined by alpha
(α).
The algorithm begins selecting one random starting point i
.
The following sample unit is then selected based on the probability given
by the distance of i
to each remaining units raised to the power of
alpha
(pr(j | i) = d_{i,j} ^ α). The following selections will then use a joint
probability. The first calculated as the average distance d
of the remaining unit j
to the selected ones k
(pr1(j | k) = d_{k,j} ^ α).
The second as the minimum distance dmin
of the remaining units to the selected ones
(pr2(j | k) = dmin_{k,j} ^ α).
The second probability guarantees that representativeness is achieved.
The procedure is repeated until the selected points reach n
. Positive values of
alpha
generate overdispersed sample designs, as sample units distant from
the selected unit(s) will have a higher probability of being selected. Inversely,
negative values will generate an aggregated design. Note that as alpha
approximate the infinity (+ or -), the sample design becomes more deterministic.
The function returns a vector indicating the selected rows. If return_start is TRUE, a list is returned with the first element being the Sampling_selection - selected sampling units - and Starting_points - selected starting point(s).
Bruno Vilela
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 | # Phylogeny example:
## Generate a random tree
require(ape)
tree <- rcoal(10)
## Calculate the distance
dist <- cophenetic(tree)
## Highly overdispersed 50% resample design (alpha = 50)
selection <- run_sampler(x = dist, n = 5, alpha = 50, starting = "t10")
## Highly aggregated 50% resample design (alpha = -50)
selection2 <- run_sampler(x = dist, n = 5, alpha = -50, starting = "t10")
## Random 50% resample design (alpha = 0)
selection3 <- run_sampler(x = dist, n = 5, alpha = 0, starting = "t10")
## Plot to compare
par(mfrow = c(1, 3))
plot(tree,tip.color=ifelse(tree$tip.label %in% selection, "red","black"),
main = "Overdispersed 50% sampling (red were selected)", cex = 1)
axis(1)
plot(tree,tip.color=ifelse(tree$tip.label %in% selection2, "blue","black"),
main = "Aggregated 50% sampling (blue were selected)", cex = 1)
axis(1)
plot(tree,tip.color=ifelse(tree$tip.label %in% selection3, "green","black"),
main = "Random 50% sampling (green were selected)", cex = 1)
axis(1)
# Geography example
require(sp)
require(maptools)
require(fields)
data(wrld_simpl) # World map
Brazil <- wrld_simpl[wrld_simpl$NAME == "Brazil", ] # Brazil (polygon)
coords <- slot(spsample(Brazil, 100, "regular"), "coords")
rownames(coords) <- paste0("t", 1:nrow(coords))
## Calculate the geographic distance
dist.geo <- rdist.earth(coords)
## Subsample 50%
### Overdispersed
selection.geo <- run_sampler(x = dist.geo, n = 25, alpha = 100, starting = "t10")
### Aggregated
selection.geo2 <- run_sampler(x = dist.geo, n = 25, alpha = -100, starting = "t10")
### Random
selection.geo3 <- run_sampler(x = dist.geo, n = 25, alpha = 0, starting = "t10")
## Plot
par(mfrow = c(1, 3), mar = c(1, 1, 15, 1))
plot(Brazil, main = "Overdispersed 50% sampling (red were selected)")
points(coords, cex = 2, pch = 19,
col = ifelse(rownames(coords) %in% selection.geo, "red","gray"))
plot(Brazil, main = "Aggregated 50% sampling (blue were selected)")
points(coords, cex = 2, pch = 19,
col = ifelse(rownames(coords) %in% selection.geo2, "blue","gray"))
plot(Brazil, main = "Random 50% sampling (green were selected)")
points(coords, cex = 2, pch = 19,
col = ifelse(rownames(coords) %in% selection.geo3, "green","gray"))
# Trait example
## Fake body size
set.seed <- 1
body_size <- runif(1000)
# Biased sample towards large species
set.seed <- 1
body_size_bias <- sample(body_size, 500, prob = body_size)
par(mfrow = c(1, 3))
hist(body_size, main = "Species body size distribution\n(n = 1000)", xlab = "Body size")
hist(body_size_bias, main = "Biased samplig towards larger species\n(n = 500)",
xlab = "Body size")
# Use sampler to reduce the bias
dist_bs <- as.matrix(dist(body_size_bias))
rownames(dist_bs) <- colnames(dist_bs) <- 1:length(body_size_bias)
selection.bs <- run_sampler(x = dist_bs, n = 100, alpha = 100)
hist(body_size_bias[as.numeric(selection.bs)],
main = "Overdispersed sampling of biased information \n(n = 100)",
xlab = "Body size")
# Real time simulation
require(raster)
par(mfrow = c(1, 1))
r <- raster(res = 25)
values(r) <- runif(ncell(r))
plot(r)
coords <- xyFromCell(r, 1:ncell(r))
rownames(coords) <- 1:ncell(r)
dist.geo <- as.matrix(dist(coords))
startingI <- c(1)
# Change alpha and see how it works
for(i in (length(startingI)+1):30) {
selection.geo <- run_sampler(x = dist.geo, n = i, alpha = 100,
starting = startingI)
startingI <- as.numeric(selection.geo)
r2 <- r
values(r2)[as.numeric(selection.geo)] <- 1
values(r2)[-as.numeric(selection.geo)] <- NA
plot(r2, col = "gray")
Sys.sleep(time = .2)
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.