Description Usage Arguments Details Value Author(s) See Also Examples
Generates aggregated (underdispersed) or overdispersed samples of values from any given distance matrix (class matrix).
1 2 | run_sampler(x, n, alpha, n_start = 1, return_start = FALSE,
starting = NULL)
|
x |
|
n |
A positive integer number indicating the sample size. |
alpha |
Number indicating the strength of aggregation (if negative) or overdispersion (if positive). When alpha = 0 sample is random. |
n_start |
Number of initial selected points. Default is one starting point. |
return_start |
if |
starting |
Character vector indicating the starting point. If not provided random starting value(s) is(are) selected. |
Given a distance matrix (x), run_sampler resample n
sample units with an attraction (positive) or repulsive (negative)
effect determined by alpha(α).
The algorithm begins selecting one random starting point i.
The following sample unit is then selected based on the probability given
by the distance of i to each remaining units raised to the power of
alpha (pr(j | i) = d_{i,j} ^ α). The following selections will then use a joint
probability. The first calculated as the average distance d of the remaining unit j
to the selected ones k (pr1(j | k) = d_{k,j} ^ α).
The second as the minimum distance dmin of the remaining units to the selected ones
(pr2(j | k) = dmin_{k,j} ^ α).
The second probability guarantees that representativeness is achieved.
The procedure is repeated until the selected points reach n. Positive values of
alpha generate overdispersed sample designs, as sample units distant from
the selected unit(s) will have a higher probability of being selected. Inversely,
negative values will generate an aggregated design. Note that as alpha
approximate the infinity (+ or -), the sample design becomes more deterministic.
The function returns a vector indicating the selected rows. If return_start is TRUE, a list is returned with the first element being the Sampling_selection - selected sampling units - and Starting_points - selected starting point(s).
Bruno Vilela
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 | # Phylogeny example:
## Generate a random tree
require(ape)
tree <- rcoal(10)
## Calculate the distance
dist <- cophenetic(tree)
## Highly overdispersed 50% resample design (alpha = 50)
selection <- run_sampler(x = dist, n = 5, alpha = 50, starting = "t10")
## Highly aggregated 50% resample design (alpha = -50)
selection2 <- run_sampler(x = dist, n = 5, alpha = -50, starting = "t10")
## Random 50% resample design (alpha = 0)
selection3 <- run_sampler(x = dist, n = 5, alpha = 0, starting = "t10")
## Plot to compare
par(mfrow = c(1, 3))
plot(tree,tip.color=ifelse(tree$tip.label %in% selection, "red","black"),
main = "Overdispersed 50% sampling (red were selected)", cex = 1)
axis(1)
plot(tree,tip.color=ifelse(tree$tip.label %in% selection2, "blue","black"),
main = "Aggregated 50% sampling (blue were selected)", cex = 1)
axis(1)
plot(tree,tip.color=ifelse(tree$tip.label %in% selection3, "green","black"),
main = "Random 50% sampling (green were selected)", cex = 1)
axis(1)
# Geography example
require(sp)
require(maptools)
require(fields)
data(wrld_simpl) # World map
Brazil <- wrld_simpl[wrld_simpl$NAME == "Brazil", ] # Brazil (polygon)
coords <- slot(spsample(Brazil, 100, "regular"), "coords")
rownames(coords) <- paste0("t", 1:nrow(coords))
## Calculate the geographic distance
dist.geo <- rdist.earth(coords)
## Subsample 50%
### Overdispersed
selection.geo <- run_sampler(x = dist.geo, n = 25, alpha = 100, starting = "t10")
### Aggregated
selection.geo2 <- run_sampler(x = dist.geo, n = 25, alpha = -100, starting = "t10")
### Random
selection.geo3 <- run_sampler(x = dist.geo, n = 25, alpha = 0, starting = "t10")
## Plot
par(mfrow = c(1, 3), mar = c(1, 1, 15, 1))
plot(Brazil, main = "Overdispersed 50% sampling (red were selected)")
points(coords, cex = 2, pch = 19,
col = ifelse(rownames(coords) %in% selection.geo, "red","gray"))
plot(Brazil, main = "Aggregated 50% sampling (blue were selected)")
points(coords, cex = 2, pch = 19,
col = ifelse(rownames(coords) %in% selection.geo2, "blue","gray"))
plot(Brazil, main = "Random 50% sampling (green were selected)")
points(coords, cex = 2, pch = 19,
col = ifelse(rownames(coords) %in% selection.geo3, "green","gray"))
# Trait example
## Fake body size
set.seed <- 1
body_size <- runif(1000)
# Biased sample towards large species
set.seed <- 1
body_size_bias <- sample(body_size, 500, prob = body_size)
par(mfrow = c(1, 3))
hist(body_size, main = "Species body size distribution\n(n = 1000)", xlab = "Body size")
hist(body_size_bias, main = "Biased samplig towards larger species\n(n = 500)",
xlab = "Body size")
# Use sampler to reduce the bias
dist_bs <- as.matrix(dist(body_size_bias))
rownames(dist_bs) <- colnames(dist_bs) <- 1:length(body_size_bias)
selection.bs <- run_sampler(x = dist_bs, n = 100, alpha = 100)
hist(body_size_bias[as.numeric(selection.bs)],
main = "Overdispersed sampling of biased information \n(n = 100)",
xlab = "Body size")
# Real time simulation
require(raster)
par(mfrow = c(1, 1))
r <- raster(res = 25)
values(r) <- runif(ncell(r))
plot(r)
coords <- xyFromCell(r, 1:ncell(r))
rownames(coords) <- 1:ncell(r)
dist.geo <- as.matrix(dist(coords))
startingI <- c(1)
# Change alpha and see how it works
for(i in (length(startingI)+1):30) {
selection.geo <- run_sampler(x = dist.geo, n = i, alpha = 100,
starting = startingI)
startingI <- as.numeric(selection.geo)
r2 <- r
values(r2)[as.numeric(selection.geo)] <- 1
values(r2)[-as.numeric(selection.geo)] <- NA
plot(r2, col = "gray")
Sys.sleep(time = .2)
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.