| hespdiv | R Documentation |
This function is an implementation of spatial data analysis method HespDiv. It performs hierarchical spatial data subdivision by recursively dividing the data using random split-lines, evaluating their comparison values (how well they separate data), and using the best to perform subdivisions.
hespdiv(
data,
xy.dat = NULL,
n.split.pts = 15,
same.n.split = TRUE,
method = "horn.morisita",
generalize.f = NULL,
compare.f = NULL,
maximize = NULL,
N.crit = 1,
N.rel.crit = 0.2,
N.loc.crit = 1,
N.loc.rel.crit = 0.2,
S.crit = 0.05,
S.rel.crit = 0.2,
Q.crit = NULL,
c.splits = TRUE,
c.Q.crit = NULL,
c.crit.improv = 0,
c.X.knots = 5,
c.Y.knots = 10,
c.max.iter.no = +Inf,
c.fast.optim = FALSE,
c.corr.term = 0.05,
study.pol = NULL,
use.chull = TRUE,
tracing = NULL,
pnts.col = 1,
display = FALSE,
pacific.region = FALSE,
.do_recurse = TRUE
)
data |
An R object that contains the data to be analyzed. The required data structure depends on the selected method (e.g., character vector of taxa names). |
xy.dat |
A data.frame containing the coordinates for observations in the
|
n.split.pts |
integer. The number of split-points - 1. These points are
used in creating straight split-lines (see details). The total number of straight
split-lines generated can be obtained by |
same.n.split |
logical. Should the number of split-points
( |
method |
Character. Name or abbreviation of a preset method:
|
generalize.f |
Function. Optional function used in custom methods to
prepare input for the |
compare.f |
function. Only required in custom methods. Employed to
quantify the comparison value of a split-line. For this purpose, the
|
maximize |
logical. Only required in custom methods. Determines whether the split-line comparison value should be maximized or minimized during the optimization process. |
N.crit |
number. Minimum required number of observations that should be present in each polygon obtained by a split-line in order for it to meet this criterion. |
N.rel.crit |
number from 0 to 0.5. Each polygon obtained with a split-line must have at least such proportion of observations to pass this criterion. Equation of the proportion: (Number of observations in 1st/2nd resulting polygon) / (Number of observations in the polygon being subdivided) |
N.loc.crit |
number. Minimum required number of different locations that should be present in each polygon obtained by a split-line in order for it to meet this criterion. |
N.loc.rel.crit |
number from 0 to 0.5. Each polygon obtained with a split-line must have at least such proportion of different locations to pass this criterion. Equation of the proportion: (Number of different locations in 1st/2nd resulting polygon) / (Number of different locations in the polygon being subdivided) |
S.crit |
number from 0 to 1. Each polygon obtained with a split-line must have at least such area proportion to pass this criterion. Equation of the proportion: (Area of 1st/2nd resulting polygon) / (Area of the first polygon). The first polygon is the provided study area or convex hull of observation locations. |
S.rel.crit |
number from 0 to 0.5. Each polygon obtained with a split-line must have at least such area proportion to pass this criterion. Equation of the proportion: (Area of 1st/2nd resulting polygon) / (Area of the polygon being subdivided). |
Q.crit |
number. The threshold for a split-line comparison value to be
considered acceptable for a subdivision. When |
c.splits |
logical. When set to TRUE, the algorithm will explore nonlinear split-lines in addition to straight split-lines in order to find the optimal subdivision. |
c.Q.crit |
number. The threshold for a split-line comparison value to
be considered acceptable for generating nonlinear split-lines. It is
recommended to use the default value, which does not impose a performance
requirement, unless you have a clear understanding of the potential
improvements that nonlinear split-lines can achieve over straight split-lines.
If |
c.crit.improv |
number. The threshold for the improvement in a split-line comparison value required for a nonlinear split-line to be selected instead of a straight split-line for subdivision. The default value of 0 means that even if a nonlinear split-line performs equally to a straight split-line, the straight split-line will still be chosen. |
c.X.knots |
integer. Specifies the number of columns in a network of
spline knots used to generate nonlinear split-lines. These knots are evenly
distributed along the straight split-line. Adjusting the value of
|
c.Y.knots |
integer. specifies the number of rows in a network of
spline knots used to generate nonlinear split-lines. These knots are
distributed regularly along lines orthogonal to the straight
split-line. Adjusting the value of |
c.max.iter.no |
integer. The maximum number of iterations allowed through
the network of spline knots when searching for the optimal shape of a
nonlinear split-line. Setting a higher value, such as |
c.fast.optim |
logical. Determines when spline knots are selected. If
|
c.corr.term |
number from 0.01 to 0.2. A correction term for nonlinear split-lines that intersect the boundary of the polygon. Smaller values (default is 0.05) are recommended, as they determine the extent to which the outlying interval of the generated spline, which crosses the polygon boundary, should be shifted away from the boundary and inside the polygon in a direction orthogonal to the straight split-line. This shift is specified as a proportion of the polygon height where the spline intersects the polygon boundary. |
study.pol |
A data frame with two columns, |
use.chull |
logical. If |
tracing |
Optional character vector of length two controlling diagnostic
tracing output. The first element specifies the tracing level and must be
one of |
pnts.col |
character or numeric. Specifies the color of observations
in a plot. The argument is used when |
display |
logical. Display a simple plot of results at the end of computations? |
pacific.region |
logical (default is FALSE). When set to TRUE, indicates
that the study area is crossed by the 180th meridian, such as being within
the Pacific Ocean. In this case, the coordinates of |
.do_recurse |
Logical. Controls recursion (for internal use only). Default is |
1) Split-point Placement: The function places a predetermined number of
split-points (n.split.pts + 1) along the perimeter of the study area
(study.pol) or the convex hull of observation locations
(xy.dat) if a study area polygon is not provided. These split-points
are evenly spaced, resulting in a distance between points equal to
1/(n.split.pts + 1) fraction (1/16 by default) of a polygon circumference.
2) Straight Split-lines: Straight split-lines are generated by connecting
the split-points. The total number of straight split-lines generated is equal
to the value of (n.split.pts + 1) * n.split.pts / 2 or to
choose(n.split.pts + 1, 2). This holds true only in the first iteration when
same.n.split is set to FALSE or in all iterations when
same.n.split is set to TRUE. Note that the total number of
split-lines generated will not be equal to the number of split-lines
evaluated since split-lines that cross polygon boundary or do not pass
sample size, area or location number subdivision criteria are not evaluated.
3) Subdivisions: Each split-line spatially divides the data and study area polygon into two subsets.
4) Criteria: Both subsets are then checked to see if they meet sample size, area and location number criteria.
5) Obtaining comparison values: Subsets that meet the criteria are
compared using the generalize.f and compare.f functions to
obtain a comparison value. First, each subset is passed to
generalize.f to obtain a generalization value, such as the Pielou
evenness index for the "pielou" method; see the Note section
for clarification. These values are then used by compare.f to compare
the subsets. In this way, each split-line that passed all subdivision criteria
is assigned a comparison value.
6) Best Straight Split-line Selection: The best performing straight
split-line is determined based on whether the maximize argument is
set to TRUE or FALSE. If maximize is TRUE, the
best split-line is the one with the highest comparison value; if
maximize is FALSE, the best split-line has the lowest
comparison value.
The combination of the generalize.f, compare.f, and
maximize arguments can be provided, enabling the creation of custom
methods, or it can be determined internally based on the chosen preset
subdivision methods (see below).
7) Nonlinear Split-lines: If the best straight split-line meets
the quality criteria specified by the c.Q.crit, it serves as the basis
for generating variously shaped curves (nonlinear split-lines). These curves
are produced using splines, which are mathematical functions that can create
smooth and flexible curves.
To generate the curves, a number of knots (control points) for the splines are
distributed evenly along a set of lines orthogonal to the best straight
split-line. These orthogonal lines are also evenly distributed along the
straight split-line itself. The number of knots and lines used can be
adjusted through the parameters c.Y.knots and c.X.knots,
respectively.
The algorithm then iterates through this network of knots, considering different combinations of knots, to produce curves. By varying the selection and arrangement of knots, different shapes of curves are generated.
8) Nonlinear Split-line Evaluation and Selection: Curves are then processed in the same manner as straight split-lines in steps 3 to 6.
9) Final Split-line Selection and Establishment of Subdivision: If the best
curve outperforms the best straight split-line by a margin of
c.crit.improv, the best split-line becomes nonlinear. Otherwise, it
remains straight. The split-line must also satisfy the criteria established
by the Q.crit argument to be used for subdivision.
10) Recursive Iteration: The process described above is iteratively applied, resulting in a collection of selected split-lines with their performance values. These split-lines hierarchically subdivide space and data, forming polygons of various shapes.
Preset methods and their combinations of compare.f,
generalize.f, and maximize:
"sorensen"Functions calculate the Sorensen similarity index
(Sorensen 1948). maximize = FALSE. Values vary from 0 to 1.
"pielou"Functions calculate the mean proportional decrease in
Pielou evenness (Pielou 1966) that occurs after polygon subdivision.
maximize = TRUE. Values vary from 0 to 1.
"morisita"Functions calculate the Morisita overlap index
(Morisita 1959). maximize = FALSE. Values vary from 0 to 1.
"horn.morisita"Functions calculate the Morisita overlap index, as modified by
Horn (1966). maximize = FALSE. Values vary from 0 to 1.
All preset methods currently available are specifically designed for
bioregionalization purposes. These methods require two key inputs: the
coordinates of fossil taxon occurrences (xy.dat) and the names or IDs
of taxa (data). These names or IDs should be structured as
character or numeric vectors, with each element corresponding to a row in the
xy.dat data frame. Each method compares fossil communities on
opposite sides of a split-line, aiming to minimize similarity or
maximize difference. The output yields biogeographical provinces with a
hierarchical structure.
A hespdiv object, which is a list with seven elements:
poly.statsA data frame containing information about polygons established by the selected split-lines. Its columns are:
rank: The rank of a polygon. It corresponds to the rank of the
split-line that produced the polygon and to the polygon's position in the
hierarchical structure of the subdivision.
plot.id: The ID assigned to the polygon. It corresponds to the
order in which the polygon was processed during the hespdiv analysis.
root.id: The ID of the parent polygon whose subdivision
resulted in the current polygon.
n.splits: The number of straight split-lines evaluated in an
attempt to subdivide the polygon. This count excludes split-lines that crossed
the polygon boundary or did not meet area, sample size, or location-number criteria.
n.obs: The number of observations inside the polygon.
mean: The average comparison value of the straight split-lines
used in the attempted subdivision of the polygon. This value reflects the
general spatial heterogeneity of the data.
sd: The standard deviation of the comparison values of the
straight split-lines used in the attempted subdivision. It indicates
the extent of anisotropy or variation in spatial heterogeneity within the polygon.
str.best: The comparison value of the best straight split-line
produced within the polygon.
str.z.score: The z-score of the comparison value of the best
straight split-line within the polygon. It indicates how outstanding the best
straight split-line is relative to other evaluated straight subdivisions.
has.split: Logical. Indicates whether a subdivision was established
in the polygon.
is.curve: Logical. Indicates whether the established subdivision
was obtained using a curve. If has.split is FALSE, this value is NA.
crv.best: The same as str.best, but for nonlinear split-lines.
crv.z.score: The same as str.z.score, but for nonlinear split-lines.
c.improv: The improvement in comparison value achieved by using
nonlinear split-lines instead of straight split-lines.
split.statsA data frame containing information about established split-lines.
Its columns can be interpreted similarly to those in poly.stats,
but from the perspective of split-lines rather than polygons. For example,
rank is the rank of the split-line, not of the polygon.
The performance column contains the comparison value of the split-line
and corresponds to either str.best or crv.best in poly.stats,
depending on the value of is.curve. The same applies to the
z.score column.
split.linesA list of data frames containing coordinates of established split-lines.
The order of split-lines in this list corresponds to their order in
split.stats.
polygons.xyA list of data frames containing coordinates of polygons established by
the split-lines. The order of polygons in this list corresponds to their
order in poly.stats.
poly.objA list of polygon objects, that is, outputs of generalize.f for each polygon.
The order of elements in this list corresponds to the row order of
poly.stats and the polygon order in polygons.xy.
call.infoInformation about the hespdiv call, including the method and arguments used.
str.difsA list containing comparison values of evaluated straight split-lines for
each polygon. The elements of this list correspond to the rows of
poly.stats.
Please note that if you use the method argument, the arguments
generalize.f, compare.f, and maximize are determined
internally and should not be provided. Therefore, you should only assign
values to these arguments when using a custom method, not a predefined one.
Additionally, you can ignore the generalize.f argument even when
applying custom methods. If generalize.f is set to NULL (default), the
data remains unchanged, as generalize.f acts as an identity function.
Hence, generalize.f is only an optional argument that allows to omit
the transformation or generalization step in compare.f function,
simplifying it.
Both generalize.f and compare.f inherit environments from
parent functions: hespdiv() and .spatial_div(). This
allows them to use additional variables from those environments, such as
xy.dat, samp.xy, id1, and id2.
There is a small possibility that a nonlinear split-line may cross a polygon
boundary. If the result contains such a split-line, or if an error related to
this issue occurs, slightly change one of the arguments, such as
n.split.pts, and re-run hespdiv().
Liudas Daumantas
Horn, H. S. (1966). Measurement of" overlap" in comparative ecological studies. The American Naturalist, 100(914), 419-424.
Morisita, M. (1959). Measuring of interspecific association and similarity between assemblages. Mem Fac Sci Kyushu Univ Ser E Biol, 3, 65-80.
Pielou, E. C. (1966). The measurement of diversity in different types of biological collections. Journal of theoretical biology, 13, 131-144.
Sorensen, T. A. (1948). A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Biol. Skar., 5, 1-34.
# Simulated data with a known boundary at x = 0.45 to illustrate boundary detection
study.area <- data.frame(
x = c(0, 0.8, 1, 0.6, 0, 0),
y = c(0, 0, 0.4, 1.1, 1.1, 0)
)
set.seed(852)
# Simulate 100 occurrence coordinates
N <- 100
xy.dat_arg <- data.frame(
x = rpois(N, 4) / 10,
y = rpois(N, 4) / 10
)
xy.dat_arg <- xy.dat_arg[order(xy.dat_arg$x), ]
# Simulate a boundary at x = 0.45
n_left <- sum(xy.dat_arg$x < 0.45)
set.seed(1)
common_data <- letters[1:5]
left_data <- sample(
c(common_data, LETTERS[1:10]),
size = n_left,
replace = TRUE,
# common-endemic probability ratio 3:4
prob = c(rep(3, 5), rep(4, 10))
)
right_data <- sample(
c(common_data, LETTERS[11:20]),
size = N - n_left,
replace = TRUE,
# common-endemic probability ratio 3:4
prob = c(rep(3, 5), rep(4, 10))
)
data_arg <- c(left_data, right_data)
# Apply hespdiv
r <- hespdiv(
data = data_arg,
xy.dat = xy.dat_arg,
n.split.pts = 6, # small value used here for illustration
method = "sor", # subdivision minimizing Sorensen-Dice similarity
S.crit = 0.3, # minimum area size is 30% of study area
study.pol = study.area,
use.chull = FALSE
)
plot_hespdiv(r, n.loc = TRUE) + ggplot2::geom_vline(xintercept = 0.45)
# Detected split-line performance
r$split.stats$performance
# Sorensen-Dice similarity across the true simulated boundary
2 * length(intersect(left_data, right_data)) /
(length(unique(left_data)) + length(unique(right_data)))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.