User Guide
In spqdep: Testing for Spatial Independence of Cross-Sectional Qualitative Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

knitr::opts_chunk$set(echo = TRUE)

library(spdep)
library(spatialreg)
library(sf)
library(ggplot2)

Introduction

This guide show the functionalities of the spqdep package to test spatial dependence on qualitative datasets.

Global and local statistics give information about the spatial structure of the spatial dataset.

Datasets

Three data sets will be used as examples in this guide:

provinces_spain: The division of Spain into provinces. It is a multypolygon geometry with isolated provinces (Canary and Balearic islands without neighbouring provinces). See for example @paezSpatioTemporalAnalysisEnvironmental2021.
FastFood.sf: A simple feature (sf) dataframe containing the locations of a selection of fast food restaurants in the city of Toronto, Canada (data are from 2008). The data set used as example in @ruiz2010. It is a geometry of points.
Boots.sf: A simple features object. A square regular lattice 16x16 from Fig. 3.3 in @upton1985. In this figure, the cells coded black/white correspond to quadrats where the perennial shrub Atriplex hymenelytra is present/absent in a sample area in Death Valley, California.

The package is install like usual and the dataset can be loaded using the next code

library(spqdep)
data("provinces_spain", package = "spqdep")
data("FastFood.sf", package = "spqdep")
data("Boots.sf", package = "spqdep")

Data Generating Process

Additional to the datasets available in the spqdep package. The user can generate structured spatial processes using the \code{dgp.spq()} function. The DGP generate with this function defined in @ruiz2010.

The next code show how to generate a random process on a set of random points localized in a square $1 \times 1$. In this case, the connectivity criteria is based on the 4 nearest neighbourhood.

set.seed(123)
N <- 100
cx <- runif(N)
cy <- runif(N)
coor <- cbind(cx,cy)
p <- c(1/6,3/6,2/6) # proportion of classes
rho = 0.5 # level of spatial structure
listw <- spdep::nb2listw(spdep::knn2nb(spdep::knearneigh(coor, k = 4)))
fx <- dgp.spq(list = listw, p = p, rho = rho)

The next plot show the qualitative spatial process defined.

ggplot(data.frame(fx = fx, cx = cx, cy = cy), ggplot2::aes(x = cx, y = cy, color = fx)) + 
    geom_point(size = 6) +
    theme_bw()

Q-test

The Q-test is a simple, consistent, and powerful statistic for qualitative spatial independence that we develop using concepts from symbolic dynamics and symbolic entropy. The Q test can be used to detect, given a spatial distribution of events, patterns of spatial association of qualitative variables in a wide variety of settings.

The Q-test [@ruiz2010] is based on m-surroundings
Before to apply the Q-test it is necessary define a set of the m-surroundings
The \code{m.surround()} function generate a set of m-surrounding.
The user can tuning several parameters to obtain a congruent set of m-surroundings.

The Q(m) statistic was introduced by @ruiz2010 as a tool to explore geographical co-location/co-occurrence of qualitative data. Consider a spatial variable X which is the result of a qualitative process with a set number of categorical outcomes $a_j$ (j=1,...,k). The spatial variable is observed at a set of fixed locations indexed by their coordinates $s_i$ (i=1,..., N), so that at each location si where an event is observed, $X_i$ takes one of the possible values $a_j$.

Since the observations are georeferenced, a spatial embedding protocol can be devised to assess the spatial property of co-location. Let us define, for an observation at a specified location, say $s_0$, a surrounding of size m, called an m-surrounding. The m-surrounding is the set of m-1 nearest neighbours from the perspective of location $s_0$. In the case of distance ties, a secondary criterion can be invoked based on direction.

Once that an embedding protocol is adopted and the elements of the m-surrounding for location $s_0$ have been determined, a string can be obtained that collects the elements of the local neighborhood (the m-1 nearest neighbors) of the observation at $s_0$. The m-surrounding can then be represented in the following way:

$$X_m(s_0)=(X_{s_0},X_{s_1},...X_{s_{m-1}})$$

Since each observation Xs takes one of k possible values, and there are m observations in the m-surrounding, there are exactly k possible unique ways in which those values can co-locate. This is the number of permutations with replacement. For instance, if k=2 (e.g. the possible outcomes are a1=0 and a2=1) and m=3, the following eight unique patterns of co-location are possible (the number of symbols is $n_{\sigma}$=8): {0,0,0}, {1,0,0}, {0,1,0}, {0,0,1}, {1,1,0}, {1,0,1}, {0,1,1}, and {1,1,1}. Each unique co-locationtype can be denoted in a convenient way by means of a symbol $\sigma_i$ $(i=1, 2,...,k^m)$. It follows that each site can be uniquely associated with a specific symbol, in a process termed symbolization. In this way, we say that a location s is of type $\sigma_i$ if and only if $X_m(s)=\sigma_i$. Equivalent symbols (see Páez, et al. 2012) can be obtained by counting the number of occurrences of each category within an m-surrounding. This surrenders some topological information (ordering within the m-surrounding is lost) in favor of a more compact set of symbols, since the number of combinations with replacement.

Definition of Q(m) statistic

Let ${X_s}_{s \in R}$ be a discrete spatial process and m be a fixed embedding dimension. The statistic Q(m) testing the null hypothesis:

$H_0:{X_s}_{s \in R}$ is spatially independent, against any other alternative.

For a fixed $m \geq 2$, the relative frequency of symbols can be used to define the symbolic entropy of the spatial process as the Shanon entropy of the distinct symbols:

$$h(m) = - \sum_j p_{\sigma_j}ln(p_{\sigma_j})$$

where

$$p_{\sigma_j}={ n_{\sigma_j} \over R}$$

with $n_{\sigma_j}$ is simply the number of times that the symbol $\sigma_j$ is observed and R the number of symbolized locations.

The entropy function is bounded between $0 < h (m) \leq \eta$.

The Q statistic is essentially a likelihood ratio test between the symbolic entropy of the observed pattern and the entropy of the system under the null hypothesis of a random spatial sequence:

$$Q(m)=2R(\eta-h(m))$$

with $\eta = ln(k^m)$. The statistic is asymptotically $\chi^2$ distributed with degrees of freedom equal to the number of symbols minus one.\cr

m-surroundings

m.surround() is the function to generate m-surroundings.
The output of this function is a object of the class m_surr
Using the \code{plot()} method the user can explore the coherence of m-surroundings because some times the algorithm can generate degenerate m-surrounding with so far observations with the degree of overlapping is low. The recommendation is use the control options.

The overlapping degree

The performance of the statistic can become compromised due to the overlap of m-surroundings. To meet all key approximations for testing, the overlap is controlled by letting the maximum number of symbolized locations S be less than the actual number of observations N,

When r = 0 (i.e., no overlap is allowed) we exactly have binomial random variables. Note that the maximum number of locations that can be symbolized with an overlapping degree r is

$$R = { [{{|S|−m} \over {m-r}} ]} + 1$$,

where [x] denotes the integer part of a real number x, and |S| the cardinality of the set S. Therefore, reducing the degree of overlap also implies that the number of symbolized locations will be smaller than the number of observations in the sample

By example, the next code obtain m-surroundings with length m = 3 and degree of overlapping r = 1:

m = 3
r = 1
mh <- m.surround(x = cbind(cx,cy), m = m, r = r)
class(mh)

Methods for the m_surr class

The spqdep have three methods that can be apply to this class: \code{print()}, \code{summary} and \code{plot}

The print method

\code{print()} list the m-surroundings

print(mh)

The summary method

\code{summary()} generate a summary of some characteristics of the m-surroundings

summary(mh)

The plot method

\code{plot()} show the spatial structure of the m-surroundings

plot(mh, type = 1)

With the argument control the user can tuning some characteristics of the m-surroundings.

By example, with control argument, the user can 'prune' non-coherent m-surroundings.

control <- list (dtmaxknn = 10)
mh.prune <- m.surround(x = coor, m = m, r = r, control = control)
plot(mh.prune)

The Q-test

The function \code{Q.test()} obtain the Q-test for a spatial process develop in @ruiz2010.

The user must select the longitude of the m-surroundings (m) and the overlapping degree (r). In the next code example, the Q-test is obtain for the DGP spatial process (fx) obtain with the \code{dgp.spq()}. The coordinates coor must be included as argument.

q.test <- Q.test(fx = fx, coor = coor, m = 3, r = 1)

The output is a list with the result for symbols based on permutations (standard) and combinations (equivalent).
The output of this function is an object of the spqtest class.

Distribution of Q-test

The asymptotic distribution is the default distribution to obtain the significance of Q-test [@ruiz2010].
Alternatively, the Monte Carlo method can be used to obtain the significance of the test. The paper @lopez2012 describe this approach.

q.test.mc <- Q.test(fx = fx, coor = coor, m = 3, r = 1, distr = "mc")
summary(q.test.mc)

Methods for the spqtest class

A summary can be apply to an object of the spqtest class:

summary(q.test)

The histogram of the number of symbols is obtain appling the plot method.

plot(q.test)

The Q-test using a sf object

A sf object [@pebesma2018] or a data frame can be used as input of the \code{Q.test()} function:

sf_use_s2(FALSE)
provinces_spain$GenderRatio <- factor(provinces_spain$Mal2Fml > 100)
levels(provinces_spain$GenderRatio) = c("men","woman")
f1 <- ~ provinces_spain$GenderRatio
q.test.sf <- Q.test(formula = f1, data = provinces_spain, m = 3, r = 1)

The method \code{plot()} show the histogram of the number of symbols

plot(q.test.sf)

QMap-test

The function \code{QMap()} obtain the test for maps comparison publish in @Ruiz2012b

The next code generate two qualitative spatial process with different levels of spatial dependence and the Q-Map is apply.

p <- c(1/6,3/6,2/6)
rho = 0.5
QY1 <- dgp.spq(p = p, listw = listw, rho = rho)
rho = 0.8
QY2 <- dgp.spq(p = p, listw = listw, rho = rho)
dt = data.frame(QY1,QY2)
m = 3
r = 1
formula <- ~ QY1 + QY2
control <- list(dtmaxknn = 10)
qmap <- Q.map.test(formula = formula, data = dt, coor = coor, m = m, r = r, type ="combinations", control = control)

The output of \code{Q.Map()} id an object of the classes qmap and htest

Methods for qmap class

The qmap object is a list with two elements. Each element is an object of the class htext

print(qmap[[1]])

The \code{plot()} method obtains the distribution of symbols with the confidence intervals specified by the user.

plot(qmap, ci=.6)

Runs test

The spatial runs test compute the global and local test for spatial independence of a categorical spatial data set [@ruiz2021].

Definition of spatial run:

In this section define the concepts of spatial encoding and runs, and construct the main statistics necessary for testing spatial homogeneity of categorical variables. In order to develop a general theoretical setting, let us consider ${X_s}_{s \in S}$ to be the categorical spatial process of interest with Q different categories, where S is a set of coordinates.

Spatial encoding:

For a location $s \in S$ denote by $N_s = {s_1,s_2 ...,s_{n_s}}$ the set of neighbours according to the interaction scheme W, which are ordered from lesser to higher Euclidean distance with respect to location s.

The sequence as $X_{s_i} , X_{s_i+1},...,, X_{s_i+r}$ its elements have the same value (or are identified by the same class) is called a spatial run at location s of length r.

The spatial run statistic:

The total number of runs is defined as: $$SR^Q=n+\sum_{s \in S}\sum_{j=1}^{n_s}I_j^s$$ where $I_j^s = 1 \ if \ X_{s_j-1} \neq X_{s_j} \ \text{and 0 otherwise}$ for $j=1,2,...,n_s$ Following result by the Central Limit Theorem, the asymtotical distribution of $SR^Q$ is:

$$SR^Q = N(\mu_{SR^Q},\sigma_{SR^Q})$$

In the one-tailed case, we must distinguish the lower-tailed test and the upper-tailed, which are associated with homogeneity and heterogeneity respectively. In the case of the lower-tailed test, the following hypotheses are used:

$H_0:{X_s}_{s \in S}$ is i.i.d.

$H_1$: The spatial distribution of the values of the categorical variable is more homogeneous than under the null hypothesis (according to the fixed association scheme).

In the upper-tailed test, the following hypotheses are used:

$H_0:{X_s}_{s \in S}$ is i.i.d.

$H_1$: The spatial distribution of the values of the categorical variable is more heterogeneous than under the null hypothesis (according to the fixed association scheme).

These hypotheses provide a decision rule regarding the degree of homogeneity in the spatial distribution of the values of the spatial categorical random variable.

Global Runs test

The function sp.runs.test obtain the spatial runs test.

listw <- knearneigh(coor, k = 3)
srq <- sp.runs.test(fx = fx, listw = listw)

The output of this function is a object of the classes sprunstest and htest

Methods for spruntest class

The spqdep has two methods for this class \code{print} y \code{plot}

print(srq)

plot(srq)

The local Runs test

The function local.sp.runs.test obtain the local tests based on runs.

Asymptotic version

Asymptotic version

lsrq <- local.sp.runs.test(fx = fx, listw = listw, alternative = "less")

The \code{print()} method list the statistic of each observation (points or regions)

print(lsrq)

The \code{plot()} method identify the localization with values of local test significant.

plot(lsrq, sig = 0.05)

Monte Carlo local runs test

The Monte Carlo distribution ot the local test using a sf object

data("provinces_spain")
listw <- spdep::poly2nb(as(provinces_spain,"Spatial"), queen = FALSE)
provinces_spain$Mal2Fml <- factor(provinces_spain$Mal2Fml > 100)
levels(provinces_spain$Mal2Fml) = c("men","woman")
plot(provinces_spain["Mal2Fml"])
formula <- ~ Mal2Fml
# Boots Version
lsrq <- local.sp.runs.test(formula = formula, data = provinces_spain, listw = listw, distr ="bootstrap", nsim = 199)
plot(lsrq, sf = provinces_spain, sig = 0.10)

The Scan test

Two of the scan tests to identify clusters can be apply to test spatial structure in qualitative spatial processes.
The scan test don't need pre-define the classical W conectivity matrix.
See @Kanaroglou2016
The scan tests contrasts the null of independence of a spatial qualitative process and give additional information indicating one (or perhaps more) spatial cluster(s).
The scan tests don't have asymptotic distribution. The significance is obtained by permutational resampling.
The output of the scan function is an object of the classes scantest and htest

Scan bernoulli

For qualitative spatial process with two categories the bernoulli scan test is obtain with the next code

formula <- ~ Mal2Fml
scan.spain <- spqdep::scan.test(formula = formula, data = provinces_spain, 
                                case="men", nsim = 99, distr = "bernoulli")
print(scan.spain)

Scan test with flexible windows

The flexible windows is an option. Note that is slow. Consider maximum nv of 12.

listw <- spdep::poly2nb(provinces_spain, queen = FALSE)
scan.spain <- spqdep::scan.test(formula = formula, data = provinces_spain, 
                                case="men", nsim = 99, windows = "flexible", 
                                listw = listw, nv = 6, distr = "bernoulli")
print(scan.spain)

Scan multinomial

In case of a spatial process with three or more categories

data(FastFood.sf)
formula <- ~ Type
scan.fastfood <- scan.test(formula = formula, data = FastFood.sf, nsim = 99, distr = "multinomial", windows = "elliptic", 
                           nv = 50)
print(scan.fastfood)

Methods for scan test

Two method can be used with scantest objects: \code{summary()} and \code{plot()}

summary(scan.fastfood)

plot(scan.spain, sf = provinces_spain)

data(FastFood.sf)
# plot(scan.fastfood, sf = FastFood.sf)

Similarity test

@farber_testing_2015 develop the similarity test.

The \code{similarity.test()} function calculates the similarity test for both asymptotic distribution and permutational resampling.

coor <- st_coordinates(st_centroid(FastFood.sf))
listw <- spdep::knearneigh(coor, k = 4)
formula <- ~ Type
similarity <- similarity.test(formula = formula, data = FastFood.sf, listw = listw)
print(similarity)

Join-count tests

The functions of the spdep R-package have been wrapped for Bernoulli and Multinomial distributions. Asymptotic or Monte Carlo distributions (permutations) can be used to evaluate the signification of the tests.

Asymptotic distribution

provinces_spain$Older <- cut(provinces_spain$Older, breaks = c(-Inf,19,22.5,Inf))
levels(provinces_spain$Older) = c("low","middle","high")
f1 <- ~ Older + Mal2Fml
jc1 <- jc.test(formula = f1, data = provinces_spain, distr = "asymptotic", alternative = "greater", zero.policy = TRUE)
summary(jc1)

Monte Carlo distribution

jc1 <- jc.test(formula = f1, data = provinces_spain, distr = "mc", alternative = "greater", zero.policy = TRUE)
summary(jc1)

References

Any scripts or data that you put into this service are public.

spqdep documentation built on April 12, 2025, 1:50 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

spqdep
Testing for Spatial Independence of Cross-Sectional Qualitative Data

User Guide
In spqdep: Testing for Spatial Independence of Cross-Sectional Qualitative Data

Introduction

Datasets

Data Generating Process

Q-test

m-surroundings

Methods for the m_surr class

The print method

The summary method

The plot method

The Q-test

Distribution of Q-test

Methods for the spqtest class

The Q-test using a sf object

QMap-test

Methods for qmap class

Runs test

Global Runs test

Methods for spruntest class

The local Runs test

Asymptotic version

Monte Carlo local runs test

The Scan test

Scan bernoulli

Scan test with flexible windows

Scan multinomial

Methods for scan test

Similarity test

Join-count tests

Asymptotic distribution

Monte Carlo distribution

References

Try the spqdep package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

spqdep Testing for Spatial Independence of Cross-Sectional Qualitative Data

User Guide In spqdep: Testing for Spatial Independence of Cross-Sectional Qualitative Data

Introduction

Datasets

Data Generating Process

Q-test

m-surroundings

Methods for the m_surr class

The print method

The summary method

The plot method

The Q-test

Distribution of Q-test

Methods for the spqtest class

The Q-test using a sf object

QMap-test

Methods for qmap class

Runs test

Global Runs test

Methods for spruntest class

The local Runs test

Asymptotic version

Monte Carlo local runs test

The Scan test

Scan bernoulli

Scan test with flexible windows

Scan multinomial

Methods for scan test

Similarity test

Join-count tests

Asymptotic distribution

Monte Carlo distribution

References

Try the spqdep package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

spqdep
Testing for Spatial Independence of Cross-Sectional Qualitative Data

User Guide
In spqdep: Testing for Spatial Independence of Cross-Sectional Qualitative Data