epgwr_mc: Error propagation in geographically weighted regression using...


View source: R/simulation.R

Description

This function propagates errors in the input data through a given geographically weighted regression (GWR) model using Monte Carlo simulation.
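The general idea of Monte Carlo error propagation can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: an ordinary `lm` fit stands in for the GWR model, and the data and the values of `sd_x`, `sd_y` and `n_sim` are made up for the example.

```r
# Minimal Monte Carlo error propagation sketch (base R only).
# Assumption: an ordinary least-squares fit stands in for the GWR model.
set.seed(1)
n <- 50
x <- runif(n)
y <- 2 * x + rnorm(n, sd = 0.1)

sd_x  <- 0.05  # assumed standard deviation of the error in x
sd_y  <- 0.10  # assumed standard deviation of the error in y
n_sim <- 200   # number of simulated realizations

# Each realization perturbs the inputs, refits the model and stores the slope
slopes <- vapply(seq_len(n_sim), function(i) {
  x_sim <- x + rnorm(n, mean = 0, sd = sd_x)
  y_sim <- y + rnorm(n, mean = 0, sd = sd_y)
  unname(coef(lm(y_sim ~ x_sim))[2])
}, numeric(1))

# Slope of the unperturbed model, for comparison with the simulated spread
original_slope <- unname(coef(lm(y ~ x))[2])
summary(slopes)
```

The spread of `slopes` around `original_slope` indicates how sensitive the fitted coefficient is to the assumed input errors.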

Usage

epgwr_mc(data_in, y_name, x_names, coord_names = c("X", "Y"), sd_y, sd_x_vec,
  sd_coord, y_min = FALSE, y_max = FALSE, x_min = FALSE, x_max = FALSE,
  n_sim, multicore = 4, adapt_in = FALSE, bw_in = FALSE,
  gweight_in = gwr.bisquare, rho_in = FALSE, neigh_dist = FALSE,
  normalize = TRUE, skip_F123 = FALSE, seed = 42,
  print_progress = FALSE)

Arguments

data_in

data for the GWR model as a table (data.frame), a SpatialPointsDataFrame or a SpatialPolygonsDataFrame object, with metric coordinates stored as columns; for Spatial*DataFrame objects only the data table is used and everything else is ignored

y_name

name of the dependent variable; for example y_name="PctBach"

x_names

vector of names of the independent variables; must have the same length as sd_x_vec; for example x_names=c("TotPop90", "PctRural", "PctEld", "PctFB", "PctPov", "PctBlack")

coord_names

vector of the X and Y coordinate column names in the data_in table (also required for the data table of a Spatial*DataFrame object); the names are case-sensitive; defaults are "X" and "Y"; for example coord_names=c("X", "Y")

sd_y

standard deviation of dependent variable; can also be a vector of different deviations for different points; for example sd_y=0.5 or sd_y=c(0.5, 0.7, 0.5, 0.2, 0.5, 0.8, 0.2, 0.5, 0.4, 0.9)

sd_x_vec

vector of standard deviations of independent variables; can also be a list of vectors of different deviations for different points; for example sd_x_vec=c(500, 8, 1, 0.05, 1.5, 2) or sd_x_vec=list(500, 8, 1, 0.05, c(0.5, 0.7, 0.5, 0.2, 0.5, 0.8, 0.2, 0.5, 0.4, 0.9), 2)

sd_coord

standard deviation of coordinates; can also be a vector of different deviations for different points; for example sd_coord=5000 or sd_coord=c(5000, 7000, 5000, 2000, 5000, 8000, 2000, 5000, 4000, 9000)

y_min

minimum value for the dependent variable; use the default FALSE if there is no such limit

y_max

maximum value for the dependent variable; use the default FALSE if there is no such limit

x_min

vector of minimum values for independent variables; use the default FALSE if there are no such limits; for example x_min=c(0,0,0,0,0,0)

x_max

vector of maximum values for independent variables; use the default FALSE if there are no such limits; for example x_max=c(999999,1,1,1,1,1)

n_sim

number of simulations

multicore

number of parallel processes, each run on its own core, used in the calculation (default is 4); set to FALSE (or 1) to use a single process

adapt_in

TRUE for adaptive or FALSE (default) for fixed bandwidth

bw_in

FALSE (default) to let the software calibrate the bandwidth, or a fixed value to skip calibration and use that bandwidth; NOTE: with an adaptive bandwidth, give a value in the interval (0, 1], the proportion of points used

gweight_in

geographical weighting function gwr.Gauss, gwr.gauss or gwr.bisquare (default)

rho_in

rho value for autoregressive random values (more than 0 and less than 1) or FALSE (default) if spatial autocorrelation is not used; NOTE: high rho values will distort the distribution (mean values will increase/decrease and sd will increase if neigh_dist is small) of error values for each realization; use normalize=TRUE to fix this problem

neigh_dist

distance threshold for autoregressive random values, d_max; increasing this will decrease the level of spatial autocorrelation of error values; the minimum value is the largest nearest neighbor distance after the realizations of errors have been applied to coordinates, so use a small distance for which each point has a couple of points around it

normalize

TRUE (default) for normalizing the autocorrelated errors to have the same mean and sd as the non-correlated errors; strongly recommended

skip_F123

FALSE (default) for calculating Leung's F123 statistics; TRUE for skipping them as they can take quite a long time to calculate

seed

seed for the RNG simulations; default 42; NULL for no seed (hasn't been tested)

print_progress

TRUE for printing a message each time a simulation is done, FALSE (default) for skipping it; if multicore=FALSE, printing happens via the normal R console; otherwise the user is asked for a directory where a text file will be stored, and the messages are written there, because normal printing is not possible with parallel computing in Windows
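How the error-related arguments above might be read can be sketched in base R. The helper `perturb` is hypothetical and not part of the package. It assumes that errors are zero-mean normal draws and that the bounds are enforced by clamping, neither of which this page confirms. The last part computes the largest nearest neighbor distance, which the neigh_dist description gives as the lower limit for that argument.

```r
# Hypothetical helper (not part of the package): add zero-mean normal errors
# to a variable and clamp the result to optional bounds.
# Assumption: bounds are applied by clamping; FALSE means "no bound".
perturb <- function(values, sd, lower = FALSE, upper = FALSE) {
  # rnorm() recycles sd, so a single value or a per-point vector both work
  out <- values + rnorm(length(values), mean = 0, sd = sd)
  if (!identical(lower, FALSE)) out <- pmax(out, lower)
  if (!identical(upper, FALSE)) out <- pmin(out, upper)
  out
}

set.seed(42)
pct <- c(5, 50, 95, 99, 1)  # made-up percentages for five points

# One standard deviation shared by all points
sim_scalar <- perturb(pct, sd = 0.5, lower = 0, upper = 100)
# A separate standard deviation for each point
sim_vector <- perturb(pct, sd = c(0.5, 0.7, 0.5, 0.2, 0.5),
                      lower = 0, upper = 100)

# Largest nearest neighbor distance for a set of made-up coordinates.
# With sd_coord = 0 the coordinates stay fixed; otherwise this would have
# to hold for the perturbed coordinates of every realization.
coords <- cbind(X = c(0, 1, 2, 5), Y = c(0, 0, 1, 1))
d <- as.matrix(dist(coords))
diag(d) <- Inf                       # ignore each point's distance to itself
largest_nn <- max(apply(d, 1, min))  # lower limit for neigh_dist
largest_nn
```

A neigh_dist below `largest_nn` would leave at least one point without any neighbor, which is why the description recommends a distance that gives every point a couple of neighbors.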

Value

A list including

simul_metrics

a matrix for single-value metrics

point_metrics

a matrix for point metrics

original_simul_metrics

a vector for original values for single-value metrics

original_point_metrics

a vector for original values for point metrics

info_simul_metrics

a vector for explanations for single-value metrics

info_point_metrics

a vector for explanations for point metrics

function_call

function call

Note

The tool has been tested on Windows and might not work on Linux or macOS.

Author(s)

Jaakko Madetoja

References

Madetoja, J. (2018). Error propagation in geographically weighted regression. (Doctoral dissertation, Aalto University). Manuscript in preparation.

See Also

print_histograms, print_maps, plot_boxplots

Examples

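## Not run: 
library(sp)    # provides SpatialPointsDataFrame
library(spgwr) # provides gwr.bisquare and the georgia data set
## An example with artificial data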
size <- 20 # size of the square; there will be (size+1)^2 points
u <- rep(0:size, times=size+1) # x coordinate, runs like 0, 1, 2, 0, 1, 2, 0, 1, 2...
v <- rep(0:size, each=size+1) # y coordinate, runs like 0, 0, 0, 1, 1, 1, 2, 2, 2...

# Betas (regression coefficients):
b0 <- 0 # Skip intercept
b1 <- 0.5 # Constant coefficient
b2 <- (u + v) / (max(u) + max(v)) # Linear coefficient
b3 <- sin((1/max(u))*pi*u) # y-coordinate constant, sin curve in x-direction

# x's and e
RNGkind("L'Ecuyer-CMRG") # Using the same RNG as in simulations
set.seed(42) # Set the seed
x1 <- runif((size+1)^2) # Uniform distribution [0,1]
x2 <- runif((size+1)^2)
x3 <- runif((size+1)^2)
e0 <- rnorm((size+1)^2, mean=0, sd=0.25) # Random residuals

# y
y <- b0 + b1*x1 + b2*x2 + b3*x3 + e0

simul_data <- data.frame(y, x1, x2, x3, b0, b1, b2, b3, e0, u, v)

artificial_results <- epgwr_mc(data_in=simul_data, y_name="y",
  x_names=c("x1", "x2", "x3"), coord_names=c("u", "v"), sd_y=0.4,
  sd_x_vec=c(0.2, 0.2, 0.2), sd_coord=0, n_sim=10, multicore=2, adapt_in=FALSE,
  gweight_in=gwr.bisquare, rho_in=FALSE, neigh_dist=FALSE)

# Create histograms for the single-value metrics
print_histograms(data=artificial_results)
# Create maps for the point metrics
simul_points <- SpatialPointsDataFrame(cbind(u, v), simul_data)
print_maps(data=artificial_results, spatialdataframe=simul_points)

rm(size, u, v, b0, b1, b2, b3, x1, x2, x3, e0, y, simul_data,
artificial_results) # Remove results from the environment

## An example with Georgia data
data(georgia) # Georgia data set from the package spgwr
georgia_table <- gSRDF@data
data(georgia_sd) # Standard deviations for the Georgia data

georgia_results <- epgwr_mc(data_in=georgia_table, y_name="PctBach",
  x_names=c("TotPop90", "PctRural", "PctEld", "PctFB", "PctPov", "PctBlack"),
  coord_names=c("X", "Y"), sd_y=georgia_sd[["PctBach_sd"]],
  sd_x_vec=list(georgia_sd[["TotPop90_sd"]], georgia_sd[["PctRural_sd"]],
    georgia_sd[["PctEld_sd"]], georgia_sd[["PctFB_sd"]],
    georgia_sd[["PctPov_sd"]], georgia_sd[["PctBlack_sd"]]),
  sd_coord=0, y_min=0, y_max=100, x_min=c(0,0,0,0,0,0),
  x_max=c(9999999, 100, 100, 100, 100, 100),
  n_sim=100, multicore=4, adapt_in=FALSE, gweight_in=gwr.bisquare,
  rho_in=FALSE, neigh_dist=FALSE)

# Visualize results
print_histograms(data=georgia_results)
print_maps(data=georgia_results, spatialdataframe=gSRDF, print_file=FALSE)


## End(Not run)
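
The artificial-data example switches the random number generator to L'Ecuyer-CMRG before calling set.seed, which its comment describes as the RNG used in the simulations. The following base R sketch only shows that fixing the generator kind and the seed reproduces the same draws. It makes no claim about how the function handles seeding internally.

```r
# Remember the current generator so it can be restored afterwards
old_kind <- RNGkind()

RNGkind("L'Ecuyer-CMRG")  # generator designed for parallel random streams
set.seed(42)
first <- runif(3)

set.seed(42)              # same kind and seed, so the same draws follow
second <- runif(3)

identical(first, second)

# Restore the previous uniform generator
RNGkind(old_kind[1])
```

Data generated without fixing both the generator kind and the seed will generally differ between sessions.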

jaakkomadetoja/epgwr documentation built on May 28, 2019, 8:57 p.m.