epgwr_mc: Error propagation in geographically weighted regression using...
In jaakkomadetoja/epgwr: Error Propagation in Geographically Weighted Regression


Source file: R/simulation.R

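This function propagates the uncertainty (standard deviations) of the dependent variable, the independent variables and the point coordinates through a geographically weighted regression (GWR) model using Monte Carlo simulation.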
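
The Monte Carlo idea can be illustrated with a minimal base-R sketch. It is not the epgwr implementation: `mc_propagate()` is a hypothetical helper, an ordinary `lm()` fit stands in for the GWR model, and only the dependent and independent variables are perturbed (the coordinates are left unchanged). Each of the `n_sim` realizations adds normal errors with the given standard deviations, refits the model and stores its coefficients, so the spread across realizations reflects the propagated input uncertainty.

```r
# Sketch only: lm() stands in for the GWR model, and mc_propagate() is a
# hypothetical helper, not part of epgwr.
mc_propagate <- function(dat, y_name, x_names, sd_y, sd_x_vec, n_sim, seed = 42) {
  set.seed(seed)                                   # reproducible realizations
  form <- reformulate(x_names, response = y_name)  # y ~ x1 + x2 + ...
  n <- nrow(dat)
  original <- coef(lm(form, data = dat))           # fit on the original data
  simulated <- matrix(NA_real_, nrow = n_sim, ncol = length(original),
                      dimnames = list(NULL, names(original)))
  for (s in seq_len(n_sim)) {
    d <- dat
    # rnorm() recycles sd, so a single value or a per-point vector both work
    d[[y_name]] <- d[[y_name]] + rnorm(n, mean = 0, sd = sd_y)
    for (j in seq_along(x_names)) {
      # [[j]] works for a numeric vector and for a list of per-point vectors
      d[[x_names[j]]] <- d[[x_names[j]]] + rnorm(n, mean = 0, sd = sd_x_vec[[j]])
    }
    simulated[s, ] <- coef(lm(form, data = d))     # refit on this realization
  }
  list(original = original, simulated = simulated)
}

# Usage with artificial data
set.seed(1)
dat <- data.frame(x1 = runif(50), x2 = runif(50))
dat$y <- 0.5 * dat$x1 + 0.2 * dat$x2 + rnorm(50, sd = 0.1)
res <- mc_propagate(dat, "y", c("x1", "x2"), sd_y = 0.1,
                    sd_x_vec = c(0.05, 0.05), n_sim = 20)
apply(res$simulated, 2, sd)  # spread of each coefficient across realizations
```

Storing one row per realization keeps the original fit separate from the simulated distribution, which makes it easy to compare the two afterwards.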

epgwr_mc(data_in, y_name, x_names, coord_names = c("X", "Y"), sd_y, sd_x_vec,
  sd_coord, y_min = FALSE, y_max = FALSE, x_min = FALSE, x_max = FALSE,
  n_sim, multicore = 4, adapt_in = FALSE, bw_in = FALSE,
  gweight_in = gwr.bisquare, rho_in = FALSE, neigh_dist = FALSE,
  normalize = TRUE, skip_F123 = FALSE, seed = 42,
  print_progress = FALSE)

`data_in`	data for the GWR model: a data.frame, a SpatialPointsDataFrame or a SpatialPolygonsDataFrame, with the coordinates stored as columns in metric units; for Spatial*DataFrame objects only the data table is used and everything else is ignored
`y_name`	name of the dependent variable; for example y_name="PctBach"
`x_names`	vector of names of the independent variables; must have the same length as sd_x_vec; for example x_names=c("TotPop90", "PctRural", "PctEld", "PctFB", "PctPov", "PctBlack")
`coord_names`	vector of the X and Y coordinate column names in the data_in table (also required for Spatial*DataFrame objects, because only their data table is used); names are case-sensitive; defaults are "X" and "Y"; for example coord_names=c("X", "Y")
`sd_y`	standard deviation of the dependent variable; either a single value or a vector with one value per point; for example sd_y=0.5 or sd_y=c(0.5, 0.7, 0.5, 0.2, 0.5, 0.8, 0.2, 0.5, 0.4, 0.9)
`sd_x_vec`	vector of standard deviations of the independent variables, in the same order as x_names; can also be a list in which any element is a vector of per-point deviations; for example sd_x_vec=c(500, 8, 1, 0.05, 1.5, 2) or sd_x_vec=list(500, 8, 1, 0.05, c(0.5, 0.7, 0.5, 0.2, 0.5, 0.8, 0.2, 0.5, 0.4, 0.9), 2)
`sd_coord`	standard deviation of the coordinates; either a single value or a vector with one value per point; for example sd_coord=5000 or sd_coord=c(5000, 7000, 5000, 2000, 5000, 8000, 2000, 5000, 4000, 9000)
`y_min`	minimum valid value for the dependent variable; use the default FALSE if there is no lower bound
`y_max`	maximum valid value for the dependent variable; use the default FALSE if there is no upper bound
`x_min`	vector of minimum valid values for the independent variables; use the default FALSE if there are no lower bounds; for example x_min=c(0,0,0,0,0,0)
`x_max`	vector of maximum valid values for the independent variables; use the default FALSE if there are no upper bounds; for example x_max=c(999999,1,1,1,1,1)
`n_sim`	number of simulations
`multicore`	number of parallel processes, each run on its own core (default 4), used in the calculation; set to FALSE (or 1) to use a single process
`adapt_in`	TRUE for adaptive or FALSE (default) for fixed bandwidth
`bw_in`	FALSE (default) to let the software calibrate the bandwidth, or a fixed value to skip calibration and use a constant bandwidth; NOTE: with adaptive bandwidth, give a value in (0, 1], the proportion of points used
`gweight_in`	geographical weighting function gwr.Gauss, gwr.gauss or gwr.bisquare (default)
`rho_in`	rho value for spatially autoregressive random errors (0 < rho < 1), or FALSE (default) if spatial autocorrelation is not used; NOTE: high rho values distort the distribution of the error values in each realization (the mean shifts up or down, and the sd increases if neigh_dist is small); use normalize=TRUE to fix this
`neigh_dist`	distance threshold d_max for the autoregressive random errors; increasing it decreases the level of spatial autocorrelation of the error values; it must be at least the largest nearest-neighbor distance after the coordinate errors of a realization have been applied, so choose a small distance within which each point has a couple of neighbors
`normalize`	TRUE (default) to rescale the autocorrelated errors to the same mean and sd as the non-correlated errors; strongly recommended
`skip_F123`	FALSE (default) to calculate Leung's F1, F2 and F3 test statistics; TRUE to skip them, as they can take a long time to calculate
`seed`	seed for the random number generator used in the simulations; default 42; NULL for no seed (untested)
`print_progress`	TRUE to print a message each time a simulation finishes, FALSE (default) to skip it; if multicore=FALSE, messages are printed to the R console; otherwise the user is asked for a directory in which a text file is created and the messages are written there, because normal printing is not possible with parallel computing on Windows
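
The per-point deviations, the bounds and the autocorrelation arguments above could be realized roughly as follows. This is a hedged sketch with the hypothetical helpers `bounded_perturb()` and `sar_errors()`; it assumes that out-of-range values are clamped to the bounds and that autocorrelated errors follow a simultaneous autoregressive (SAR) process e = (I - rho W)^-1 u, where W holds row-standardized binary weights for neighbors within `neigh_dist`. epgwr may use different rules. The check for points without neighbors mirrors the requirement that `neigh_dist` be at least the largest nearest-neighbor distance.

```r
# Sketch only: bounded_perturb() and sar_errors() are hypothetical helpers.
# ASSUMPTIONS: out-of-range values are clamped to the bounds (epgwr may handle
# them differently) and autocorrelated errors follow a SAR process
# e = (I - rho W)^-1 u with row-standardized binary neighbor weights W.

bounded_perturb <- function(values, sd_e, lower = FALSE, upper = FALSE) {
  # sd_e may be a single value or one value per point (rnorm() recycles it)
  out <- values + rnorm(length(values), mean = 0, sd = sd_e)
  if (!isFALSE(lower)) out <- pmax(out, lower)  # FALSE means no lower bound
  if (!isFALSE(upper)) out <- pmin(out, upper)  # FALSE means no upper bound
  out
}

sar_errors <- function(coords, sd_e, rho, neigh_dist, normalize = TRUE) {
  n <- nrow(coords)
  d <- unname(as.matrix(dist(coords)))          # pairwise Euclidean distances
  W <- (d > 0 & d <= neigh_dist) * 1            # neighbors within d_max
  if (any(rowSums(W) == 0))                     # every point needs a neighbor
    stop("neigh_dist is below the largest nearest-neighbor distance")
  W <- W / rowSums(W)                           # row-standardize the weights
  u <- rnorm(n, mean = 0, sd = sd_e)            # uncorrelated errors
  e <- solve(diag(n) - rho * W, u)              # autocorrelated errors
  if (normalize)                                # match the mean and sd of u
    e <- (e - mean(e)) / sd(e) * sd(u) + mean(u)
  e
}

# Usage on a 10 x 10 grid with unit spacing
set.seed(42)
coords <- cbind(u = rep(0:9, times = 10), v = rep(0:9, each = 10))
y <- runif(100)
y_sim <- bounded_perturb(y, sd_e = 0.4, lower = 0, upper = 1)  # like y_min/y_max
e_sar <- sar_errors(coords, sd_e = 0.4, rho = 0.8, neigh_dist = 1.5)
```

With a row-standardized W, I - rho W is invertible for 0 < rho < 1, and the optional rescaling plays the role described for normalize=TRUE.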

The function returns a list containing

`simul_metrics`	a matrix of the single-value metrics from the simulations
`point_metrics`	a matrix of the point metrics from the simulations
`original_simul_metrics`	a vector of the single-value metrics for the original data
`original_point_metrics`	a vector of the point metrics for the original data
`info_simul_metrics`	a vector of descriptions of the single-value metrics
`info_point_metrics`	a vector of descriptions of the point metrics
`function_call`	the function call

The tool has been tested on Windows and might not work on Linux or macOS.

Author: Jaakko Madetoja

Madetoja, J. (2018). Error propagation in geographically weighted regression (Doctoral dissertation, Aalto University). Manuscript in preparation.

See also: print_histograms, print_maps, plot_boxplots

## Not run: 
library(epgwr)
library(spgwr) # gwr.bisquare, the georgia data and (via sp) SpatialPointsDataFrame
## An example with artificial data
size <- 20 # size of the square; there will be (size+1)^2 points
u <- rep(0:size, times=size+1) # x coordinate, runs like 0, 1, 2, 0, 1, 2, 0, 1, 2...
v <- rep(0:size, each=size+1) # y coordinate, runs like 0, 0, 0, 1, 1, 1, 2, 2, 2...

# Coefficients (betas):
b0 <- 0 # No intercept
b1 <- 0.5 # Constant coefficient
b2 <- (u + v) / (max(u) + max(v)) # Coefficient increasing linearly with u + v
b3 <- sin((1/max(u))*pi*u) # Constant along y (v), sine curve along x (u)

# Independent variables x1-x3 and residuals e0
RNGkind("L'Ecuyer-CMRG") # Using the same RNG as in simulations
set.seed(42) # Set the seed
x1 <- runif((size+1)^2) # Uniform distribution [0,1]
x2 <- runif((size+1)^2)
x3 <- runif((size+1)^2)
e0 <- rnorm((size+1)^2, mean=0, sd=0.25) # Random residuals

# y
y <- b0 + b1*x1 + b2*x2 + b3*x3 + e0

simul_data <- data.frame(y, x1, x2, x3, b0, b1, b2, b3, e0, u, v)

artificial_results <- epgwr_mc(data_in=simul_data, y_name="y",
x_names=c("x1", "x2", "x3"), coord_names=c("u", "v"), sd_y=0.4,
sd_x_vec=c(0.2, 0.2, 0.2), sd_coord=0, n_sim=10, multicore=2, adapt_in=FALSE,
gweight_in=gwr.bisquare, rho_in = FALSE, neigh_dist = FALSE)

# Create histograms for the single-value metrics
print_histograms(data=artificial_results)
# Create maps for the point metrics
simul_points <- SpatialPointsDataFrame(cbind(u, v), simul_data)
print_maps(data=artificial_results, spatialdataframe=simul_points)

rm(size, u, v, b0, b1, b2, b3, x1, x2, x3, e0, y, simul_data,
artificial_results) # Remove results from the environment

## An example with Georgia data
data(georgia) # Georgia data set from the package spgwr
georgia_table <- gSRDF@data
data(georgia_sd) # Standard deviations for the Georgia data

georgia_results <- epgwr_mc(data_in=georgia_table, y_name="PctBach",
x_names=c("TotPop90", "PctRural", "PctEld", "PctFB", "PctPov", "PctBlack"),
coord_names=c("X", "Y"), sd_y=georgia_sd[["PctBach_sd"]],
sd_x_vec=list(georgia_sd[["TotPop90_sd"]], georgia_sd[["PctRural_sd"]],
georgia_sd[["PctEld_sd"]], georgia_sd[["PctFB_sd"]],
georgia_sd[["PctPov_sd"]], georgia_sd[["PctBlack_sd"]]), sd_coord=0, y_min=0,
y_max=100, x_min=c(0,0,0,0,0,0), x_max=c(9999999, 100, 100, 100, 100, 100),
n_sim=100, multicore=4, adapt_in=FALSE, gweight_in=gwr.bisquare,
rho_in=FALSE, neigh_dist=FALSE)

# Visualize results
print_histograms(data=georgia_results)
print_maps(data=georgia_results, spatialdataframe=gSRDF, print_file=FALSE)


## End(Not run)
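
The example selects RNGkind("L'Ecuyer-CMRG") to match the generator used in the simulations. One common way to keep parallel Monte Carlo runs reproducible with this generator is to give each simulation its own stream via parallel::nextRNGStream(). The sketch below assumes that layout (it is not taken from the epgwr source), and run_one() is a hypothetical stand-in for one simulation. Because each result depends only on its own stream, the execution order, and therefore the number of worker processes, does not change the results.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")  # the generator selected in the example above
set.seed(42)

# One independent RNG stream per simulation (assumed layout, not epgwr's code)
n_sim <- 6
streams <- vector("list", n_sim)
s <- .Random.seed
for (i in seq_len(n_sim)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)   # advance to the next independent stream
}

# Hypothetical stand-in for one simulation: seed from its stream, then draw
run_one <- function(stream) {
  assign(".Random.seed", stream, envir = globalenv())
  mean(rnorm(100))
}

in_order <- sapply(streams, run_one)
reversed <- rev(sapply(rev(streams), run_one))  # different execution order
```

Note that RNGkind() changes the generator for the rest of the session. The same run_one() could be passed to parallel::parSapply() over a cluster, and it should then reproduce in_order regardless of how many workers are used.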