proxistat.chunked: Call proxistat once per chunk & save output as file (breaks...

View source: R/proxistat.chunked.R

proxistat.chunked    R Documentation

Call proxistat once per chunk & save output as file (breaks large input data into chunks)

Description

Calls the proxistat function in chunks when the list of frompoints is so long that it taxes RAM (e.g., 11 million blocks), saving each chunk as a separate .RData file in the current working directory (or in folder, if specified).

Usage

proxistat.chunked(
  frompoints,
  topoints,
  fromchunksize,
  tochunksize,
  startchunk = 1,
  FUN = proxistat,
  folder = getwd(),
  savechunks = FALSE,
  assemble = TRUE,
  saveproxistats = FALSE,
  area,
  file = "proxistats.RData",
  ...
)

Arguments

frompoints

Required. Matrix or data.frame of lat/lon values that can be passed to the get.distances function (colnames 'lat' and 'lon').

topoints

Required. Matrix or data.frame of lat/lon values that can be passed to the get.distances function (colnames 'lat' and 'lon').

fromchunksize

Required, number specifying how many points to analyze at a time (per chunk).

tochunksize

Not currently required (the current default is to use all topoints at once). Number specifying how many topoints to analyze at a time (per chunk).

startchunk

Optional integer, defaults to 1. Specifies which chunk to start with, in case some have already been done. Currently, the entire dataset must still be passed to this function even if some of the earlier chunks have already been analyzed.

FUN

Optional function, proxistat by default. Other values are not implemented yet.

folder

Optional path specifying where to save .RData file(s) – chunk-specific files and/or assembled results file – default is getwd()

savechunks

Optional logical, defaults to FALSE. Specifies whether to save an .RData file of each chunk.

assemble

Optional logical, defaults to TRUE. Specifies whether to assemble all chunks into one variable called proxistats, which is returned by this function and, if saveproxistats=TRUE, saved as a file in folder.

saveproxistats

Optional logical, defaults to FALSE. Specifies whether to save an .RData file of the assembled results as the proxistats matrix. Ignored if assemble=FALSE.

area

Optional number or vector of numbers giving size of each spatial unit with FIPS.pop, in square miles or square kilometers depending on the units parameter. Default is to pass nothing to proxistat, where the default is 0, in which case no adjustment is made for small or even zero distances, which can produce unrealistically large or even infinite/undefined scores. For zero distance with area=0, Inf is returned for the score.

file

Optional name of the file created if assemble=TRUE and saveproxistats=TRUE. Defaults to proxistats.RData, written using save(proxistats, file = 'proxistats.RData').

...

Other parameters to pass to proxistat such as units or wts
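
How fromchunksize and startchunk interact can be sketched in base R. This is an illustration only, not the package's implementation: chunk_rows and toy_score are hypothetical helpers, and toy_score merely stands in for proxistat so the loop can run without the package installed.

```r
# Hypothetical helper: row indices of chunk k when n rows are split
# into chunks of at most `size` rows.
chunk_rows <- function(n, size, k) {
  first <- (k - 1) * size + 1
  last <- min(k * size, n)
  first:last
}

# Hypothetical stand-in for proxistat: returns one number per frompoint
# (smallest sum of absolute lat and lon differences to any topoint).
toy_score <- function(frompoints, topoints) {
  apply(frompoints, 1, function(p) {
    min(abs(topoints[, "lat"] - p["lat"]) + abs(topoints[, "lon"] - p["lon"]))
  })
}

frompoints <- cbind(lat = c(38.1, 38.2, 38.3, 38.4, 38.5),
                    lon = c(-77.1, -77.2, -77.3, -77.4, -77.5))
topoints <- cbind(lat = c(38.0, 39.0), lon = c(-77.0, -78.0))

fromchunksize <- 2
startchunk <- 2   # pretend chunk 1 was finished in an earlier run
nchunks <- ceiling(nrow(frompoints) / fromchunksize)

# Process only the remaining chunks; the full frompoints matrix is still
# needed so that chunk boundaries line up with the earlier run.
results <- list()
for (k in startchunk:nchunks) {
  rows <- chunk_rows(nrow(frompoints), fromchunksize, k)
  results[[as.character(k)]] <- toy_score(frompoints[rows, , drop = FALSE], topoints)
}
```

With 5 frompoints and fromchunksize = 2 there are 3 chunks, and the last chunk holds the single leftover row.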

Details

Still slow for all blocks in the USA and 10k topoints (several hours).

File sizes:

80 MB file/chunk if 1k blocks x 11k topoints/chunk: y=get.distances.chunked(testpoints(11e6), testpoints(11000), 1e3, units='km')

800 MB file/chunk if 10k blocks x 11k topoints/chunk: y=get.distances.chunked(testpoints(11e6), testpoints(11000), 1e4, units='km')
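
As a rough sanity check, these file sizes are about what one would expect if each chunk stores one 8-byte double per frompoint/topoint pair. That storage layout is an assumption, not something stated here, and compression applied when saving would change the size on disk.

```r
bytes_per_double <- 8     # assumption: one double stored per pair
topoints_n <- 11e3

# Approximate uncompressed megabytes per chunk for a given fromchunksize
mb_per_chunk <- function(fromchunksize) {
  fromchunksize * topoints_n * bytes_per_double / 1e6
}

mb_small <- mb_per_chunk(1e3)   # 88, same order as the 80 MB figure
mb_large <- mb_per_chunk(1e4)   # 880, same order as the 800 MB figure

# Number of chunks needed to cover 11 million frompoints
chunks_small <- ceiling(11e6 / 1e3)   # 11000 chunks
chunks_large <- ceiling(11e6 / 1e4)   # 1100 chunks
```

Smaller chunks keep each file and the peak memory use down, at the cost of many more files to manage.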

Value

If assemble=TRUE, returns the assembled set of all chunks as a matrix of 1 or more columns.

If assemble=FALSE but savechunks=TRUE, returns a character vector of filenames for the saved .RData output files in the current working directory or specified folder. Each saved output is a vector of proximity scores if FUN=proxistat, or a matrix with extra columns depending on return. parameters above.

Otherwise, returns NULL.
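
When only filenames are returned, the chunk files can be read back and combined later. The following is a minimal sketch using base R only; the file names and the object name chunkresult are hypothetical, since the names used inside the saved files are not documented here. The fake chunk files are written to a temporary folder so the example runs without the package.

```r
folder <- tempfile("chunks")
dir.create(folder)

# Simulate two saved chunk files (hypothetical file and object names)
files <- character(0)
for (k in 1:2) {
  chunkresult <- seq_len(3) + 10 * k     # fake scores for chunk k
  f <- file.path(folder, paste0("chunk", k, ".RData"))
  save(chunkresult, file = f)
  files <- c(files, f)
}

# Load one file into its own environment so objects with the same name
# in different files do not overwrite each other.
read_chunk <- function(f) {
  e <- new.env()
  objname <- load(f, envir = e)   # load() returns the names of the loaded objects
  get(objname[1], envir = e)
}

# Combine the chunks in file order into one vector of scores
proxistats <- unlist(lapply(files, read_chunk))
```

Because read_chunk looks up whatever object name load() reports, it does not depend on knowing how the objects were named when saved. For matrix-valued chunks, do.call(rbind, lapply(files, read_chunk)) would be the analogous way to stack them.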


ejanalysis/proxistat documentation built on April 2, 2024, 10:13 a.m.