incrementalLID: Incremental Local Indicators of Dispersion

View source: R/incrementalLID.R

incrementalLID R Documentation

Incremental Local Indicators of Dispersion

Description

Determine the bandwidth that maximizes the non-group component of inequality.

Usage

incrementalLID(
  x,
  dist,
  bws = Inf,
  def.neigh = 0,
  offset = function(x) 2 * x,
  n = rep(1, length(x)),
  ntrials = 50,
  alpha = 0.05,
  standard = NULL,
  expect = NULL,
  mode = "adaptive",
  weighting = "membership",
  FUN = NULL,
  inf.val = NULL,
  row.stand = "fuzzy",
  minval = 50,
  var.stand = FALSE,
  var.exp = FALSE,
  ng.invert = TRUE,
  max.cross = .Machine$integer.max,
  pb = TRUE,
  ...
)

Arguments

x

A vector of observed values, one per observation represented in dist.

dist

A matrix or distance object representing pairwise distances. The distances need not be symmetrical.

bws

A vector of bandwidths within which observations are considered neighbors. If mode = 'adaptive', each bandwidth is the number of nearest neighbors. If mode = 'fixed', each bandwidth is the radius of the window in map units.

def.neigh

Numeric. At what distance (in map units) are observations definitely neighbors? This value is subtracted from all distances, and any resulting distance less than zero is reassigned to minval.
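As a rough illustration of the adjustment described above, the base R sketch below subtracts a threshold from a toy distance matrix and reassigns negative results to a floor value. It is not the package's internal code; the names pos, d, def_neigh, min_val, and adj are hypothetical and chosen only for this example.

```r
# Toy distance matrix for four observations on a line (map units)
pos <- c(0, 10, 25, 100)
d <- as.matrix(dist(pos))

def_neigh <- 15  # hypothetical: anything within 15 units is definitely a neighbor
min_val   <- 1   # hypothetical minimum allowable distance

# Subtract def_neigh from every distance...
adj <- d - def_neigh
# ...and reassign anything that fell below zero to min_val
adj[adj < 0] <- min_val

adj
```

Note that a distance exactly equal to def_neigh becomes zero rather than min_val in this sketch, because only values strictly below zero are reassigned.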

offset

What value is added to the denominator to prevent singularities from arising (e.g. whenever the value is 1/0)? Larger values imply smaller distance-decay. This should be a numeric of length one or length(def.neigh). Alternatively, offset can be expressed as a function of def.neigh. Default is offset = function(x) 2 * x. Ignored if x is a vector.

n

A vector of population weights: how much impact a given observation has on any other observation, regardless of its influence as provided for in w. Default is 1 for all.

ntrials

The number of permutations to perform. Default is 50.

alpha

Threshold for significance. Default is alpha = 0.05.

standard

The standards matrix with dimensions length(x) x length(x) used when calculating lid. Ignored if none had been originally provided, otherwise required.

expect

The expectations matrix with dimensions length(x) x length(x) used when calculating lid. Ignored if none had been originally provided, otherwise required.

mode

One of 'adaptive', which considers a bw number of nearest neighbors; or 'fixed', which considers a fixed bandwidth window of radius bw.
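The difference between the two modes can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: self-distances are excluded from the neighbor count (the package may treat them differently), and the names pos, d, k, r, adaptive, and fixed are hypothetical.

```r
# Toy distance matrix for four observations on a line (map units)
pos <- c(0, 10, 25, 100)
d <- as.matrix(dist(pos))
diag(d) <- NA  # assumption of this sketch: an observation is not its own neighbor

# 'adaptive': the k nearest neighbors of each observation
k <- 2
adaptive <- t(apply(d, 1, function(row) {
  rank(row, na.last = "keep", ties.method = "first") <= k
}))
adaptive[is.na(adaptive)] <- FALSE

# 'fixed': every observation within a radius of r map units
r <- 20
fixed <- d <= r
fixed[is.na(fixed)] <- FALSE

rowSums(adaptive)  # every row has exactly k neighbors
rowSums(fixed)     # neighbor counts vary with local density
```

With an adaptive bandwidth every observation gets the same number of neighbors, while a fixed radius can leave isolated observations (here, the fourth one) with none.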

weighting

One of 'membership', which considers binary membership such that neighbors are weighted 1 and non-neighbors 0; 'distance' which weighs neighbors according to FUN with the raw distance matrix providing the distance; or 'rank' which uses the rank-distance (i.e. 1 for nearest neighbor, 2 for second nearest...) as the distance variable.

FUN

The distance function. Default is NULL for 'membership', and function(x) offset/(offset + x) otherwise. Ignored if x is a vector.
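To show why an offset helps, the sketch below evaluates the documented default form offset/(offset + x) next to a plain inverse-square function. It assumes the offset is derived from a hypothetical def_neigh of 15 using the default offset = function(x) 2 * x; how the package wires these together internally is not shown here.

```r
def_neigh <- 15                              # hypothetical value
offset <- 2 * def_neigh                      # mirrors the default offset = function(x) 2 * x
decay <- function(x) offset / (offset + x)   # documented default form for non-'membership' weighting

decay(c(0, 30, 90))   # finite everywhere, including at zero distance

# A decay function without an offset is singular at zero distance
inv_sq <- function(x) 1 / x^2
inv_sq(0)
```

A larger offset flattens the curve, which matches the statement that larger values imply smaller distance-decay.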

inf.val

When singularities arise (i.e. whenever the value is 1/0), by what value are they replaced? Default is the FUN of the lowest non-minval value. Ignored if x is a vector.

row.stand

Logical or 'fuzzy'. If TRUE, rows are standardized such that they sum to one. If 'fuzzy' (the default), rows are standardized as a proportion of the largest value.
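The two standardizations can be compared on a small weights matrix. This is a sketch in base R rather than the package's code, and it assumes that "the largest value" means the largest value in each row; the names w, w_sum, and w_fuzzy are hypothetical.

```r
# Toy weights matrix (rows = focal observations, columns = neighbors)
w <- matrix(c(0, 2, 2,
              1, 0, 3,
              4, 4, 0), nrow = 3, byrow = TRUE)

# Rows standardized so that each sums to one
w_sum <- w / rowSums(w)

# 'fuzzy' standardization: each row as a proportion of its largest value
# (assumption: the maximum is taken per row)
w_fuzzy <- w / apply(w, 1, max)

rowSums(w_sum)
w_fuzzy
```

Under the sum-to-one scheme a row with many neighbors gives each one a smaller share, whereas the 'fuzzy' scheme keeps the strongest neighbor at 1 regardless of how many neighbors there are.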

minval

When distances are raw, what is the minimum allowable distance? Default is 50. Ignored if x is a vector. Use this if you don't want to offset values otherwise.

var.stand

Logical. Should the standards be permuted if a matrix was provided? Default is FALSE.

var.exp

Logical. Should the expectations be permuted if a matrix was provided? Default is FALSE.

ng.invert

Does a higher non-group value imply higher between-group inequality? Default is TRUE. This is ignored if matrices were not originally provided, as it is automatically performed.

max.cross

When processing, what is the maximum number of rows that an internal data.table can have? This is generally not a concern unless the number of observations approaches sqrt(.Machine$integer.max), which is about 46,340 given that .Machine$integer.max is 2^31 - 1. Lower values result in a greater number of chunks, thus allowing larger datasets to be calculated.
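The arithmetic behind this limit can be checked directly. The chunk count below is only a back-of-the-envelope estimate that assumes every pair of observations produces one row; the names n_obs, max_cross, n_pairs, and n_chunks are hypothetical, and the package may split its work differently.

```r
int_max <- .Machine$integer.max   # 2147483647, i.e. 2^31 - 1
n_limit <- floor(sqrt(int_max))   # largest n for which n^2 still fits in an integer

n_limit

# Hypothetical dataset larger than that limit
n_obs <- 60000
max_cross <- 1e9                  # hypothetical cap on rows per chunk

n_pairs <- n_obs^2                # rows needed if every pair gets one row (assumption)
n_chunks <- ceiling(n_pairs / max_cross)

n_chunks
```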

pb

Logical. Should a progress bar be displayed? Default is TRUE. This can be useful when processing a large dataset that requires adjusting max.cross.

...

Additional parameters to pass on to LID.

Value

A list with three entries:

(1) $index A named character containing the code of the index, named by the index's name.

(2) $bws The bandwidths that significantly optimize the non-group inequality. Generally, a neighborhood is the first significant peak.

(3) $stats A data.table with the global group, non-group, and total values for each bandwidth, as well as a column indicating whether or not it's significant.

Examples


# Generate dummy observations
x <- runif(10, 1, 100)

# Get distance matrix
dists <- dist(x, upper = TRUE, diag = TRUE)

# Bandwidth sizes from 3 to 6
bws <- 3:6

# Evaluate each bandwidth; index and type are passed on to LID via ...
inc <- incrementalLID(x, dist = dists, bws = bws, index = 'gini', type = 'local',
                      weighting = 'distance', FUN = function(x) 1/x^2, minval = 1)

andresgmejiar/lbmech documentation built on Feb. 2, 2025, 12:37 a.m.