dataDiscretize: Discretize data


View source: R/dataDiscretize.R

Description

These functions discretize continuous input data into classes. Classes can be defined by the user or, if the user provides the number of expected classes, calculated from quantiles (default option) or by equal intervals.
dataDiscretize processes a single variable at a time, provided as a vector. bulkDiscretize discretizes multiple input rasters, optionally using parallel processing.

Usage

dataDiscretize(
  data,
  classBoundaries = NULL,
  classStates = NULL,
  method = "quantile"
)

bulkDiscretize(formattedLst, xy, inparallel = FALSE)

Arguments

data

numeric vector. The continuous data to be discretized.

classBoundaries

numeric vector or single integer. Interval boundaries to be used for data discretization. The outer values (minimum and maximum) are required. -Inf or Inf are allowed as outer values, in which case the data minimum and maximum will be used to evaluate the mid values of the outer classes. Alternatively, a single integer indicating the number of classes, in which case the data are split by quantiles (default) or by equal intervals.

classStates

vector. The state labels to be assigned to the discretized data.

method

character. What splitting method should be used? This argument is ignored if a vector of values is passed to classBoundaries.

  • quantile splits data into quantiles (default).

  • equal splits data into equally sized intervals based on data minimum and maximum.

formattedLst

list. A formatted list as returned by linkNode or linkMultiple.

xy

matrix. A matrix of spatial coordinates; first column is x (longitude), second column is y (latitude) of locations (in rows).

inparallel

logical or integer. Should the function use parallel processing facilities? Default is FALSE: a single process will be launched. If TRUE, all cores/processors but one will be used. Alternatively, an integer can be provided to dictate the number of cores/processors to be used.
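
As a rough illustration of the classBoundaries rule on -Inf and Inf, the sketch below replaces infinite outer boundaries with the data minimum and maximum before computing one mid value per class. The helper sketch_mid_values is hypothetical and not part of bnspatial, and taking the mid value as the mean of the two class boundaries is an assumption; the package may compute it differently.

```r
# Hypothetical helper, NOT the bnspatial implementation.
# Assumption: the mid value of a class is the mean of its two boundaries.
sketch_mid_values <- function(x, boundaries) {
  b <- boundaries
  n <- length(b)
  # Replace infinite outer boundaries with the observed data range
  if (is.infinite(b[1])) b[1] <- min(x, na.rm = TRUE)
  if (is.infinite(b[n])) b[n] <- max(x, na.rm = TRUE)
  # Average each lower boundary with the upper boundary that follows it
  (b[-n] + b[-1]) / 2
}

x <- c(0.1, 0.3, 0.6, 0.9)

# Finite outer boundaries are used as given
mid_finite <- sketch_mid_values(x, c(0, 0.5, 1))

# Infinite outer boundaries fall back to min(x) = 0.1 and max(x) = 0.9
mid_infinite <- sketch_mid_values(x, c(-Inf, 0.5, Inf))

print(mid_finite)
print(mid_infinite)
```

With finite boundaries the mid values depend only on classBoundaries; with infinite ones they shift towards the observed data range.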

Details

dataDiscretize
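
The difference between the two values of method can be sketched in base R. This is only an approximation of the idea under stated assumptions, not the code used by dataDiscretize: the helper sketch_breaks is hypothetical, and the use of quantile, seq and cut here is an illustrative choice.

```r
# Hypothetical helper, NOT part of bnspatial: derive class boundaries
# from a requested number of classes n.
sketch_breaks <- function(x, n, method = c("quantile", "equal")) {
  method <- match.arg(method)
  if (method == "quantile") {
    # n classes need n + 1 boundaries at evenly spaced probabilities
    unname(quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE))
  } else {
    # Equally wide intervals between the data minimum and maximum
    seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
  }
}

# Skewed toy data: one large value stretches the range
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
labels <- c("a", "b", "c")

q_breaks <- sketch_breaks(x, 3, "quantile")
e_breaks <- sketch_breaks(x, 3, "equal")

# include.lowest = TRUE keeps the data minimum inside the first class
q_class <- cut(x, breaks = q_breaks, labels = labels, include.lowest = TRUE)
e_class <- cut(x, breaks = e_breaks, labels = labels, include.lowest = TRUE)

print(table(q_class))  # classes hold similar numbers of observations
print(table(e_class))  # classes have equal width, so counts can be very uneven
```

On skewed data, quantile boundaries tend to balance the number of observations per class, while equal intervals can leave some classes empty. Note that cut fails if quantile boundaries are not unique, which can happen with heavily tied data; the sketch does not handle that case.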

Value

dataDiscretize returns a named list of 4 vectors:

bulkDiscretize returns a matrix: each column is a node associated with input spatial data, and each row holds the discretized values at one of the coordinates specified by argument xy.

Examples

s <- runif(30)

# Split by user-defined values. Values outside the boundaries are set to NA:
dataDiscretize(s, classBoundaries = c(0.2, 0.5, 0.8))

# Split by quantiles (default):
dataDiscretize(s, classStates = c('a', 'b', 'c'))

# Split by equal intervals:
dataDiscretize(s, classStates = c('a', 'b', 'c'), method = "equal")

# When -Inf and Inf are provided as external boundaries, $midValues of outer classes
# are calculated on the minimum and maximum values:
dataDiscretize(s, classBoundaries=c(0, 0.5, 1), classStates=c("first", "second"))[c(2,3)]
dataDiscretize(s, classBoundaries=c(-Inf, 0.5, Inf), classStates=c("first", "second"))[c(2,3)]

## Discretize multiple spatial data by location
list2env(ConwyData, environment())

network <- LandUseChange
spatialData <- c(ConwyLU, ConwySlope, ConwyStatus)

# Link multiple spatial data to the network nodes and discretize
spDataLst <- linkMultiple(spatialData, network, LUclasses, verbose = FALSE)
coord <- aoi(ConwyLU, xy=TRUE)
head( bulkDiscretize(spDataLst, coord) )

dariomasante/bnspatial documentation built on Aug. 25, 2020, 4:07 p.m.