assign.pctiles: Assign percentiles to vector of values (weighted, by zone)


assign.pctiles    R Documentation

Assign percentiles to vector of values (weighted, by zone)

Description

For each value in a vector, finds its percentile within the distribution of values across all rows in the same zone (e.g., all places in that zone). Percentiles can optionally be weighted (e.g., by population).

Usage

assign.pctiles(values, weights = NULL, zone = NULL, na.rm = TRUE)

Arguments

values

vector, required, with numeric values. To do this with a matrix, see make.pctile.cols()

weights

Optional, NULL by default. Vector of weights for weighted percentiles (e.g., population weighted). Weighted use is not fully tested.

zone

Optional, NULL by default. Defines subsets of rows, so that each percentile is found only among the rows within the same zone.

na.rm

NOT IMPLEMENTED HERE. Logical, optional, TRUE by default. Whether NA values (missing data) should be removed first, so that each percentile is found only among rows with valid data. If FALSE, NA values are treated as being at the highest percentiles.

Details

Relies on the Hmisc::wtd.Ecdf() function. This could be recoded to be faster using the data.table package; see notes in rollup.pct() and rollup().

A parameter like na.last in rank() could also be added, defining not only whether NA values are removed but also where they are ranked if included. The default now is like na.last = NA, but like na.last = TRUE (NA values ranked last) if na.rm = FALSE.

A parameter like ties.method in rank() could also be added, defining whether tied values get the min, max, or mean of the percentiles initially assigned to them. The default for ties up to mid-2022 appears to have been like ties.method = "max", but EJScreen will redefine percentiles to set tied values at the lower end of the tied group, like ties.method = "min".
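The rank() analogy in these notes can be sketched in base R. This is only an illustration of how the na.last and ties.method choices change the result for unweighted data; it is not the package's code, and the variable names (pct_max, pct_min, pct_na_last) are made up for this sketch.

```r
# Illustration only: unweighted percentiles as fractions (0 to 1), via rank().
x <- c(5, 9, 9, 9, 12, NA)
n <- sum(!is.na(x))   # number of values with valid data

# Tied values take the highest rank in the tied group (like ties.method = "max").
# na.last = "keep" leaves NA values as NA instead of ranking them.
pct_max <- rank(x, na.last = "keep", ties.method = "max") / n

# Tied values take the lowest rank in the tied group (like ties.method = "min").
pct_min <- rank(x, na.last = "keep", ties.method = "min") / n

# NA values ranked last and counted in the denominator (like na.last = TRUE).
pct_na_last <- rank(x, na.last = TRUE, ties.method = "max") / length(x)

cbind(x, pct_max, pct_min, pct_na_last)
```

With "max", the three tied 9s share the percentile of the last of them; with "min", they share the percentile of the first. Ranking NA values last pushes them to the top of the distribution and lowers the percentiles of the valid values.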

Value

Returns a numeric vector the same length as values. If zone is specified, each percentile is calculated within the given zone.
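As a rough picture of what a weighted percentile within a zone means, here is a minimal base R sketch. The helper wtd_pctile_by_zone() is hypothetical and not part of ejanalysis; it does not use Hmisc::wtd.Ecdf(), it assumes zone has no NA values, and it returns fractions from 0 to 1 (the same form as the hand calculation in the Examples), which may not match the scale the package returns.

```r
# Hypothetical helper for illustration; not the package's implementation.
wtd_pctile_by_zone <- function(values,
                               weights = rep(1, length(values)),
                               zone = rep("all", length(values))) {
  out <- rep(NA_real_, length(values))   # NA values stay NA in the result
  for (z in unique(zone)) {
    in_zone <- zone == z & !is.na(values)
    v <- values[in_zone]
    w <- weights[in_zone]
    # Share of the zone's total weight that sits at or below each value
    out[in_zone] <- vapply(v, function(V) sum(w[v <= V]) / sum(w), numeric(1))
  }
  out
}

values  <- c(10, 20, 30, 10, 40, NA)
weights <- c(1, 1, 2, 3, 1, 1)
zone    <- c("A", "A", "A", "B", "B", "B")
wtd_pctile_by_zone(values, weights, zone)
# [1] 0.25 0.50 1.00 0.75 1.00   NA
```

The value 10 lands at 0.25 in zone A but at 0.75 in zone B, because each value is compared only against the weighted distribution of its own zone.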

See Also

make.bin.pctile.cols() to call functions below, converting columns of values to percentiles and then bins
assign.pctiles() for one vector, assign (weighted) percentile (quantile) to each value within its zone (subset)
assign.pctiles.alt2() as an alternative method, to replicate assign.pctiles, but not by zone
get.pctile() to get (weighted) percentile of just 1+ values within given vector of values
make.pctile.cols() for a data.frame, assign percentiles, return a same-sized df that is wtd.quantile of each value within its column
make.pctile.cols.alt2() as an alternative method, to replicate make.pctile.cols
assign.map.bins() for one vector (or data.frame) of values (e.g. percentiles), return same-sized df that is bin number (map color bin) using preset breaks
make.bin.cols() for a data.frame of values (e.g. percentiles), return same-sized df that is bin number (map color bin) using preset breaks
write.pctiles() to save file that is lookup table of percentiles for columns of a data.frame
write.pctiles.by.zone() to save file that is lookup table of percentiles for columns of a data.frame, for each geographic zone (subset of rows)
write.wtd.pctiles() to save file that is lookup table of weighted percentiles for columns of a data.frame
write.wtd.pctiles.by.zone() to save file that is lookup table of weighted percentiles for columns of a data.frame, for each geographic zone (subset of rows)
lookup.pctile() to look up current approx weighted percentiles in a lookup table that is already in global memory

Examples

x <- c(30, 40, 50, 12,12,5,5,13,13,13,13,13,8,9,9,9,9,9,10:20,20,20,20,21:30)
wts <- rep(c(2,3), length(x)/2)
cbind(wts, x, PCTILE=assign.pctiles(x,wts))

# PERCENTILE OF ALL, NOT JUST THOSE WITH VALID DATA, IF na.rm=FALSE,
# but then NA values preclude high percentiles:
x <- c(NA, NA, NA, NA,NA,NA,NA,NA,NA,NA,13,13,8,9,9,9,9,9,10:20,20,20,20,21:30)
wts <- rep(c(2,3), length(x)/2)
cbind(wts, x, PCTILE.alt2=assign.pctiles.alt2(x, wts, na.rm=FALSE),
 pctile=assign.pctiles(x,wts))[order(x),]
cbind(wts, x, PCTILE.alt2=assign.pctiles.alt2(x, wts, na.rm=TRUE),
 pctile=assign.pctiles(x,wts))[order(x),]

# Weighted percentile of a value V, calculated by hand:
V <- 9
sum(wts[!is.na(x) & x <= V]) / sum(wts[!is.na(x)])

# A value (V) being at this PCTILE% means that (assuming na.rm=TRUE):

# V >= x  for        PCTILE% of wts     (for non-NA x), so
# V < x   for 100% - PCTILE% of wts     (for non-NA x), or
# PCTILE% of all wts have V >= x (for non-NA x), so
# 100% - PCTILE% of all wts have V < x  (for non-NA x).

x <- c(32, NA, NA, NA,NA,NA,NA,NA,NA,NA,13,13,8,9,9,9,9,9,10:20,20,NA,20,21:30)
wts <- rep(c(2,3), length(x)/2)
cbind(wts, x, PCTILE.alt2=assign.pctiles.alt2(x, wts, na.rm=FALSE),
 pctile=assign.pctiles(x,wts))[order(x),]
cbind(wts, x, PCTILE.alt2=assign.pctiles.alt2(x, wts, na.rm=TRUE),
 pctile=assign.pctiles(x,wts))[order(x),]

 ## Not run: 
 # What is the percentile of a given environmental score?
 ejanalysis::lookup.pctile(40,'cancer',lookupUSA)
 # [1] 84
 ejanalysis::lookup.pctile(40,'cancer',lookupStates,'WV')
 # [1] 93
 # What is the environmental score at a given percentile?
 ejscreen::lookupUSA[lookupUSA$PCTILE=='84' ,'cancer']
 # [1] 39.83055
 ejscreen::lookupStates[lookupStates$PCTILE=='84' & lookupStates$REGION =='WV','cancer']
 # [1] 33.36371
 
## End(Not run)


ejanalysis/ejanalysis documentation built on April 2, 2024, 10:12 a.m.