write.wtd.pctiles.by.zone: create lookup table as file of pop-weighted or unwtd...

View source: R/write.wtd.pctiles.by.zone.R

write.wtd.pctiles.by.zone    R Documentation

Create a lookup table, saved as a file, of population-weighted or unweighted percentiles, mean, and sd, for the US overall or by state or region

Description

check which functions actually get used for this now.

Usage

write.wtd.pctiles.by.zone(
  mydf,
  wts = NULL,
  filename = NULL,
  zone.vector = NULL,
  zoneOverallName = "USA"
)

Arguments

mydf

data.frame with numeric data. Each column will be examined to calculate mean, sd, and percentiles, for each zone.

wts

optional numeric vector of weights, such as population counts, with length equal to nrow(mydf)

filename

prefix to use for filename to be saved locally (.csv is added by the function). If not provided, no file is saved.

zone.vector

optional vector of zone names, such as states or regions, one per row of mydf (same length as wts)

zoneOverallName

optional. If not by zone, name of the entire domain to use in the table column called REGION. Default is USA.

Details

also see:
ejscreen.lookuptables ??

make.bin.pctile.cols uses assign.pctiles

write.wtd.pctiles.by.zone uses wtd.pctiles.exact or pctiles.exact
pctiles.exact uses stats::quantile(x, type = 1, probs = (1:100)/100, na.rm = TRUE). The inverse of quantile is ecdf (the empirical cumulative distribution function), a step function with jumps of i/n at observation values, where i is the number of tied observations at that value. Missing values are ignored.

For observations x = (x1, x2, ..., xn), Fn(t) is the fraction of observations less than or equal to t, i.e.,

Fn(t) = #{xi <= t}/n = (1/n) sum(i=1,n) Indicator(xi <= t).
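The relationship between ecdf and the type 1 quantile can be checked on a small vector. The following is a minimal sketch using only base R; the toy values and the helper name manual_Fn are made up for illustration and are not part of the package.

```r
# Toy data with a tie at 3 (values are illustrative only)
x <- c(5, 3, 8, 3, 10)
n <- length(x)

# Empirical CDF: fraction of observations <= t
Fn <- stats::ecdf(x)

# The same quantity from the indicator-sum definition (hypothetical helper)
manual_Fn <- function(t) sum(x <= t) / n

Fn(3)          # 2 of the 5 observations are <= 3
manual_Fn(3)

# Type 1 quantile at p: the smallest observed value whose ECDF reaches p
p  <- 0.5
q1 <- stats::quantile(x, probs = p, type = 1, names = FALSE)
smallest <- min(x[Fn(x) >= p])
c(q1, smallest)
```

Both q1 and smallest pick an actual data value, which is why type 1 percentiles never interpolate between observations.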

stats::quantile() can use nine different quantile algorithms discussed in Hyndman and Fan (1996), defined as weighted averages of consecutive order statistics. Reference: Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, American Statistician 50, 361–365. doi: 10.2307/2684934.

type 1 is used here, and is "Inverse of empirical distribution function."

  Sample quantiles of type i are defined by:

               Q[i](p) = (1 - Y) x[j] + Y x[j+1],

  where
               i is the type of formula to use (1 through 9),
               p is the probability (0 through 1),
               x[j] is the jth order statistic,
               n is the sample size,
               j satisfies (j-m)/n <= p < (j-m+1)/n,
               m is a constant determined by the sample quantile type
                 (m = 0 or -0.5 for the discontinuous types below), and
               the value of Y is a function of
                 j = floor(np + m)   --so this is roughly how many data points are smaller, and
                 g = np + m - j      --so this is the fractional part of np + m.

    Discontinuous sample quantile types 1, 2, and 3

    For types 1, 2 and 3, Q[i](p) is a discontinuous function of p, with
    m = 0 when i = 1 or i = 2, and m = -1/2 when i = 3.
    Type 1 = Inverse of empirical distribution function. Y = 0 if g = 0, and 1 otherwise.
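As a worked illustration of the type 1 definition above, the formula can be coded directly and compared with stats::quantile(). This is a sketch only: type1_quantile is a hypothetical helper, not a function in this package, and it omits the small numerical fuzz that stats::quantile() applies when computing j.

```r
# Hypothetical helper: type 1 sample quantile coded from the definition
type1_quantile <- function(x, p) {
  xs <- sort(x)               # order statistics x[1] <= ... <= x[n]
  n  <- length(xs)
  m  <- 0                     # constant for type 1
  j  <- floor(n * p + m)
  g  <- n * p + m - j         # fractional part of np + m
  Y  <- ifelse(g == 0, 0, 1)  # type 1 rule: Y = 0 if g = 0, and 1 otherwise
  # Clamp indices to 1..n so that very small p returns the minimum
  lo <- pmin(pmax(j, 1), n)
  hi <- pmin(pmax(j + 1, 1), n)
  (1 - Y) * xs[lo] + Y * xs[hi]
}

x <- c(7, 3, 9, 5)                       # illustrative values
p <- c(0.1, 0.25, 0.5, 0.6, 0.75, 1)

manual  <- type1_quantile(x, p)
builtin <- stats::quantile(x, probs = p, type = 1, names = FALSE)
rbind(p, manual, builtin)
```

With n = 4, the probabilities 0.25, 0.5, 0.75 and 1 give g = 0 and return x[j], while 0.1 and 0.6 give g > 0 and step up to x[j+1].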

wtd.pctiles.exact uses
  Hmisc::wtd.quantile(x, wts, type = "i/n", probs = (1:100)/100, na.rm = na.rm)
  "i/n" uses the inverse of the empirical distribution function,
  using wt/T, where wt is the cumulative weight and T is the total weight (usually total sample size).
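The wt/T idea can be sketched in base R as the inverse of a weighted step-function ECDF. This is an assumption-laden illustration, not Hmisc's implementation: wtd_inverse_ecdf is a hypothetical helper, and Hmisc::wtd.quantile may return different values (for example, if it interpolates between points of the weighted ECDF).

```r
# Hypothetical helper: smallest value whose cumulative weight share reaches p
wtd_inverse_ecdf <- function(x, wts, probs) {
  keep <- !is.na(x) & !is.na(wts)          # ignore missing values
  x    <- x[keep]
  wts  <- wts[keep]
  ord  <- order(x)
  xs   <- x[ord]
  cumfrac <- cumsum(wts[ord]) / sum(wts)   # wt / T at each sorted value
  vapply(probs, function(p) xs[which(cumfrac >= p)[1]], numeric(1))
}

x     <- c(10, 20, 30, 40)           # illustrative indicator values
pop   <- c(100, 100, 100, 700)       # illustrative population weights
probs <- c(0.1, 0.25, 0.5, 0.9)

weighted   <- wtd_inverse_ecdf(x, pop, probs)
equal_wt   <- wtd_inverse_ecdf(x, rep(1, length(x)), probs)
unweighted <- stats::quantile(x, probs = probs, type = 1, names = FALSE)
rbind(probs, weighted, equal_wt, unweighted)
```

Because 70 percent of the population sits at the value 40, the weighted median is 40, while the unweighted type 1 median is 20. With equal weights the sketch matches the unweighted type 1 result.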

table.pop.pctile and

map service with lookup tables



Examples


  # bg = ejscreen::bg21plus # want demog subgroups but also want PR eventually
  # pctilevariables <- c(names.e, names.d, names.d.subgroups, names.ej)
  # ejanalysis::write.wtd.pctiles(mydf = bg[ , pctilevariables], wts = bg$pop, filename =  'lookupUSA21')
  # ejanalysis::write.wtd.pctiles.by.zone(mydf = bg[ , pctilevariables], wts = bg$pop,
  #                                 zone.vector = bg$REGION, filename =  'lookupRegions21')
  # ejanalysis::write.wtd.pctiles.by.zone(mydf = bg[ , pctilevariables], wts = bg$pop,
  #                                 zone.vector = bg$ST,     filename =  'lookupStates21')
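The commented examples above depend on datasets from other packages. A self-contained sketch of the general idea, using base R only, is shown below. It is unweighted, and everything in it is an assumption made for illustration: the toy data, the helper summarize_zone, the row labels "mean" and "std.dev", and the table layout are not taken from the package, whose actual output format may differ.

```r
# Toy data; column names and values are invented for illustration
mydf <- data.frame(pm      = c(8, 12, 9, 15, 11, 7),
                   traffic = c(100, 400, 250, 900, 300, 150))
zone <- c("A", "A", "A", "B", "B", "B")

# Hypothetical helper: percentiles 1..100 (type 1), mean, and sd per column
summarize_zone <- function(df, zonename) {
  cols <- lapply(df, function(col) {
    c(stats::quantile(col, probs = (1:100) / 100, type = 1,
                      na.rm = TRUE, names = FALSE),
      mean(col, na.rm = TRUE),
      stats::sd(col, na.rm = TRUE))
  })
  data.frame(REGION = zonename,
             PCTILE = c(1:100, "mean", "std.dev"),
             as.data.frame(cols))
}

# One table for the whole domain, and one stacked table by zone
overall <- summarize_zone(mydf, "USA")
parts   <- split(mydf, zone)
by_zone <- do.call(rbind, unname(Map(summarize_zone, parts, names(parts))))
rownames(by_zone) <- NULL

# Save the by-zone table as a .csv file in a temporary folder
outfile <- file.path(tempdir(), "lookupZonesToy.csv")
utils::write.csv(by_zone, outfile, row.names = FALSE)

head(by_zone, 3)
```

A population-weighted version would swap the stats::quantile() call for a weighted percentile routine and pass the matching subset of weights for each zone.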


ejanalysis/ejanalysis documentation built on April 2, 2024, 10:12 a.m.