pop.ecdf: Draw an Ecdf plot comparing distributions of scores in...
In ejanalysis/ejanalysis: Tools for Environmental Justice (EJ) Analysis

pop.ecdf

R Documentation

Draw an Ecdf plot comparing distributions of scores in selected demographic groups

Description

Draws a plot using Hmisc::Ecdf(), overlaying cumulative distribution functions, one for each subgroup specified. Useful to compare 2 groups based on each groups entire pdf or cdf distribution of peoples scores, using data from small places like census block groups, based on having for each place the pop total and \

Usage

pop.ecdf(
  scores,
  pcts,
  pops,
  allothers = TRUE,
  col = "red",
  main = "",
  weights,
  subtitles = FALSE,
  ...
)

Arguments

`scores`	Numeric vector (or data.frame) required. Values to analyze. If data.frame, then each column is plotted in its own panel.
`pcts`	Numeric vector (or data.frame), required. Same number of vector elements or data.frame rows as length of scores vector (not sure what happens if pcts and scores are both data.frames). Specifies the fraction of population that is in demographic group(s) of interest, one row per place, one column per group.
`pops`	Vector used to define weights as `pop * pcts`, and if `allothers=TRUE`, for `pop * (1-pcts)` for nongroup
`allothers`	Logical value, optional, TRUE by default. Whether to plot a series for everyone else, using 1-pct
`col`	Optional, default is 'red' to signify line color red for key demographic group. Can also be a vector of colors if pcts is a data.frame with one column per group, one color per group.
`main`	Optional character specifying plot title. Default title notes colors of lines and if reference group used.
`weights`	Not used currently. See `pops` parameter
`subtitles`	Logical FALSE by default, which means extra info is not shown (see help on `Hmisc::Ecdf()`)
`...`	other optional parameters to pass to Ecdf

Details

Notes:
to compare zones,
compare demog groups, (see parameter called group)
compare multiple groups and/or multiple zones, like hisp vs others in us vs ca all on one graph
see Ecdf() for options & try passing a data.frame instead of just vector

Value

draws a plot

Examples

## #
## Not run: 
pop.ecdf( 31:35, c(0.10, 0.10, 0.40, 0, 0.20), 1001:1005 )

set.seed(99)
pctminsim=c(runif(7000,0,1), pmin(rlnorm(5000, meanlog=log(0.30), sdlog=1.7), 4)/4)
popsim= runif(12000, 500, 3000)
esim= rlnorm(12000, log(10), log(1.15)) + rnorm(12000, 1, 0.5) * pctminsim - 1
pop.ecdf(esim, pctminsim, popsim,
 xlab='Tract air pollution levels (vertical lines are group means)',
  main = 'Air pollution levels among minorities (red curve) vs rest of US pop.')
abline(v=wtd.mean(esim, weights = pctminsim * popsim), col='red')
abline(v=wtd.mean(esim, weights = (1-pctminsim) * popsim), col='black')

pop.ecdf(bg$pm, bg$pctmin, 1000,
 xlab='Tract air pollution levels (vertical lines are group means)',
main = 'PM2.5 levels among minorities (red curve) vs rest of US pop.')
abline(v=wtd.mean(bg$pm, weights = bg$pctmin * bg$pop), col='red')
abline(v=wtd.mean(bg$pm, weights = (1-bg$pctmin) * bg$pop), col='black')

#pop.ecdf(dat$Murder, dat$Population * (dat$Illiteracy/100))
pop.ecdf(bg$pm, bg$pctmin, bg$pop,
 main='PM2.5 levels among minorities (red curve) vs rest of pop (vertical lines=group means)')
abline(v=wtd.mean(bg$pm, weights = bg$pctmin * bg$pop), col='red')
abline(v=wtd.mean(bg$pm, weights = (1-bg$pctmin) * bg$pop), col='black')

pop.ecdf(log10(places$traffic.score), places$pctmin, places$pop)
pop.ecdf(places$cancer, places$pctmin, places$pop, allothers=FALSE)
pop.ecdf(places$cancer, places$pctlingiso, places$pop, col='green', allothers=FALSE, add=TRUE)
# Demog suscept  for each REGION (can't see if use vs others)
pop.ecdf(bg$traffic.score, bg$VSI.eo, bg$pop, log='x', subtitles=FALSE,
         group=bg$REGION, allothers=FALSE,
         xlab='Traffic score (log scale)', ylab='%ile of population',
          main='Distribution of scores by EPA Region')

# Demog suscept (how to show vs others??), one panel per ENVT FACTOR (ie per col in scores df)
data('names.e')
pop.ecdf(bg[ , names.e], bg$VSI.eo, bg$pop, log='x', subtitles=FALSE,
         allothers=TRUE, ylab='%ile of population',
          main='Distribution of scores by EPA Region')

# log scale is useful & so are these labels passed to function
# in CA vs not CA
pop.ecdf(bg$traffic.score, bg$ST=='CA', bg$pop,
         subtitles=FALSE,
         log='x', xlab='%ile of population', ylab='Traffic scores (log scale)',
         main='Distribution of scores in CA (red) vs rest of US')

# Flagged vs not (all D, all zones)
pop.ecdf(bg$traffic.score, bg$flagged, bg$pop, log='x')

# D=Hispanics vs others, within CA zone only
pop.ecdf(bg$traffic.score, bg$ST=='CA', bg$pop * bg$pcthisp, log='x')
# Demog suscept vs others, within CA only
pop.ecdf(bg$traffic.score, bg$ST=='CA', bg$pop * bg$VSI.eo, log='x')


## End(Not run)

ejanalysis/ejanalysis documentation built on April 2, 2024, 10:12 a.m.