percentileSeek: Find percentile values (to match a set of global...


Description

percentileSeek returns a set of percentiles to be applied across subunits (e.g. ZIP codes) of a larger area (e.g. a jurisdiction), so as to rank items within each subunit (e.g. restaurants) and group these items into grade categories. percentileSeek allows the user to set the desired global proportion of items in each grade category.

Usage

percentileSeek(scores, z, desired.props, restaurant.tol = 10,
  max.iterations = 20, resolve.ties = FALSE)

Arguments

scores

Numeric vector of size n, where n is the number of restaurants to be graded. scores[i] is the mean or raw inspection score for restaurant i.

z

Character vector representing ZIP codes. z[i] is the ZIP code for restaurant i.

desired.props

Numeric vector representing desired global grade proportions across the entire jurisdiction. desired.props[j] is the desired proportion of total (gradeable) restaurants in the jth highest grading category.

restaurant.tol

Integer value giving the maximum allowed difference between the number of restaurants implied by desired.props and the actual number of restaurants in each of the top (length(desired.props) - 1) grade categories.

max.iterations

Integer value specifying the maximum number of calls to the updateGamma percentile update function for each of the sought-after percentiles.

resolve.ties

Logical value specifying how the function's returned percentiles are to be applied across subunits. Should as close as possible to (desired.props[1])% of restaurants in a ZIP code receive an "A" grade, and as close as possible to (desired.props[2])% of restaurants in a ZIP code receive a "B" grade (resolve.ties = TRUE)? Or should the returned percentiles be interpreted as R quantile type = 1 percentiles, so that at least (desired.props[1])% of restaurants in a ZIP code receive "A" grades (resolve.ties = FALSE)?
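To see why the resolve.ties = FALSE reading promises "at least" rather than "as close as possible to", consider a single ZIP code in which several restaurants tie for the best score. The sketch below uses base R's quantile() with type = 1 and assumes, as a hypothetical convention not stated on this page, that lower inspection scores are better and that a restaurant earns an "A" when its score is at or below its ZIP code's cutoff. The scores are made up for illustration.

```r
# Hypothetical inspection scores for ten restaurants in one ZIP code.
# Assumes lower scores are better; four restaurants tie for the best score.
zip_scores <- c(0, 0, 0, 0, 5, 7, 9, 12, 13, 20)

# Type 1 quantile: the inverse of the empirical CDF, so the cutoff is
# always one of the observed scores
a_cutoff <- quantile(zip_scores, probs = 0.10, type = 1, names = FALSE)

# Every restaurant scoring at or below the cutoff receives an "A"
prop_A <- mean(zip_scores <= a_cutoff)
prop_A  # 0.4
```

Although the percentile is 0.10, all four tied restaurants fall at or below the cutoff, so 40 percent of this ZIP code receives an "A". Repeated across many ZIP codes, ties of this kind push the global proportion of "A"s above the desired one.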

Details

In our documentation, we use the terms “ZIP code” and “restaurant”; however, our algorithms and code can be applied much more broadly to other inspected or scored entities, and percentile cutoffs can be sought in subunits (of a larger area) that are not ZIP codes. Wherever “ZIP code” appears, please read “ZIP code or other subunit of a larger area”, and wherever “restaurant” appears, read “restaurant or other entity to be graded”.

percentileSeek was designed for situations in which many tied scores within subunits (e.g. tied restaurant inspection scores within ZIP codes) mean that the obvious choice of percentiles (namely those obtained directly from the desired proportions) does not yield the desired proportions globally. percentileSeek iterates over different values of the first percentile (using the update process described in the updateGamma documentation) until the proportion of gradeable restaurants receiving “A” grades (when the ZIP code cutoffs are the corresponding percentile values) is within (restaurant.tol / no.gradeable.rests) of the desired proportion of “A”s. Here no.gradeable.rests is the number of gradeable restaurants, and gradeable restaurants are those that have both ZIP code and inspection score information. The algorithm then seeks a larger percentile so that the proportion of gradeable restaurants receiving “B” grades matches the desired proportion of “B”s, and so on, until the proportions of restaurants receiving each of the top (length(desired.props) - 1) grades are within the required tolerance of their desired proportions. Note: there is therefore no requirement that the proportion of restaurants receiving the worst grade matches the desired proportion for that grade; the two can be quite different (depending on the number of restaurants being graded and the number of grade categories), and no error will be reported.
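The search for the first percentile can be sketched as below. This is not the package's updateGamma update: as a hypothetical stand-in it uses plain bisection, which is valid here because the number of "A" grades can only stay the same or grow as the percentile grows. The function name seek_first_percentile is illustrative, and the sketch assumes lower scores are better and that each ZIP code's cutoff is its type 1 quantile at the current percentile.

```r
# Hypothetical sketch of the search for the first ("A") percentile.
# Uses bisection in place of the package's updateGamma update, and assumes
# lower scores are better (score <= ZIP cutoff earns an "A").
seek_first_percentile <- function(scores, z, desired.prop,
                                  restaurant.tol = 10, max.iterations = 20) {
  # Gradeable restaurants have both a score and a ZIP code
  ok <- !is.na(scores) & !is.na(z)
  scores <- scores[ok]
  z <- z[ok]
  n <- length(scores)
  target <- desired.prop * n

  lo <- 0
  hi <- 1
  gamma <- desired.prop  # the "obvious" starting percentile
  for (i in seq_len(max.iterations)) {
    # Per-ZIP type 1 quantile cutoff, repeated for every restaurant in that ZIP
    cutoffs <- ave(scores, z, FUN = function(s)
      quantile(s, probs = gamma, type = 1, names = FALSE))
    n_A <- sum(scores <= cutoffs)
    # Within restaurant.tol restaurants, i.e. restaurant.tol / n in proportion
    if (abs(n_A - target) <= restaurant.tol) return(gamma)
    # Too many "A"s: lower the percentile; too few: raise it
    if (n_A > target) hi <- gamma else lo <- gamma
    gamma <- (lo + hi) / 2
  }
  NA_real_  # no percentile found within max.iterations
}
```

For example, with five tied restaurants in ZIP code "a" and scores 1 to 10 in ZIP code "b", calling seek_first_percentile(c(rep(0, 5), 1:10), c(rep("a", 5), rep("b", 10)), 0.5, restaurant.tol = 1) returns 0.25 rather than 0.5, because at 0.5 the ties in ZIP code "a" already contribute five "A"s and the total overshoots the target of 7.5.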

Of course, percentileSeek can only find a solution if one exists. It may simply not be possible, with a particular set of scores, to match the desired proportions. We have included some failsafes to catch some of the simplest instances in which no solution exists. For instance, one possible reason for failure is choosing a desired proportion of “A” grades that is below the global minimum proportion of “A”s. The global minimum proportion of “A”s is obtained by counting the restaurants that have the best inspection score in their ZIP code and dividing by the number of gradeable restaurants. Running percentileSeek can be a useful way to test whether a solution is likely to exist: if the percentiles it returns fall outside the standard [0, 1] interval for percentiles, or if the number of iterations exceeds the maximum number of iterations, this may indicate that no solution exists.
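The global minimum proportion of "A"s described above can be computed directly; if the desired proportion of "A"s is below it, no percentile can succeed. The helper below (min_prop_A is an illustrative name, not part of the package) assumes lower scores are better, so the best score in each ZIP code is its minimum, and every restaurant tied at that score must receive an "A".

```r
# Hypothetical helper: the smallest achievable global proportion of "A"s.
# Assumes lower scores are better, so each ZIP's best score is its minimum.
min_prop_A <- function(scores, z) {
  ok <- !is.na(scores) & !is.na(z)   # keep gradeable restaurants only
  scores <- scores[ok]
  z <- z[ok]
  # Best (minimum) score of each restaurant's own ZIP code
  best_in_zip <- ave(scores, z, FUN = min)
  # Restaurants tied at their ZIP's best score cannot avoid an "A"
  mean(scores == best_in_zip)
}

min_prop_A(c(rep(0, 5), 1:10), c(rep("a", 5), rep("b", 10)))  # 0.4
```

Here the five tied restaurants in ZIP code "a" plus the single best restaurant in ZIP code "b" force at least 6 of the 15 restaurants into the "A" category, so a desired proportion of "A"s of 0.2 cannot be met.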

An example of when the percentileSeek function could be used outside the restaurant context is if you were tasked with finding the top 3 percent of students in a state. Each school has its own GPA system, so comparing students by raw GPA does not make sense. We could therefore perform a percentile adjustment at each school and select the top 3 percent of students at each school. Unfortunately, some schools do not use the full range of available GPA scores, so it may be the case that the top 5 percent of students at school 1 have the same GPA and cannot be distinguished from one another. Using percentileSeek with each restaurant replaced by a student, each restaurant's inspection score replaced by the student's GPA, and each ZIP code replaced by a school, we could investigate whether it is possible to satisfy the globally desired proportion of 3 percent. percentileSeek would reduce the percentile applied across schools (from the initial 3 percent); this would still select all of the tied top 5 percent of students at school 1 for nomination, but would try to take advantage of the fact that some schools use more of their GPA scale. Of course, issues of fairness do arise: one might ask why school 2, which distinguishes its students better than school 1, should have fewer students represented in the globally selected 3 percent. We only advocate the use of percentileSeek in situations where there is good reason to demand certain global proportions. In the school selection case, this may be because only finite resources are available for the top 3 percent of students, and it is simply not possible to extend these resources to the top 3 percent of students at each school.
In the restaurant case, we want to assign an “A” grade to the top restaurants in each ZIP code; however, we also do not want to design a grading system that is seen to inflate grades compared to an unadjusted grading system (one based on uniform absolute grade cutoffs across the whole jurisdiction).

Value

A numeric vector of the percentiles to be applied within each ZIP code so as to achieve the desired global proportions of grades.
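One way to apply a returned vector of percentiles is sketched below, under the resolve.ties = FALSE reading (type 1 quantile cutoffs within each ZIP code). The function assign_grades and its grades argument are hypothetical, and the sketch assumes increasing percentiles, gradeable restaurants only (no missing scores or ZIP codes), and that lower scores are better.

```r
# Hypothetical sketch: turn percentiles into per-ZIP cutoffs and grades.
# Assumes lower scores are better and no missing scores or ZIP codes.
assign_grades <- function(scores, z, percentiles,
                          grades = c("A", "B", "C")) {
  stopifnot(length(grades) == length(percentiles) + 1)
  # Start every restaurant at the worst grade
  out <- rep(grades[length(grades)], length(scores))
  # Apply cutoffs from worst to best so better grades overwrite worse ones
  for (k in rev(seq_along(percentiles))) {
    cut_k <- ave(scores, z, FUN = function(s)
      quantile(s, probs = percentiles[k], type = 1, names = FALSE))
    out[scores <= cut_k] <- grades[k]
  }
  out
}

g <- assign_grades(c(rep(0, 5), 1:10), c(rep("a", 5), rep("b", 10)),
                   percentiles = c(0.25, 0.5))
table(g)
```

With these made-up scores, all five tied restaurants in ZIP code "a" receive an "A", while ZIP code "b" gets cutoffs of 3 and 5, giving 8 "A"s, 2 "B"s and 5 "C"s overall.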


QuantileGradeR documentation built on May 2, 2019, 6:41 a.m.