findCutoffs creates a data frame of cutoff values to be applied across
a jurisdiction's subunits (e.g. ZIP codes) for grading restaurants or other
inspected entities.
findCutoffs(X, z, gamma, type = "adj", restaurant.tol = 10,
            max.iterations = 20, resolve.ties = TRUE)
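X | Numeric matrix of restaurants' inspection scores, with one row per restaurant.
z | Character vector of restaurants' ZIP codes, with one entry per row of X.
gamma | Numeric vector representing absolute grade cutoffs or percentiles, depending on type.
type | A character string that is one of "unadj", "perc", "perc.resolve.ties" or "adj".
restaurant.tol | Only relevant in the type = "adj" mode. Sets the tolerance within which grading proportions under ZIP code percentile cutoffs must match the unadjusted grading proportions.
max.iterations | An integer only relevant in the type = "adj" mode.
resolve.ties | Boolean value only relevant in the type = "adj" mode. Determines whether percentile cutoffs accommodate score ties within a ZIP code.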
In our documentation we use the terms "ZIP code" and "restaurant"; however, our grading algorithm and code can be applied to other inspected entities, and percentile cutoffs can be sought in subunits of a jurisdiction other than ZIP codes. For example, it may make sense to search for percentile cutoffs in an inspector's allocated inspection area or within a census tract. We chose ZIP codes because area assignments for inspectors in King County (WA) tend to be single or multiple ZIP codes, and we wanted to assign grades based on how a restaurant's scores compare to those of other restaurants assessed by the same inspector. We could have calculated percentile cutoffs in an inspector's allocated area, but we also wanted a grading system that was readily explainable, and the process for allocating an area to an inspector is non-trivial. Where "ZIP code" is referenced, please read "ZIP code or other subunit of a jurisdiction"; likewise, "restaurant" should read "restaurant or other entity to be graded".
findCutoffs takes in a matrix of restaurants' scores and a vector
corresponding to restaurants' ZIP codes, and outputs a data frame of cutoff
scores to be used in grade classification. The returned ZIP code cutoff data
frame has one row for each unique ZIP code and (length(gamma)+1) columns:
one column for the ZIP code name and length(gamma) cutoff scores separating
the (length(gamma)+1) grading categories. Across each ZIP code's row, cutoff
scores increase, and we assume, as in the King County (WA) case, that greater
risk is associated with larger inspection scores. (If the inspection system of
interest associates larger inspection scores with reduced risk, it will be
necessary to transform the inspection scores before using findCutoffs or
other functions in the DineSafeR package. A simple function such as
f(score) = -score, applied to the input score matrix, performs the necessary
transformation.)
The returned ZIP code data frame can be used with the gradeAllBus function
to assign each restaurant a grade: a restaurant's most recent or mean
inspection score is compared to the cutoff values in its ZIP code. We find the
smallest cutoff value that the restaurant's score is less than or equal to,
and we find the index of this cutoff when the ZIP code's cutoffs are ordered
from smallest to largest. This index will then index the alphabet in order to
return the letter grade to be assigned to the restaurant in question. If the
restaurant's score is greater than all cutoff scores, it is assigned the
(length(gamma)+1)th letter of the alphabet as its letter grade - the worst
grade in the grading scheme.
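The lookup rule described above can be sketched in a few lines of base R. This is only an illustration of the rule as stated, not the gradeAllBus implementation; the helper name grade_one and the example cutoffs are hypothetical.

```r
# Hypothetical helper (not part of DineSafeR): grade a single score against
# one ZIP code's cutoffs, following the rule described in the text.
grade_one <- function(score, cutoffs) {
  cutoffs <- sort(cutoffs)
  # Index of the smallest cutoff that the score is less than or equal to
  idx <- which(score <= cutoffs)[1]
  # A score above every cutoff receives the (length(cutoffs) + 1)th letter
  if (is.na(idx)) idx <- length(cutoffs) + 1
  LETTERS[idx]
}

grade_one(0, c(0, 30))   # "A"
grade_one(15, c(0, 30))  # "B"
grade_one(45, c(0, 30))  # "C"
```

With two cutoffs there are three grades, so a score above both cutoffs maps to the third letter.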
The way in which cutoff scores are calculated for each ZIP code depends on the
value of the type argument. type can take one of four values (described
below); its default value is type = "adj".
type = "unadj" creates a ZIP code cutoff data frame with the same cutoff
scores (meaningful values in a jurisdiction's inspection system that are
contained in the vector gamma) for all ZIP codes. This ZIP code data frame can
then be used to carry out "unadjusted" grading, in which a restaurant's most
recent routine inspection score is compared to these cutoffs.
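As a rough picture of what such a data frame looks like, the sketch below builds one by hand in base R. The ZIP codes are made up, and the column names (zip, X1, X2) are assumptions for illustration; the names that findCutoffs actually returns may differ.

```r
# Illustrative inputs (hypothetical ZIP codes)
z <- c("98101", "98101", "98105", "98105", "98105")
gamma <- c(0, 30)  # uniform absolute cutoff scores

zips <- sort(unique(z))
# One row per unique ZIP code, every row holding the same cutoffs
cutoff_mat <- matrix(gamma, nrow = length(zips), ncol = length(gamma),
                     byrow = TRUE)
unadj_sketch <- data.frame(zip = zips, cutoff_mat, stringsAsFactors = FALSE)
unadj_sketch
```

The result has one row per ZIP code and length(gamma) + 1 columns, matching the shape described earlier.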
type = "perc" takes in a vector of percentiles, gamma, and returns a data
frame of the scores in each ZIP code corresponding to these percentiles (using
R's type = 1 definition of quantile).
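The per-ZIP percentile lookup can be illustrated with base R's quantile function and its type = 1 definition. The scores and ZIP codes below are invented, and this sketch is not the package's code; it only shows the kind of calculation the text describes.

```r
# Illustrative scores for two hypothetical ZIP codes
scores <- c(0, 0, 5, 10, 35, 0, 20, 25, 40, 60)
z <- rep(c("98101", "98105"), each = 5)
gamma <- c(0.5, 0.9)  # percentiles

# Type 1 quantiles (inverse of the empirical CDF) within each ZIP code
perc_cutoffs <- t(sapply(split(scores, z), quantile,
                         probs = gamma, type = 1, names = FALSE))
perc_sketch <- data.frame(zip = rownames(perc_cutoffs), perc_cutoffs,
                          row.names = NULL)
perc_sketch
# 98101: cutoffs 5 and 35; 98105: cutoffs 25 and 60
```

Type 1 quantiles always return an observed score, so every cutoff is a value that some restaurant in the ZIP code actually received.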
type = "perc.resolve.ties" takes in a vector of percentiles, gamma, and
instead of returning (for B/C cutoffs) the scores in each ZIP code that
result in at least (gamma[2] x 100)% of restaurants in the ZIP code scoring
less than or equal to these cutoffs, type = "perc.resolve.ties" takes into
account the fact that ties exist in ZIP codes. Returned scores for A/B
cutoffs are those that result in the closest percentage of restaurants in the
ZIP code scoring less than or equal to the A/B cutoff to the desired
percentage, (gamma[1] x 100)%. Similarly, B/C cutoffs are the scores in the
ZIP code that result in the closest percentage of restaurants in the ZIP code
scoring less than or equal to the B/C cutoff and more than the A/B cutoff to
the desired percentage, ((gamma[2] - gamma[1]) x 100)%.
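One way to read the "closest percentage" rule for a single ZIP code is sketched below. The helper closest_cutoffs is hypothetical, handles only two cutoffs, assumes at least one score lies above the A/B cutoff, and breaks equal distances by taking the smaller score; the package may do any of these differently.

```r
# Hypothetical helper (not part of DineSafeR): A/B and B/C cutoffs for one
# ZIP code, chosen so that grade shares are as close as possible to gamma.
closest_cutoffs <- function(scores, gamma) {
  candidates <- sort(unique(scores))
  # Share of restaurants scoring at or below each candidate cutoff
  share_below <- sapply(candidates, function(s) mean(scores <= s))
  ab <- candidates[which.min(abs(share_below - gamma[1]))]
  # Share scoring above the A/B cutoff and at or below each remaining candidate
  rest <- candidates[candidates > ab]
  share_b <- sapply(rest, function(s) mean(scores > ab & scores <= s))
  bc <- rest[which.min(abs(share_b - (gamma[2] - gamma[1])))]
  c(ab, bc)
}

scores <- c(0, 0, 0, 0, 5, 5, 5, 5, 5, 30)  # heavily tied scores
closest_cutoffs(scores, gamma = c(0.5, 0.9))     # 0 5
quantile(scores, 0.5, type = 1, names = FALSE)   # 5
```

In this example the plain type 1 quantile puts the A/B cutoff at 5, which would give 90% of restaurants an A, whereas the closest-percentage rule picks 0, giving 40%, which is nearer the 50% target.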
type = "adj" takes in a vector of uniform absolute cutoff scores, gamma, and,
in the first instance, carries out unadjusted grading by comparing
restaurants' most recent routine inspection scores to these cutoffs (see:
type = "unadj"). Grade proportions in this scheme are then used as initial
percentiles to find percentile cutoffs in each ZIP code (or percentile cutoffs
accommodating for the presence of score ties in the ZIP code, depending on the
value of resolve.ties; see: type = "perc" or type = "perc.resolve.ties").
Restaurants are then graded with the ZIP code percentile cutoffs, and grading
proportions are compared with grading proportions from the unadjusted
system. Percentiles are iterated over (by the percentileSeek function) until
grading proportions with ZIP code percentile cutoffs are within a certain
tolerance (as determined by restaurant.tol) of the unadjusted grading
proportions.
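The first two steps of this procedure can be sketched in base R on invented data. The sketch stops before the iteration: the search carried out by percentileSeek and the restaurant.tol check are not reproduced here, and the variable names are assumptions.

```r
# Illustrative most-recent scores for two hypothetical ZIP codes
scores <- c(0, 0, 5, 10, 35, 0, 20, 25, 40, 60)
z <- rep(c("98101", "98105"), each = 5)
gamma <- c(0, 30)  # uniform absolute cutoffs

# Step 1: cumulative grade proportions under unadjusted grading, used as
# initial percentiles (share of all restaurants at or below each cutoff)
init_perc <- sapply(gamma, function(g) mean(scores <= g))
init_perc  # 0.3 0.7

# Step 2: first-pass percentile cutoffs within each ZIP code (type 1 quantiles)
first_pass <- t(sapply(split(scores, z), quantile,
                       probs = init_perc, type = 1, names = FALSE))
first_pass
# 98101: cutoffs 0 and 10; 98105: cutoffs 20 and 40
```

From here, the described procedure would regrade with the ZIP code cutoffs, compare the resulting grade proportions with the unadjusted ones, and adjust the percentiles until the two agree within the tolerance.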
findCutoffs will produce cutoff scores even for ZIP codes with only one
restaurant - situations in which a percentile adjustment shouldn't be used. It
is the job of the user to ensure that, if using the findCutoffs function in
mode type = "perc", type = "perc.resolve.ties" or type = "adj", it makes sense
to do so. This may involve only performing the percentile adjustment on larger
ZIP codes and providing absolute cutoff points for smaller ZIP codes, or may
involve aggregating smaller ZIP codes into a larger geographical unit and then
performing the percentile adjustment on the larger area.
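A quick check of ZIP code sizes before calling findCutoffs might look like the following. The threshold of 10 restaurants is an arbitrary placeholder chosen for illustration, not a recommendation from the package, and the ZIP codes are invented.

```r
# Illustrative ZIP code vector (hypothetical values)
z <- rep(c("98101", "98105", "98199"), times = c(12, 15, 1))

zip_sizes <- table(z)  # number of restaurants in each ZIP code
min_size <- 10         # hypothetical threshold; choose one suited to your data
small_zips <- names(zip_sizes)[zip_sizes < min_size]
small_zips  # "98199"

# Rows that belong to ZIP codes large enough for a percentile adjustment
keep <- !(z %in% small_zips)
```

The logical vector keep could then be used to subset the score matrix and ZIP code vector, with the small ZIP codes handled separately through absolute cutoffs or aggregation.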
As mentioned previously, findCutoffs was created for an inspection system that
associates greater risk with larger inspection scores. If the inspection
system of interest associates greater risk with reduced scores, it will be
necessary to transform the scores matrix before using the findCutoffs
function. A simple function such as f(score) = -score performs the necessary
transformation.
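For a score matrix, the transformation is a single negation. The matrix below is invented for illustration, with larger values meaning lower risk.

```r
# Hypothetical scores where larger values mean lower risk
# (rows are restaurants, columns are inspections)
X_high_is_good <- matrix(c(95, 80, NA, 70, 88, 60), nrow = 3)

# f(score) = -score: the safest restaurant now has the smallest value
X_for_cutoffs <- -X_high_is_good
X_for_cutoffs
```

Missing values stay missing under negation. Any cutoffs computed from the negated matrix would be on the negated scale, so they would need their sign flipped again before being reported in the original units.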
# Load the package that provides findCutoffs and gradeAllBus
library(DineSafeR)
# Adjusted Grading (without ties resolution):
zipcode.cutoffs.df <- findCutoffs(X.kc, zips.kc, gamma = c(0, 30), resolve.ties = FALSE)
mean.scores <- rowMeans(X.kc, na.rm = TRUE)
adj.grades <- gradeAllBus(mean.scores, zips.kc, zipcode.cutoffs.df)
# Adjusted Grading (with ties resolution):
cutoffs.Ties.df <- findCutoffs(X.kc, zips.kc, gamma = c(0, 30), resolve.ties = TRUE)
grades.Ties <- gradeAllBus(mean.scores, zips.kc, cutoffs.Ties.df)
# Unadjusted Grading (uses the first column of X.kc as the score):
unadj.cutoffs <- findCutoffs(X.kc, zips.kc, gamma = c(0, 30), type = "unadj")
unadj.grades <- gradeAllBus(scores = X.kc[, 1], zips.kc, zip.cutoffs = unadj.cutoffs)
# Proportion A/B/C in each ZIP code (rows are ZIP codes, columns are grades)
# Unadjusted
foo1 <- round(t(table(unadj.grades, zips.kc))/apply(table(unadj.grades, zips.kc), 2, sum), 2)
# Adjusted (without ties resolution)
foo2 <- round(t(table(adj.grades, zips.kc))/apply(table(adj.grades, zips.kc), 2, sum), 2)
# Adjusted (with ties resolution)
foo3 <- round(t(table(grades.Ties, zips.kc))/apply(table(grades.Ties, zips.kc), 2, sum), 2)
# Correlation plots of unadjusted vs. adjusted (without resolution of ties) grade
# proportions in ZIP codes for different grades. Point size scales with the
# number of restaurants in each ZIP code.
# Proportions A
plot(foo1[,1], foo2[,1], xlim = range(cbind(foo1[,1], foo2[,1])),
     ylim = range(cbind(foo2[,1], foo1[,1])), pch = 16,
     cex = sqrt(apply(table(adj.grades, zips.kc), 2, sum)/pi)*0.3,
     main = "Proportion A in ZIP Codes",
     xlab = "Unadjusted", ylab = "Adjusted")
# Proportions B
plot(foo1[,2], foo2[,2], xlim = range(cbind(foo1[,2], foo2[,2])),
     ylim = range(cbind(foo2[,2], foo1[,2])), pch = 16,
     cex = sqrt(apply(table(adj.grades, zips.kc), 2, sum)/pi)*0.3,
     main = "Proportion B in ZIP Codes", xlab = "Unadjusted", ylab = "Adjusted")
# Proportions C
plot(foo1[,3], foo2[,3], xlim = range(cbind(foo1[,3], foo2[,3])),
     ylim = range(cbind(foo2[,3], foo1[,3])), pch = 16,
     cex = sqrt(apply(table(adj.grades, zips.kc), 2, sum)/pi)*0.3,
     main = "Proportion C in ZIP Codes", xlab = "Unadjusted", ylab = "Adjusted")