findCutoffs: Find Cutoff Values.

Description

findCutoffs creates a data frame of cutoff values to be applied across a jurisdiction's subunits (e.g. ZIP codes) for grading restaurants or other inspected entities.

Usage

findCutoffs(X, z, gamma, type = "adj", restaurant.tol = 10,
  max.iterations = 20, resolve.ties = TRUE)

Arguments

X

Numeric matrix of size n x p, where n is the number of restaurants to be graded and p is the number of inspections to be used in grade assignment. Entry X[i,j] represents the inspection score for the ith restaurant in the jth most recent inspection.

z

Character vector of length n representing ZIP codes (or other subunits within a jurisdiction). z[i] is the ZIP code corresponding to the restaurant with inspection scores in row i of X.

gamma

Numeric vector representing absolute grade cutoffs or percentiles, depending on the value of type. Entries in gamma should be in non-decreasing order, with gamma[1] <= gamma[2] and so on, because larger scores are associated with higher risk (see the "Warning" section). If type = "perc" or type = "perc.resolve.ties", gamma values represent percentiles and should lie between 0 and 1.

type

A character string that is one of "adj", "unadj", "perc", or "perc.resolve.ties", and that indicates the grading algorithm to be implemented.

restaurant.tol

Only relevant in the type = "adj" case (the default case). An integer indicating the maximum difference in the number of restaurants in a grading category between the unadjusted and adjusted grading algorithms (for the top length(gamma) grading categories).

max.iterations

An integer only relevant in the type = "adj" case (the default case). The maximum number of iterations that the percentileSeek iterative algorithm should run in order to find percentiles to be applied to ZIP codes to find cutoffs that result in the same global grading proportions (within the restaurant.tol level) as in the unadjusted grading system.

resolve.ties

Boolean value only relevant in the type = "adj" case (the default case). If resolve.ties = TRUE, the intermediate algorithm used to find ZIP cutoffs (after percentileSeek has identified the global percentiles to be applied across all ZIP codes) is the type = "perc.resolve.ties" algorithm. Otherwise, the type = "perc" algorithm is used.

Details

Our documentation uses the terms "ZIP code" and "restaurant", but the grading algorithm and code can be applied to other inspected entities, and percentile cutoffs can be sought in subunits of a jurisdiction that are not ZIP codes. For example, it may make sense to search for percentile cutoffs within an inspector's allocated inspection area or within a census tract. We chose ZIP codes because area assignments for inspectors in King County (WA) tend to be single or multiple ZIP codes, and we wanted to assign grades based on how a restaurant's scores compare to those of other restaurants assessed by the same inspector. We could have calculated percentile cutoffs within each inspector's allocated area, but we also wanted a grading system that was readily explainable, and the process for allocating an area to an inspector is non-trivial. Where "ZIP code" is referenced, please read "ZIP code or other subunit of a jurisdiction"; likewise, "restaurant" should be read as "restaurant or other entity to be graded".

findCutoffs takes in a matrix of restaurants' scores and a vector of the restaurants' ZIP codes, and outputs a data frame of cutoff scores to be used in grade classification. The returned ZIP code cutoff data frame has one row for each unique ZIP code and (length(gamma)+1) columns: one column for the ZIP code name and length(gamma) columns of cutoff scores separating the (length(gamma)+1) grading categories. Across each ZIP code's row, cutoff scores increase, and we assume, as in the King County (WA) case, that greater risk is associated with larger inspection scores. (If the inspection system of interest associates larger inspection scores with reduced risk, the inspection scores must be transformed before using findCutoffs or other functions in the DineSafeR package. A simple function such as f(score) = -score performs the necessary transformation on the input score matrix.)

The returned ZIP code data frame can be used with the gradeAllBus function to assign each restaurant a grade: a restaurant's most recent or mean inspection score is compared to the cutoff values in its ZIP code. We find the smallest cutoff value that the restaurant's score is less than or equal to, and we find the index of this cutoff when the ZIP code's cutoffs are ordered from smallest to largest. This index is then used to index the alphabet, which returns the letter grade to be assigned to the restaurant in question. If the restaurant's score is greater than all cutoff scores, it is assigned the (length(gamma)+1)th letter of the alphabet as its letter grade, which is the worst grade in the grading scheme.
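
The lookup rule described above can be sketched in a few lines of base R. This is a minimal illustration of the rule as stated, not the gradeAllBus implementation; the helper name gradeOne and the cutoff values are hypothetical.

```r
# Hypothetical helper (not part of DineSafeR): grade one score against the
# cutoffs of a single ZIP code, following the rule described in the text.
gradeOne <- function(score, cutoffs) {
  cutoffs <- sort(cutoffs)
  # Positions of all cutoffs that the score is less than or equal to.
  idx <- which(score <= cutoffs)
  # Use the smallest such cutoff; if the score exceeds every cutoff, fall
  # through to the (length(cutoffs) + 1)th letter, the worst grade.
  idx <- if (length(idx) > 0) idx[1] else length(cutoffs) + 1
  LETTERS[idx]
}

cutoffs <- c(0, 30)     # illustrative A/B and B/C cutoffs for one ZIP code
gradeOne(0, cutoffs)    # "A"
gradeOne(30, cutoffs)   # "B"
gradeOne(45, cutoffs)   # "C"
```

Because the comparison is "less than or equal to", a score that sits exactly on a cutoff receives the better of the two adjacent grades.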

The way in which cutoff scores are calculated for each ZIP code depends on the value of the type variable. The type variable can take one of four values (see later) and the default value of type is set to type = "adj".

Modes

type = "unadj" creates a ZIP code cutoff data frame with the same cutoff scores (meaningful values in a jurisdiction's inspection system that are contained in the vector gamma) for all ZIP codes. This ZIP code data frame can then be used to carry out "unadjusted" grading, in which a restaurant's most recent routine inspection score is compared to these cutoffs.

type = "perc" takes in a vector of percentiles, gamma, and returns a data frame of the scores in each ZIP code corresponding to these percentiles (using the type = 1 definition in R's quantile function).
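
As a rough sketch of what per-ZIP type = 1 quantiles look like, the base R code below computes them for a toy data set. The scores, ZIP labels and output column names are made up for illustration, and the layout of the data frame actually returned by findCutoffs may differ.

```r
# Toy data (made up): most recent score and ZIP code for ten restaurants.
scores <- c(0, 5, 10, 20, 40, 0, 0, 15, 30, 60)
z      <- rep(c("98101", "98102"), each = 5)
gamma  <- c(0.4, 0.8)

# One row of cutoffs per ZIP code. quantile(type = 1) is the inverse of the
# empirical distribution function, so every cutoff is an observed score.
cut.mat <- t(vapply(split(scores, z), quantile, numeric(length(gamma)),
                    probs = gamma, type = 1, names = FALSE))

# Column names here are hypothetical, chosen only for this sketch.
cutoffs.df <- data.frame(zip = rownames(cut.mat), cut.mat, row.names = NULL)
names(cutoffs.df)[-1] <- paste0("cutoff.", seq_along(gamma))
cutoffs.df
```

With these toy values the cutoffs are 5 and 20 in the first ZIP code and 0 and 30 in the second, so the same percentiles translate into different absolute scores in each ZIP code.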

type = "perc.resolve.ties" takes in a vector of percentiles, gamma, and accounts for the fact that tied scores exist within ZIP codes. Under type = "perc", the B/C cutoff (for example) is the score in each ZIP code at which at least (gamma[2] x 100)% of the ZIP code's restaurants score less than or equal to the cutoff. Under type = "perc.resolve.ties", the A/B cutoff is instead the score for which the percentage of restaurants in the ZIP code scoring less than or equal to the cutoff is closest to the desired percentage, (gamma[1] x 100)%. Similarly, the B/C cutoff is the score for which the percentage of restaurants in the ZIP code scoring more than the A/B cutoff and less than or equal to the B/C cutoff is closest to the desired percentage, ((gamma[2] - gamma[1]) x 100)%.
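
The "closest percentage" idea can be illustrated for a single ZIP code as follows. This is a sketch under stated assumptions rather than the package's code: the helper closestCutoffs is hypothetical, candidate cutoffs are taken to be the observed scores above the previous cutoff, and when two candidates are equally close the smaller score is chosen.

```r
# Hypothetical helper (not part of DineSafeR): for one ZIP code, pick each
# cutoff so that the share of restaurants in its grade band is as close as
# possible to the target share.
closestCutoffs <- function(scores, gamma) {
  candidates <- sort(unique(scores))
  targets    <- diff(c(0, gamma))   # desired share of restaurants per band
  cutoffs    <- numeric(length(gamma))
  lower      <- -Inf
  for (k in seq_along(gamma)) {
    cand <- candidates[candidates > lower]
    if (length(cand) == 0) {        # assumption: reuse the previous cutoff
      cutoffs[k] <- lower
      next
    }
    # Share of restaurants scoring above the previous cutoff and at or
    # below each candidate cutoff.
    share <- vapply(cand, function(cu) mean(scores > lower & scores <= cu),
                    numeric(1))
    cutoffs[k] <- cand[which.min(abs(share - targets[k]))]
    lower <- cutoffs[k]
  }
  cutoffs
}

scores <- c(0, 0, 0, 0, 5, 5, 10, 20, 30, 40)   # made-up scores with ties
gamma  <- c(0.45, 0.85)
closestCutoffs(scores, gamma)                          # 0 20
quantile(scores, gamma, type = 1, names = FALSE)       # 5 30
```

In this toy ZIP code, 40% of restaurants are tied at a score of 0. The type = 1 quantile moves the A/B cutoff up to 5, which gives 60% of restaurants an A, whereas the closest-percentage rule keeps the cutoff at 0 because 40% is nearer to the 45% target.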

type = "adj" takes in a vector of uniform absolute cutoff scores, gamma, and, in the first instance, carries out unadjusted grading by comparing restaurants' most recent routine inspection scores to these cutoffs (see: type = "unadj"). Grade proportions in this scheme are then used as initial percentiles to find percentile cutoffs in each ZIP code (or percentile cutoffs accommodating for the presence of score ties in the ZIP code, depending on the value of resolve.ties; see: type = "perc" or type = "perc.resolve.ties"). Restaurants are then graded with the ZIP code percentile cutoffs, and grading proportions are compared with grading proportions from the unadjusted system. Percentiles are iterated over (by the percentileSeek function) until grading proportions with ZIP code percentile cutoffs are within a certain tolerance (as determined by restaurant.tol) of the unadjusted grading proportions.
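
Two pieces of this procedure can be sketched with toy data: the initial percentiles taken from unadjusted grading, and the tolerance check that ends the iteration. Both are assumptions about how the description translates into code. In particular, the initial percentiles are taken here to be cumulative shares, and the helper withinTol is hypothetical. The rule by which percentileSeek updates the percentiles is not described in this page, so it is left out.

```r
scores <- c(0, 0, 5, 10, 25, 30, 35, 50, 0, 70)   # made-up recent scores
gamma  <- c(0, 30)                                # absolute cutoffs

# Unadjusted grades: 1 + the number of cutoffs strictly below the score.
unadj.grades <- LETTERS[1 + rowSums(outer(scores, gamma, ">"))]

# Assumed seed percentiles: share of restaurants at or below each cutoff.
init.perc <- vapply(gamma, function(g) mean(scores <= g), numeric(1))

# Hypothetical stopping check: in each of the top n.cutoffs grades, the
# number of restaurants may differ between the two gradings by at most tol.
withinTol <- function(grades.a, grades.b, n.cutoffs, tol) {
  lev      <- LETTERS[seq_len(n.cutoffs + 1)]
  counts.a <- table(factor(grades.a, levels = lev))
  counts.b <- table(factor(grades.b, levels = lev))
  top      <- seq_len(n.cutoffs)
  max(abs(counts.a[top] - counts.b[top])) <= tol
}

# Made-up adjusted grades for the same ten restaurants.
adj.grades <- c("A", "A", "A", "B", "B", "B", "C", "C", "A", "C")
withinTol(unadj.grades, adj.grades, length(gamma), tol = 1)   # TRUE
withinTol(unadj.grades, adj.grades, length(gamma), tol = 0)   # FALSE
```

Here the unadjusted grading yields three A grades and four B grades, while the made-up adjusted grading yields four and three, so the two agree within a tolerance of one restaurant but not zero.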

Warning

findCutoffs will produce cutoff scores even for ZIP codes with only one restaurant - situations in which a percentile adjustment shouldn't be used. It is the job of the user to ensure that, if using the findCutoffs function in mode type = "perc", type = "perc.resolve.ties" or type = "adj", it makes sense to do so. This may involve only performing the percentile adjustment on larger ZIP codes and providing absolute cutoff points for smaller ZIP codes, or may involve aggregating smaller ZIP codes into a larger geographical unit and then performing the percentile adjustment on the larger area.
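
One way to screen for small ZIP codes before calling findCutoffs is sketched below. The ZIP labels, the minimum size min.n and the ZIP-to-area lookup are all made up for illustration; an appropriate threshold and aggregation depend on the jurisdiction.

```r
# Made-up ZIP labels for 17 restaurants.
z     <- c(rep("98101", 12), "98102", rep("98103", 4))
min.n <- 5                       # hypothetical minimum ZIP code size

n.per.zip  <- table(z)
small.zips <- names(n.per.zip)[as.vector(n.per.zip) < min.n]

# Option 1: flag restaurants in ZIP codes large enough for a percentile
# adjustment; the remaining restaurants would keep absolute cutoffs.
use.percentile <- !(z %in% small.zips)

# Option 2: relabel small ZIP codes with a larger geographical unit and pass
# the relabelled vector as z. The lookup is a hypothetical mapping.
area.of  <- c("98102" = "area.north", "98103" = "area.north")
z.merged <- ifelse(z %in% names(area.of), area.of[z], z)
table(z.merged)
```

In this toy example the two small ZIP codes are merged into a single unit of five restaurants, which then meets the hypothetical minimum size.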

As mentioned previously, findCutoffs was created for an inspection system that associates greater risk with larger inspection scores. If the inspection system of interest associates greater risk with reduced scores, it will be necessary to perform a transformation of the scores matrix before using the findCutoffs function. A simple function such as f(score) = -score would perform the necessary transformation.
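
As a small worked example of the sign flip, assume a hypothetical system in which higher scores are better, with an A for scores of at least 90 and a B for scores of at least 70. Any absolute cutoffs must be negated along with the scores and re-sorted into increasing order. The matrix X.good and the cutoffs are made up for this sketch.

```r
# Made-up scores from a "higher is better" system: 4 restaurants, 2 inspections.
X.good <- matrix(c(95, 88, 72, 60,
                   91, 75, 80, 55), ncol = 2)

# Negate the scores so that larger values mean greater risk.
X.risk <- -X.good

# Negate the absolute cutoffs as well and sort them into increasing order.
gamma.risk <- sort(-c(90, 70))

# A negated score is at or below -90 exactly when the original is at least 90,
# so the A grade covers the same restaurants on both scales.
all((X.risk <= gamma.risk[1]) == (X.good >= 90))
```

The transformed X.risk and gamma.risk are then on the scale that findCutoffs assumes.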

Examples

# Adjusted grading (without ties resolution). resolve.ties defaults to TRUE,
# so it is switched off explicitly here.
zipcode.cutoffs.df <- findCutoffs(X.kc, zips.kc, gamma = c(0, 30), resolve.ties = FALSE)
mean.scores <- rowMeans(X.kc, na.rm = TRUE)
adj.grades <- gradeAllBus(mean.scores, zips.kc, zipcode.cutoffs.df)

# Adjusted grading (with ties resolution):
cutoffs.Ties.df <- findCutoffs(X.kc, zips.kc, gamma = c(0, 30), resolve.ties = TRUE)
grades.Ties <- gradeAllBus(mean.scores, zips.kc, cutoffs.Ties.df)

# Unadjusted grading (most recent inspection score only):
unadj.cutoffs <- findCutoffs(X.kc, zips.kc, gamma = c(0, 30), type = "unadj")
unadj.grades <- gradeAllBus(scores = X.kc[, 1], zips.kc, zip.cutoffs = unadj.cutoffs)

# Proportion of A/B/C grades in each ZIP code (rows = ZIP codes, columns = grades)
# Unadjusted
prop.unadj <- round(prop.table(table(zips.kc, unadj.grades), margin = 1), 2)
# Adjusted (without ties resolution)
prop.adj <- round(prop.table(table(zips.kc, adj.grades), margin = 1), 2)
# Adjusted (with ties resolution)
prop.ties <- round(prop.table(table(zips.kc, grades.Ties), margin = 1), 2)

# Correlation plots of unadjusted vs. adjusted (with resolution of ties) grade
# proportions in ZIP codes, one plot per grade. Point size scales with the
# number of graded restaurants in the ZIP code.
pt.cex <- sqrt(colSums(table(grades.Ties, zips.kc)) / pi) * 0.3
for (g in 1:3) {
  lims <- range(prop.unadj[, g], prop.ties[, g])
  plot(prop.unadj[, g], prop.ties[, g], xlim = lims, ylim = lims,
       pch = 16, cex = pt.cex,
       main = paste("Proportion", LETTERS[g], "in ZIP Codes"),
       xlab = "Unadjusted", ylab = "Adjusted")
}

King-County-Restaurant-Grading/DineSafeR documentation built on May 8, 2019, 4:50 p.m.