Upper bound on the number of GARP-consistent subpopulations and clustering.

Share:

Description

The cpUpper function computes a Crawford-Pendakur type upper bound on the number of GARP-consistent subpopulations, and performs clustering of the data.

Usage

1
2
3
4
5
cpUpper(x, p, times= 1, afriat.par= 1, method= c("fastfloyd", "deep", "floyd"))
## S3 method for class 'upperBound'
print(x, ...)
## S3 method for class 'upperBound'
summary(object, ...)

Arguments

x

data frame or matrix containing the observed quantities, where each row corresponds to an observation and the columns are types of goods, or an object of class upperBound to be used with print,

p

data frame or matrix (of same dimensions as x) containing the corresponding prices,

times

number of times the algorithm is run (the final result is the best of times results, ie. lowest number of clusters found), default is 1,

afriat.par

the Afriat parameter, a real number in [0,1], which allows a certain level of error in the optimization of choices ; default is 1, ie. no optimization error allowed,

method

character string: "fastfloyd" or "deep" or "floyd" (see Details). Default "fastfloyd".

object

an object of class upperBound as returned by cpUpper,

...

additional arguments passed to the print and summary methods (unused).

Details

For each run of the algorithm, a random permutation of the observations is drawn, and one by one each observation is associated with the biggest cluster that can include it while preserving consistency with the GARP rationality axiom. If no cluster is compatible with a given observation a new cluster is created to accomodate it.

Three methods are available: "fastfloyd" (default) uses an iterative variant of the Floyd-Warshall algorithm, in which the check of consistency of the current observation with a given cluster is done in a single step of the Floyd-Warshall algorithm. Much faster than "floyd".

"deep" uses a single run of recursive depth-first search with tabu list for each check of an observation against a given cluster. Faster than "fastfloyd" on large datasets (eg. > 5000 observations).

"floyd" uses the algorithm described in Crawford and Pendakur (2013): at each step the complete Floyd-Warshall algorithm is run to check whether each cluster can accomodate the current observation. Much slower than the two other algorithms.

Value

cpUpper returns an object of class upperBound which contains the following elements:

clustering

numeric vector with length equal to number of observations, containing the cluster number of each observation,

cluster.pop

numeric vector containg the numbers of observations allocated to each cluster,

hist.n.types

numeric vector containing the history of numbers of clusters found during multiple runs of the algorithm.

n.types

upper bound on the number of types,

afriat.par

Afriat parameter used in the algorithm.

Note

Warning: cpUpper can be very slow for large datasets (eg. more than 1000 observations).

Author(s)

Julien Boelaert jubo.stats@gmail.com

References

Crawford, I. and Pendakur, K. (2013). How many types are there? The Economic Journal, 123(567):77-95.

See Also

See cpLower for the lower bound on number of types.

Examples

1
2
3
4
# Cluster GARP-violating data:
data(noGarp)
cp.up <- cpUpper(noGarp$x, noGarp$p)
cp.up$clustering