CEMC based rank aggregation

Share:

Description

Performs Cross Entropy Monte Carlo simulations for generating combined ranked list using CEMC, taking into account the different spaces of ranked input lists.

Usage

1
2
3
CEMC(input, space = NULL, k=NULL, dm = "k", kp = 0.5, N = NULL, N1 = NULL,
rho = 0.1, e1 = 0.1, e2 = 1, w = 0.5, b = 0, init.m = "p", init.w = 0,
d.w = NULL, input.par = NULL, extra=0)

Arguments

input

A list of several TopKLists, may have different length

space

A list of the same structure as the input list. Contains underlying spaces for the top-k lists. NULL means all lists share a common space, which is taken to be the union of all input lists

k

Desired length of combined list

dm

Distance measure, "s" for Spearman, "k" for Kendall (p=0)

kp

Partial distance used in Kendall's tau distance measure

N

Number of samples generated in each iterate

N1

Number of samples retained after each iterate

rho

Proportion of samples used to estimate a new probability matrix

e1

Stopping criterion with respect to the l1-norm of the difference of the two probability matrices between the current and previous iterations

e2

Stopping criterion with respect to the difference in the obtimizing criterion (e.g. the generalized Kemeny guideline) between the current and the previous iterations

w

Weight of the new probability vector for the next iterate

b

Parameter used in blur function - this is for finding starting values for the algorithem

init.m

Initialization method, see the function init.p for details

init.w

Probability matrix initialization. (See Details)

d.w

Weights for distances from different input lists

input.par

Input parameters in a data.frame

extra

Number of additional items to be included in the combined ranked list during the calculation

Details

The algorithm implemented is the Order Explicit Algorithm, which is an iterative procedure to maximize an objective function (either based on Kendall's distance (dm="k") or Spearman's distance (dm="s")).

init.w: probability matrix initialization: (1-init.w) * uniform + init.w * estimated from input lists

Value

A list containing three components:

TopK

A vector giving the aggregate ranked list.

ProbMatrix

A matrix, with each column represent the probability vector of a multinomial distribution and thus sum to 1.

input.par

A vector containing tuning parameters used in the current run. User may edit this vector and use it as input for a more refined analysis.

Author(s)

Jie Ding <jding@jimmy.harvard.edu>, Shili Lin <shili@stat.osu.edu>

References

Lin, S. and Ding, J. (2009). Integration of ranked lists via Cross Entropy Monte Carlo with applications to mRNA and microRNA studies. Biometrics, 65, 9-18.

Examples

1
2
3
4
5
6
7
8
#small data set; a larger data example is available in the vignettes
L1=c("chicken","dog","cat")
L2=c(1,"chicken","cat", 2:5)
L3=c("dog","chicken",1:10)
input=list(L1,L2,L3)
space1=c("chicken","dog","cat",1:10)
space=list(space1,space1,space1)
outCEMC=CEMC(input, space) #underlying space-dependent