The main function for TopKInference

Description

Returns a named list (assigned to truncated.lists in the Examples) containing the Idata vector (see prepare.idata), the estimated truncation index j_0 = k + 1 (see compute.stream) for each pair of input lists, the overall top-k estimate (see j0.multi), and further objects holding the information needed to plot the aggmap.
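To make the Idata vector more concrete, the following is a minimal base R sketch of the underlying idea, assuming an indicator that equals 1 when the ranks of an object in two lists differ by at most delta. The data and the names rank1, rank2 and idata are hypothetical; the actual construction in prepare.idata (ordering, ties, missing objects) is not reproduced here.

```r
# Hypothetical ranks of the same five objects in two lists (illustrative data only)
rank1 <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
rank2 <- c(a = 2, b = 1, c = 3, d = 9, e = 12)
delta <- 1  # plays the role of the argument d

# Assumption: the indicator is 1 when the two ranks differ by at most delta
idata <- as.integer(abs(rank1 - rank2[names(rank1)]) <= delta)
names(idata) <- names(rank1)
idata
```

In this toy case the first three objects agree within delta and the last two do not, which is the kind of pattern from which a truncation index would be estimated.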

Usage

calculate.maxK(lists, L, d, v, threshold) 

Arguments

lists

Data frame containing two or more columns that represent input lists of ordered objects subject to comparison

L

Number of input lists that are compared

d

The maximal distance delta between object ranks required for the estimation of j_0

v

The pilot sample size (tuning parameter) ν required for the estimation of j_0

threshold

The percentage of occurrences of an object in the top-k selections, among all comparisons, required for the object to be gray-shaded in the aggmap as a consolidated object
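The role of threshold can be illustrated with a small base R sketch. It assumes that an object counts as consolidated when it appears in at least threshold percent of the top-k selections; whether the package uses "at least" or "more than" is not stated here, and topk_sets is hypothetical data.

```r
# Hypothetical top-k selections from three pairwise comparisons
topk_sets <- list(
  c("g1", "g2", "g3"),
  c("g1", "g3", "g4"),
  c("g1", "g2", "g5")
)
threshold <- 50  # percent, as in the threshold argument

# Percentage of comparisons in which each object was selected
counts <- table(unlist(topk_sets))
pct <- 100 * as.vector(counts) / length(topk_sets)
names(pct) <- names(counts)

# Assumption: "at least threshold percent" (>=); the package rule may differ
consolidated <- names(pct)[pct >= threshold]
consolidated
```

Raising threshold makes the consolidated set smaller, so fewer objects would be gray-shaded.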

Value

A named list of the following content:

comparedLists

Contains information about the overlap of all pairwise compared lists (structure for the aggmap)

info

Contains information about the list names

grayshadedLists

Contains information which objects in a list are consolidated (gray-shaded in the aggmap)

summarytable

Table of top-k list overlaps containing rank information, the rank sum, the order of objects as a function of the rank sum, the frequency of an object in the input lists and the frequency of an object in the truncated lists (for plotting in the aggmap)

vennlists

Contains the top-k objects for each of the input lists (for display in the Venn-diagram)

venntable

Contains the overlap information (for display in the Venn-table)

v

Selected pilot sample size (tuning parameter) ν

Ntoplot

Number of columns to be plotted in the aggmap

Idata

Data frame of Idata vectors (see compute.stream) for each pair of input lists and the associated deltas

d

Selected delta

threshold

Selected threshold

L

Number of input lists

N

Number of items in the data frame (lists)

lists

Data frame of the lists that entered the analysis

maxK

Maximal top-k estimate over all pairwise comparisons

topkspace

The final integrated list of objects, obtained by applying the CEMC algorithm to the lists truncated at maxK
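The rank sum and the resulting object order reported in summarytable can be sketched in base R as follows. This is only an illustration on hypothetical ranks, assuming that objects are ordered by increasing rank sum; the column layout of the real summarytable is not reproduced.

```r
# Hypothetical ranks of four objects in three truncated lists
ranks <- data.frame(
  list1 = c(1, 2, 3, 4),
  list2 = c(2, 1, 4, 3),
  list3 = c(1, 3, 2, 4),
  row.names = c("g1", "g2", "g3", "g4")
)

# Rank sum per object across the lists
rank_sum <- rowSums(ranks)

# Assumption: a smaller rank sum places an object earlier in the order
ordered_objects <- names(sort(rank_sum))
ordered_objects
```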

Author(s)

Eva Budinska <budinska@iba.muni.cz>, Michael G. Schimek <michael.schimek@medunigraz.at>

References

Hall, P. and Schimek, M. G. (2012). Moderate deviation-based inference for random degeneration in paired rank lists. J. Amer. Statist. Assoc., 107, 661-672.

See Also

CEMC, prepare.idata

Examples

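library(TopKLists)
set.seed(1234)
data(breast)
# Compare the L = 3 lists in breast with delta d = 6, pilot sample size v = 10 and a 50% threshold
truncated.lists <- calculate.maxK(breast, L = 3, d = 6, v = 10, threshold = 50)
## Not run: 
# Plot the aggmap from the computed object
aggmap(truncated.lists)

## End(Not run)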