README.md
In julian-urbano/simIReff: Stochastic Simulation for Information Retrieval Evaluation: Effectiveness Scores

simIReff

Provides tools for the stochastic simulation of effectiveness scores to mitigate data-related limitations of Information Retrieval evaluation research. These tools include:

Fitting of continuous and discrete distributions to model system effectiveness.
Plotting of effectiveness distributions.
Selection of the distribution that best fits the given data.
Transformation of distributions towards a prespecified expected value.
A proxy for fitting copula models based on these distributions.
Simulation of new evaluation data from these distributions and copula models.

For details, please refer to Julián Urbano and Thomas Nagler, "Stochastic Simulation of Test Collections: Evaluation Scores", ACM SIGIR, 2018.

Installation

You may install the stable release from CRAN

install.packages("simIReff")

or the latest development version from GitHub

devtools::install_github("julian-urbano/simIReff", ref = "develop")

Usage

Fit a marginal AP distribution and simulate new data

library(simIReff) # load the package
x <- web2010ap[,10] # sample AP scores of a system
e <- effContFitAndSelect(x, method = "BIC") # fit and select based on BIC
plot(e) # plot pdf, cdf and quantile function
e$mean # expected value
y <- reff(50, e) # simulation of 50 new topics

and transform the distribution to have a pre-specified expected value.

e2 <- effTransform(e, mean = .14) # transform for expected value of .14
plot(e2)
e2$mean # check the result
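To make the idea of "fit and select based on BIC" concrete, here is a minimal base-R sketch. It is not simIReff's implementation: the synthetic data, the two candidate families (beta and a normal truncated to [0, 1]), and the helper names `nll_beta`, `nll_tnorm` and `bic` are all assumptions made for illustration.

```r
# Hypothetical data: 50 synthetic per-topic scores in (0, 1); not from web2010ap
set.seed(1)
x <- rbeta(50, shape1 = 2, shape2 = 5)

# Negative log-likelihood of a beta distribution; parameters live on the log
# scale so the optimizer can never propose non-positive shapes
nll_beta <- function(p) -sum(dbeta(x, exp(p[1]), exp(p[2]), log = TRUE))

# Negative log-likelihood of a normal distribution truncated to [0, 1]
nll_tnorm <- function(p) {
  m <- p[1]
  s <- exp(p[2])
  -sum(dnorm(x, m, s, log = TRUE) - log(pnorm(1, m, s) - pnorm(0, m, s)))
}

# Maximum likelihood fits (optim minimizes, so it receives the negative log-likelihood)
fit_beta <- optim(c(0, 0), nll_beta)
fit_tnorm <- optim(c(mean(x), log(sd(x))), nll_tnorm)

# BIC = k * log(n) - 2 * logLik, with k = 2 parameters in both candidates
bic <- function(fit, k = 2) k * log(length(x)) + 2 * fit$value
scores <- c(beta = bic(fit_beta), tnorm = bic(fit_tnorm))

# The candidate with the lowest BIC is selected
best <- names(which.min(scores))
```

Printing `scores` and `best` shows the criterion for each candidate and the one that would be kept. In the package, effContFitAndSelect performs the fitting and selection over its own set of candidate distributions.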

Build a copula model of two systems

d <- web2010ap[,2:3] # sample AP scores
e1 <- effCont_norm(d[,1]) # force the first margin to follow a truncated Gaussian distribution
e2 <- effCont_bks(d[,2]) # force the second margin to follow a beta kernel-smoothed distribution
cop <- effcopFit(d, list(e1, e2)) # copula
y <- reffcop(1000, cop) # simulation of 1000 new topics
c(e1$mean, e2$mean) # expected means
colMeans(y) # observed means

and modify the model so both systems have the same distribution

cop2 <- cop # copy the model
cop2$margins[[2]] <- e1 # modify 2nd margin
y <- reffcop(1000, cop2) # simulation of 1000 new topics
colMeans(y) # observed means
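The separation between margins and dependence that makes this swap possible can be sketched in base R with a bivariate Gaussian copula. This is only an illustration under stated assumptions, not how effcopFit or reffcop work internally: the correlation `rho` and the beta margins are hypothetical values chosen for the example.

```r
set.seed(42)
n <- 1000   # number of simulated topics
rho <- 0.7  # hypothetical dependence between the two systems

# 1. Correlated standard normals via the Cholesky factor of the correlation matrix
R <- matrix(c(1, rho, rho, 1), nrow = 2)
z <- matrix(rnorm(2 * n), ncol = 2) %*% chol(R)

# 2. Map to the copula scale: each column becomes Uniform(0, 1)
u <- pnorm(z)

# 3. Apply each margin's quantile function (hypothetical beta margins)
y <- cbind(qbeta(u[, 1], 2, 5), qbeta(u[, 2], 3, 4))

# 4. Same dependence, but both systems now share the first margin
y_same <- cbind(qbeta(u[, 1], 2, 5), qbeta(u[, 2], 2, 5))

colMeans(y)                              # compare with the expected means 2/7 and 3/7
colMeans(y_same)                         # both columns now target 2/7
cor(y[, 1], y[, 2], method = "kendall")  # rank dependence comes from the copula
```

Because the dependence lives entirely in `u`, replacing a quantile function changes one system's score distribution without altering the rank correlation between the systems.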

Automatically build a Gaussian copula for many systems,

d <- web2010p20[,1:20] # sample P@20 data from 20 systems
effs <- effDiscFitAndSelect(d, support("p20")) # fit and select margins
cop <- effcopFit(d, effs, family_set = "gaussian") # fit copula
y <- reffcop(1000, cop) # simulate 1000 new topics

compare observed vs. expected mean,

E <- sapply(effs, function(e) e$mean)
E.hat <- colMeans(y)
plot(E, E.hat)
abline(0:1)

compare observed vs. expected variance,

Var <- sapply(effs, function(e) e$var)
Var.hat <- apply(y, 2, var)
plot(Var, Var.hat)
abline(0:1)

and compare original vs. simulated distributions.

o <- order(colMeans(d))
boxplot(d[,o])
points(colMeans(d)[o], col = "red", pch = 4) # plot means
boxplot(y[,o])
points(colMeans(y)[o], col = "red", pch = 4) # plot means

License

simIReff is released under the terms of the MIT License.

When using this package, please cite the above paper:

@inproceedings{urbano2018simulation,
  author = {Urbano, Juli\'{a}n and Nagler, Thomas},
  booktitle = {International ACM SIGIR Conference on Research and Development in Information Retrieval},
  title = {{Stochastic Simulation of Test Collections: Evaluation Scores}},
  pages = {695--704},
  year = {2018}
}

julian-urbano/simIReff documentation built on May 21, 2019, 9:37 a.m.
