options(width = 80)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "Readme_files/"
)

pkgs = c("here", "opalr", "DSI", "DSOpal", "dsBaseClient")
for (pkg in pkgs) {
  if (! requireNamespace(pkg, quietly = TRUE))
    install.packages(pkg, repos = c(getOption("repos"), "https://cran.obiba.org"))
}
devtools::install(quiet = TRUE, upgrade = "always")

## Install packages on the DataSHIELD test machine:
surl = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"

opal = opalr::opal.login(username = username, password = password, url = surl)

pkgs = c("dsPredictBase", "dsROCGLM")
for (pkg in pkgs) {
  check1 = opalr::dsadmin.install_github_package(opal = opal, pkg = pkg, username = "difuture-lmu")
  if (! check1)
    stop("[", Sys.time(), "] Was not able to install ", pkg, "!")

  check2 = opalr::dsadmin.publish_package(opal = opal, pkg = pkg)
  if (! check2)
    stop("[", Sys.time(), "] Was not able to publish methods of ", pkg, "!")
}
The package provides functionality to conduct and visualize ROC analysis on decentralized data. It builds on the [DataSHIELD](https://www.datashield.org/) infrastructure for distributed computing and provides the calculation of the ROC-GLM as well as AUC confidence intervals. To calculate the ROC-GLM, it is necessary to push models to the servers and compute their predictions there. This is done automatically by the base package `dsPredictBase`.
Note that DataSHIELD uses an option `datashield.privacyLevel` to indicate the minimum number of values that must be aggregated before the aggregated value may be shared. Instead of setting the option, we retrieve the privacy level directly from the `DESCRIPTION` file each time a function calls for it. The privacy level is set to 5 by default.
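To illustrate what such a privacy level means in practice, here is a minimal sketch in base R. The helper `safe_mean()` is hypothetical and not part of `dsROCGLM` or DataSHIELD; it only mimics the idea of refusing to share an aggregate when too few values contribute to it.

```r
# Hypothetical helper (not part of dsROCGLM): refuse to return an aggregate
# when fewer than `privacy_level` non-missing values contribute to it.
safe_mean = function(x, privacy_level = 5L) {
  n = sum(!is.na(x))
  if (n < privacy_level) {
    stop("Only ", n, " values available, but the privacy level requires at least ",
      privacy_level, ".")
  }
  mean(x, na.rm = TRUE)
}

# Enough values: the mean is returned.
safe_mean(c(1, 2, 3, 4, 5, 6))

# Too few values: the call is refused with an error.
res = try(safe_mean(c(1, 2, 3)), silent = TRUE)
inherits(res, "try-error")
```

How the package itself enforces the privacy level on the server side may differ from this sketch.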
At the moment, there is no CRAN version available. Install the development version from GitHub:
remotes::install_github("difuture-lmu/dsROCGLM")
It is necessary to register the assign and aggregate methods in the OPAL administration. These methods are registered automatically when the package is published on OPAL (see `DESCRIPTION`).
Note that the package needs to be installed in both locations: on the server and on the analyst's machine.
A more sophisticated example is available here.
library(DSI)
library(DSOpal)
library(dsBaseClient)
library(dsPredictBase)
library(dsROCGLM)
builder = newDSLoginBuilder()

surl = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"

builder$append(
  server   = "ds1",
  url      = surl,
  user     = username,
  password = password
)
builder$append(
  server   = "ds2",
  url      = surl,
  user     = username,
  password = password
)

connections = datashield.login(logins = builder$build(), assign = TRUE)
datashield.assign(connections, "iris", quote(iris))
datashield.assign(connections, "y", quote(c(rep(1, 50), rep(0, 100))))
# Model predicts if species of iris is setosa or not.
iris$y = ifelse(iris$Species == "setosa", 1, 0)
mod = glm(y ~ Sepal.Length, data = iris, family = binomial())

# Push the model to the DataSHIELD servers using `dsPredictBase`:
pushObject(connections, mod)

# Calculate scores and save at the servers using `dsPredictBase`:
pfun = "predict(mod, newdata = D, type = 'response')"
predictModel(connections, mod, "pred", "iris", predict_fun = pfun)

datashield.symbols(connections)
# In order to securely calculate the ROC-GLM, we have to assess the
# l2-sensitivity to set the privacy parameters of differential
# privacy adequately:
l2s = dsL2Sens(connections, "iris", "pred")
l2s

# Based on the results presented in https://arxiv.org/abs/2203.10828, we set the privacy parameters to
# - epsilon = 0.2, delta = 0.1 if l2s <= 0.01
# - epsilon = 0.3, delta = 0.4 if 0.01 < l2s <= 0.03
# - epsilon = 0.5, delta = 0.3 if 0.03 < l2s <= 0.05
# - epsilon = 0.5, delta = 0.5 if 0.05 < l2s <= 0.07
# - epsilon = 0.5, delta = 0.5 if 0.07 < l2s BUT results may not be good!
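The lookup described in the comments above can be sketched as a small base R function. `choose_privacy_params()` is a hypothetical helper for illustration and is not exported by `dsROCGLM`; it only encodes the listed intervals and leaves open how the values are passed on to the package.

```r
# Hypothetical helper (not part of dsROCGLM): map an l2-sensitivity value to
# the (epsilon, delta) pair listed in the comments above.
choose_privacy_params = function(l2s) {
  stopifnot(is.numeric(l2s), length(l2s) == 1L, !is.na(l2s), l2s >= 0)

  # Upper interval bounds and the corresponding parameters.
  upper   = c(0.01, 0.03, 0.05, 0.07, Inf)
  epsilon = c(0.2, 0.3, 0.5, 0.5, 0.5)
  delta   = c(0.1, 0.4, 0.3, 0.5, 0.5)

  # Index of the first interval whose upper bound is not exceeded.
  idx = which(l2s <= upper)[1]

  if (l2s > 0.07) warning("l2s > 0.07: results may not be good.")
  list(epsilon = epsilon[idx], delta = delta[idx])
}

# Example with an arbitrary value in the second interval:
choose_privacy_params(0.02)
```

Encoding the intervals as vectors keeps the bounds and parameters side by side, so the table is easy to compare against the comments.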
roc_glm = dsROCGLM(connections, truth_name = "y", pred_name = "pred",
  dat_name = "iris", seed_object = "y")
roc_glm

plot(roc_glm)
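Because this example uses the public `iris` data, one way to get a rough reference value for the distributed result is to compute the empirical AUC locally. The following is a sketch under the assumption that pooled data is available, which is not the case in a real DataSHIELD analysis. The helper `empirical_auc()` is hypothetical and not part of `dsROCGLM`. The distributed ROC-GLM is a model-based estimate computed under differential privacy, so the two values are not expected to match exactly.

```r
# Local, non-distributed reference computation (assumes pooled data).
dat = iris
dat$y = ifelse(dat$Species == "setosa", 1, 0)
mod = glm(y ~ Sepal.Length, data = dat, family = binomial())
scores = predict(mod, newdata = dat, type = "response")

# Hypothetical helper: empirical AUC via the Mann-Whitney rank statistic.
# Tied scores receive average ranks, so ties count as one half.
empirical_auc = function(truth, scores) {
  n1 = sum(truth == 1)
  n0 = sum(truth == 0)
  r = rank(scores)
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_local = empirical_auc(dat$y, scores)
auc_local
```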
Built by `r Sys.info()[["login"]]` (`r Sys.info()[["sysname"]]`) on `r as.character(Sys.time())`.
This README is built automatically after each push to the repository. Hence, it also serves as a test of whether the package works on the DataSHIELD servers. We also test this functionality in `tests/testthat/test_on_active_server.R`. The system information of the local and remote servers is as follows:
ri_l = sessionInfo()
ri_ds = datashield.aggregate(connections, quote(getDataSHIELDInfo()))
client_pkgs = c("DSI", "DSOpal", "dsBaseClient", "dsPredictBase", "dsROCGLM")
remote_pkgs = c("dsBase", "resourcer", "dsPredictBase", "dsROCGLM")
`R` version: `r ri_l$R.version$version.string`
dfv = installed.packages()[client_pkgs, ]
dfv = data.frame(Package = rownames(dfv), Version = unname(dfv[, "Version"]))
knitr::kable(dfv)
`R` version of `r names(ri_ds)[1]`: `r ri_ds[[1]]$session$R.version$version.string`
`R` version of `r names(ri_ds)[2]`: `r ri_ds[[2]]$session$R.version$version.string`
dfv = do.call(cbind, lapply(names(ri_ds), function(nm) {
  out = ri_ds[[nm]]$pcks[remote_pkgs, "Version", drop = FALSE]
  colnames(out) = paste0(nm, ": ", colnames(out))
  as.data.frame(out)
}))
dfv = cbind(Package = rownames(dfv), dfv)
rownames(dfv) = NULL
knitr::kable(dfv)
datashield.logout(connections)