# ds.extractQuantiles
#' @title Secure ranking of a vector across all sources and use of these ranks
#' to estimate global quantiles across all studies
#' @description Takes the global ranks and quantiles held in the serverside data
#' data frame that is written by ranksSecureDS4 and named as specified by the
#' argument (<output.ranks.df>) and converts these values into a series of
#' quantile values that identify, for example, which value of V2BR across all of
#' the studies corresponds to the median or to the 95% quantile. There is no
#' indication in which study the V2BR corresponding to a particular quantile
#' falls and, in fact, the relevant value may fall in more than one study and
#' may appear multiple times in any one study. Finally, the output data
#' frame containing this information is written to the clientside and to the
#' serverside at each study separately.
#' @details ds.extractQuantiles is a clientside function which should usually
#' be called from within the clientside function ds.ranksSecure.If you try to
#' call ds.extractQuantiles directly(i.e. not by running ds.ranksSecure) you
#' are almost certainly going to have to set up quite a few vectors and scalars
#' that are normally set by ds.ranksSecure and this is likely to be difficult.
#' ds.extractQuantiles itself calls two serverside functions extractQuantilesDS1
#' and extractQuantilesDS2. For more details about the cluster of functions that
#' collectively enable secure global ranking and estimation of global quantiles
#' see the associated document entitled "secure.global.ranking.docx". In
#' particular this explains how ds.extractQuantiles works. Also see the header
#' file for ds.ranksSecure.
#' @param extract.quantiles one of a restricted set of character strings.
#' The value of this argument is set in choosing the value of the argument
#' <quantiles.for.estimation> in ds.ranksSecure. In summary: to mitigate
#' disclosure risk only the following set of quantiles can be
#' generated: c(0.025,0.05,0.10,0.20,0.25,0.30,0.3333,0.40,0.50,0.60,0.6667,
#' 0.70,0.75,0.80,0.90,0.95,0.975). The allowable formats for the argument
#' are of the general form: "0.025-0.975" where the first number is the lowest
#' quantile to be estimated and the second number is the equivalent highest
#' quantile to estimate. These two quantiles are then estimated along with
#' all allowable quantiles in between. The allowable argument values are then:
#' "0.025-0.975", "0.05-0.95", "0.10-0.90", "0.20-0.80". Two alternative values
#' are "quartiles" i.e. c(0.25,0.50,0.75), and "median" i.e. c(0.50). The
#' default value is "0.05-0.95". For more details, see the associated document
#' "secure.global.ranking.docx". Also see the header file for ds.ranksSecure.
#' @param extract.summary.output.ranks.df a character string which specifies
#' the optional name for the summary data.frame written to the serverside on
#' each data source that contains 5 of the key output variables from the ranking
#' procedure pertaining to that particular data source. If no name has been
#' specified by the argument <summary.output.ranks.df> in ds.ranksSecure, the
#' default name is allocated as "summary.ranks.df".The only reason the
#' <extract.summary.output.ranks.df> argument needs specifying in
#' ds.extractQuantiles is because, ds.extractQuantiles is the last function
#' called by ds.ranksSecure and almost the final command of ds.extractQuantiles
#' to print out the name of the data frame containing the summarised ranking
#' information generated by ds.ranksSecure and the order in which the
#' data frame is laid out. This therefore appears as the last output produced
#' when ds.ranksSecure is run, and when this happens it is clear this relates to
#' the main output of ds.ranksSecure not of ds.extractQuantiles.
#' @param extract.ranks.sort.by a character string taking two possible values.
#' These are "ID.orig" and "vals.orig". This is set via the argument
#' <ranks.sort.by> in ds.ranksSecure. For more details see the associated
#' document entitled "secure.global.ranking.docx". Also see the header
#' file for ds.ranksSecure.
#' @param extract.rm.residual.objects logical value. Default = TRUE: at the beginning
#' and end of each run of ds.ranksSecure delete all extraneous objects that are
#' otherwise left behind. These are not usually needed, but could be of value
#' if one were investigating a problem with the ranking. FALSE: do not delete
#' the residual objects
#' @param extract.datasources specifies the particular opal object(s) to use.
#' This is set via the argument<datasources> in ds.ranksSecure. For more details
#' see the associated document entitled "secure.global.ranking.docx". Also see
#' the header file for ds.ranksSecure.
#' @return the final main output of ds.extractQuantiles is a data frame object
#' named "final.quantile.df". This contains two vectors. The first named
#' "evaluation.quantiles" lists the full set of quantiles you have requested
#' for evaluation as specified by the argument "quantiles.for.estimation" in
#' ds.ranksSecure and explained in more detail above under the information for
#' the argument "extract.quantiles" in this function. The second vector is
#' called "final.quantile.vector" which details the values of V2BR that
#' correspond to the evaluation quantiles in vector 1. The information in the
#' data frame "final.quantile.df" is generic: there is no information
#' identifying in which study each value of V2BR falls. This data frame is
#' written to the clientside (as it is non-disclosive) and is also copied to
#' the serverside in every study. This means it is easily accessible from
#' anywhere in the DataSHIELD environment. For more details
#' see the associated document entitled "secure.global.ranking.docx".
#' @author Paul Burton 11th November, 2021
ds.extractQuantiles <- function(extract.quantiles,
extract.summary.output.ranks.df,
extract.ranks.sort.by,
extract.rm.residual.objects,
extract.datasources=NULL){
# look for DS connections
if(is.null(extract.datasources)){
datasources <- datashield.connections_find()
}
datasources.in.current.function<-datasources
# ensure datasources is a list of DSConnection-class
if(!(is.list(datasources) && all(unlist(lapply(datasources, function(d) {methods::is(d,"DSConnection")}))))){
stop("The 'datasources' were expected to be a list of DSConnection-class objects", call.=FALSE)
}
numstudies<-length(datasources)
#AFTER RUNNING ds.ranksSecure USE extractQuantilesDS1 (serverside aggregate)
#FUNCTION TO CREATE STUDY SPECIFIC BOUNDS OF NEAREST
#QUANTILES ABOVE AND BELOW EACH EVALUATION THRESHOLD AND TO WRITE
#EVALUATON QUANTILES AND STUDY-SPECIFIC BOUNDS AROUND EACH VALUE AS A
#NEW OBJECT. EVALUATION QUANTILES ARE SET ON SERVERSIDE AND CANNOT BE CHANGED
#FROM CLIENTSIDE. FULL RANGE POSSIBLE IS:
# c(0.025,0.05,0.10,0.20,0.25,0.30,0.3333,0.40,0.50,
# 0.60,0.6667,0.70,0.75,0.80,0.90,0.95,0.975)
calltext8 <- call("extractQuantilesDS1",extract.quantiles,
extract.summary.output.ranks.df)
closest.bounds.df<-DSI::datashield.aggregate(datasources,calltext8)
lower.bound.matrix<-matrix(NA,nrow=nrow(closest.bounds.df[[1]]),ncol=numstudies)
upper.bound.matrix<-matrix(NA,nrow=nrow(closest.bounds.df[[1]]),ncol=numstudies)
for(ss in 1:numstudies)
{
lower.bound.matrix[,ss]<-closest.bounds.df[[ss]]$lower.bound
upper.bound.matrix[,ss]<-closest.bounds.df[[ss]]$upper.bound
}
numvals<-length(lower.bound.matrix[,1])
lower.bound.vector<-rep(NA,numvals)
upper.bound.vector<-rep(NA,numvals)
for(vv in 1:numvals){
lower.bound.vector[vv]<-max(lower.bound.matrix[vv,])
upper.bound.vector[vv]<-min(upper.bound.matrix[vv,])
}
evaluation.quantiles<-closest.bounds.df[[1]]$evaluation.quantiles
global.bounds.df<-data.frame(cbind(evaluation.quantiles,lower.bound.vector,upper.bound.vector))
colnames(global.bounds.df)[1]<-"evaluation.quantiles"
global.bounds.df
#CALL CLIENTSIDE FUNCTION ds.dmtC2S TO RETURN global.bounds.df TO SERVERSIDE
dsBaseClient::ds.dmtC2S(dfdata=global.bounds.df,newobj="global.bounds.df",datasources = datasources.in.current.function)
calltext9 <- call("extractQuantilesDS2",extract.summary.output.ranks.df)
R.global.bounds<-DSI::datashield.aggregate(datasources,calltext9)
numvals<-length(R.global.bounds[[1]]$relevant.study.specific.input.values.lower)
for(xx in 1:numvals){
all.lower.values.na<-1
all.upper.values.na<-1
for(tt in 1:length(R.global.bounds)){
if(!is.na(R.global.bounds[[tt]]$relevant.study.specific.input.values.lower[xx]))all.lower.values.na<-0
if(!is.na(R.global.bounds[[tt]]$relevant.study.specific.input.values.upper[xx]))all.upper.values.na<-0
}
if(all.lower.values.na==1||all.upper.values.na==1){
error.message<-
paste0("FAILED: one of the extreme quantile estimates is NA probably because of a cluster of values at one end of the range of possible values. Try setting a narrower range of quantile values via the <quantiles.for.estimation> argument")
stop(error.message, call. = FALSE)
}
}
final.quantile.value.vector.lower<-rep(NA,numvals)
final.quantile.value.vector.upper<-rep(NA,numvals)
for(vv in 1:numvals){
for(ss in 1:numstudies){
if(!is.na(R.global.bounds[[ss]]$relevant.study.specific.input.values.lower[vv]) )
{
final.quantile.value.vector.lower[vv]<-R.global.bounds[[ss]]$relevant.study.specific.input.values.lower[vv]
}
if(!is.na(R.global.bounds[[ss]]$relevant.study.specific.input.values.upper[vv]) )
{
final.quantile.value.vector.upper[vv]<-R.global.bounds[[ss]]$relevant.study.specific.input.values.upper[vv]
}
}
}
final.quantile.vector<-rep(NA,numvals)
for(ww in 1:numvals)
{
final.quantile.vector[ww]<-signif(mean(final.quantile.value.vector.lower[ww],final.quantile.value.vector.upper[ww]),6)
}
final.quantile.df<-data.frame(cbind(evaluation.quantiles,final.quantile.vector))
final.quantile.df
#CLEAN UP UNWANTED RESIDUAL OBJECTS FROM THE RUNNING OF ds.extractQuantiles
if(extract.rm.residual.objects)
{
#UNLESS THE <rm.residual.objects> IS FALSE,
#CLEAR UP ANY UNWANTED RESIDUAL OBJECTS
rm.names.eQ<-c("global.bounds.df")
#make transmittable via parser
rm.names.eQ.transmit <- paste(rm.names.eQ,collapse=",")
calltext.rm.eQ <- call("rmDS", rm.names.eQ.transmit)
rm.output.eQ <- DSI::datashield.aggregate(datasources, calltext.rm.eQ)
}
#CALL CLIENTSIDE FUNCTION ds.dmtC2S TO RETURN final.quantile.df TO SERVERSIDE
dsBaseClient::ds.dmtC2S(dfdata=final.quantile.df,newobj="final.quantile.df",datasources = datasources.in.current.function)
cat("\n\n\n"," FINAL RANKING PROCEDURES COMPLETE:
PRIMARY RANKING OUTPUT IS IN DATA FRAME",extract.summary.output.ranks.df,
"
WHICH IS SORTED BY",extract.ranks.sort.by," AND HAS BEEN
WRITTEN TO THE SERVERSIDE. VALUES OF V2BR CORRESPONDING TO
KEY QUANTILES HAVE BEEN WRITTEN TO BOTH CLIENTSIDE AND
SERVERSIDE AS final.quantile.df\n\n\n\n")
return(final.quantile.df)
}
##########################################
#ds.extractQuantiles
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.