R/ds.extractQuantiles.R

Defines functions ds.extractQuantiles

Documented in ds.extractQuantiles

# ds.extractQuantiles
#' @title Secure ranking of a vector across all sources and use of these ranks
#' to estimate global quantiles across all studies
#' @description Takes the global ranks and quantiles held in the serverside data
#' data frame that is written by ranksSecureDS4 and named as specified by the
#' argument (<output.ranks.df>) and converts these values into a series of
#' quantile values that identify, for example, which value of V2BR across all of
#' the studies corresponds to the median or to the 95% quantile. There is no
#' indication in which study the V2BR corresponding to a particular quantile
#' falls and, in fact, the relevant value may fall in more than one study and
#' may appear multiple times in any one study. Finally, the output data
#' frame containing this information is written to the clientside and to the
#' serverside at each study separately. 
#' @details ds.extractQuantiles is a clientside function which should usually
#' be called from within the clientside function ds.ranksSecure.If you try to
#' call ds.extractQuantiles directly(i.e. not by running ds.ranksSecure) you
#' are almost certainly going to have to set up quite a few vectors and scalars
#' that are normally set by ds.ranksSecure and this is likely to be difficult. 
#' ds.extractQuantiles itself calls two serverside functions extractQuantilesDS1
#' and extractQuantilesDS2. For more details about the cluster of functions that
#' collectively enable secure global ranking and estimation of global quantiles
#' see the associated document entitled "secure.global.ranking.docx". In
#' particular this explains how ds.extractQuantiles works. Also see the header
#' file for ds.ranksSecure.
#' @param extract.quantiles one of a restricted set of character strings.
#' The value of this argument is set in choosing the value of the argument
#' <quantiles.for.estimation> in ds.ranksSecure. In summary: to mitigate
#' disclosure risk only the following set of quantiles can be
#' generated: c(0.025,0.05,0.10,0.20,0.25,0.30,0.3333,0.40,0.50,0.60,0.6667,
#' 0.70,0.75,0.80,0.90,0.95,0.975). The allowable formats for the argument
#' are of the general form: "0.025-0.975" where the first number is the lowest
#' quantile to be estimated and the second number is the equivalent highest 
#' quantile to estimate. These two quantiles are then estimated along with
#' all allowable quantiles in between. The allowable argument values are then:
#' "0.025-0.975", "0.05-0.95", "0.10-0.90", "0.20-0.80". Two alternative values
#' are "quartiles" i.e. c(0.25,0.50,0.75), and "median" i.e. c(0.50). The
#' default value is "0.05-0.95". For more details, see the associated document
#' "secure.global.ranking.docx". Also see the header file for ds.ranksSecure.
#' @param extract.summary.output.ranks.df a character string which specifies 
#' the optional name for the summary data.frame written to the serverside on
#' each data source that contains 5 of the key output variables from the ranking
#' procedure pertaining to that particular data source. If no name has been
#' specified by the argument <summary.output.ranks.df> in ds.ranksSecure, the
#' default name is allocated as "summary.ranks.df".The only reason the
#' <extract.summary.output.ranks.df> argument needs specifying in
#' ds.extractQuantiles is because, ds.extractQuantiles is the last function
#' called by ds.ranksSecure and almost the final command of ds.extractQuantiles
#' to print out the name of the data frame containing the summarised ranking
#' information generated by ds.ranksSecure and the order in which the
#' data frame is laid out. This therefore appears as the last output produced 
#' when ds.ranksSecure is run, and when this happens it is clear this relates to
#' the main output of ds.ranksSecure not of ds.extractQuantiles.
#' @param extract.ranks.sort.by a character string taking two possible values.
#' These are "ID.orig" and "vals.orig". This is set via the argument
#' <ranks.sort.by> in ds.ranksSecure. For more details see the associated
#' document entitled "secure.global.ranking.docx". Also see the header
#' file for ds.ranksSecure.
#' @param extract.rm.residual.objects logical value. Default = TRUE: at the beginning
#' and end of each run of ds.ranksSecure delete all extraneous objects that are
#' otherwise left behind. These are not usually needed, but could be of value
#' if one were investigating a problem with the ranking. FALSE: do not delete
#' the residual objects
#' @param extract.datasources specifies the particular opal object(s) to use.
#' This is set via the argument<datasources> in ds.ranksSecure. For more details
#' see the associated document entitled "secure.global.ranking.docx". Also see
#' the header file for ds.ranksSecure.
#' @return the final main output of ds.extractQuantiles is a data frame object
#' named "final.quantile.df". This contains two vectors. The first named 
#' "evaluation.quantiles" lists the full set of quantiles you have requested
#' for evaluation as specified by the argument "quantiles.for.estimation" in
#' ds.ranksSecure and explained in more detail above under the information for
#' the argument "extract.quantiles" in this function. The second vector is
#' called "final.quantile.vector" which details the values of V2BR that
#' correspond to the evaluation quantiles in vector 1. The information in the
#' data frame "final.quantile.df" is generic: there is no information
#' identifying in which study each value of V2BR falls. This data frame is
#' written to the clientside (as it is non-disclosive) and is also copied to
#' the serverside in every study. This means it is easily accessible from
#' anywhere in the DataSHIELD environment. For more details
#' see the associated document entitled "secure.global.ranking.docx".
#' @author Paul Burton 11th November, 2021

  ds.extractQuantiles <- function(extract.quantiles,
                                  extract.summary.output.ranks.df,
                                  extract.ranks.sort.by,
                                  extract.rm.residual.objects,
                                  extract.datasources=NULL){
    
    # look for DS connections
    if(is.null(extract.datasources)){
      datasources <- datashield.connections_find()
    }

datasources.in.current.function<-datasources
    
    # ensure datasources is a list of DSConnection-class
    if(!(is.list(datasources) && all(unlist(lapply(datasources, function(d) {methods::is(d,"DSConnection")}))))){
      stop("The 'datasources' were expected to be a list of DSConnection-class objects", call.=FALSE)
    }
  
  numstudies<-length(datasources)

  
    #AFTER RUNNING ds.ranksSecure USE extractQuantilesDS1 (serverside aggregate)
  #FUNCTION TO CREATE STUDY SPECIFIC BOUNDS OF NEAREST
  #QUANTILES ABOVE AND BELOW EACH EVALUATION THRESHOLD AND TO WRITE
  #EVALUATON QUANTILES AND STUDY-SPECIFIC BOUNDS AROUND EACH VALUE AS A
  #NEW OBJECT. EVALUATION QUANTILES ARE SET ON SERVERSIDE AND CANNOT BE CHANGED
  #FROM CLIENTSIDE. FULL RANGE POSSIBLE IS:
  #  c(0.025,0.05,0.10,0.20,0.25,0.30,0.3333,0.40,0.50,
  #    0.60,0.6667,0.70,0.75,0.80,0.90,0.95,0.975)
  
  
  calltext8 <- call("extractQuantilesDS1",extract.quantiles,
                    extract.summary.output.ranks.df)
  closest.bounds.df<-DSI::datashield.aggregate(datasources,calltext8)

  
  
  
  lower.bound.matrix<-matrix(NA,nrow=nrow(closest.bounds.df[[1]]),ncol=numstudies)
  upper.bound.matrix<-matrix(NA,nrow=nrow(closest.bounds.df[[1]]),ncol=numstudies)
  
  for(ss in 1:numstudies)
  {
    lower.bound.matrix[,ss]<-closest.bounds.df[[ss]]$lower.bound
    upper.bound.matrix[,ss]<-closest.bounds.df[[ss]]$upper.bound
  }
  
  numvals<-length(lower.bound.matrix[,1])
  
  lower.bound.vector<-rep(NA,numvals)
  upper.bound.vector<-rep(NA,numvals)
  
  for(vv in 1:numvals){
    lower.bound.vector[vv]<-max(lower.bound.matrix[vv,])
    upper.bound.vector[vv]<-min(upper.bound.matrix[vv,])
  }
  
  evaluation.quantiles<-closest.bounds.df[[1]]$evaluation.quantiles

  
  global.bounds.df<-data.frame(cbind(evaluation.quantiles,lower.bound.vector,upper.bound.vector))
  
  colnames(global.bounds.df)[1]<-"evaluation.quantiles" 
  
  global.bounds.df
  
  #CALL CLIENTSIDE FUNCTION ds.dmtC2S TO RETURN global.bounds.df TO SERVERSIDE
  dsBaseClient::ds.dmtC2S(dfdata=global.bounds.df,newobj="global.bounds.df",datasources = datasources.in.current.function)
  
  calltext9 <- call("extractQuantilesDS2",extract.summary.output.ranks.df)
  R.global.bounds<-DSI::datashield.aggregate(datasources,calltext9)
  
  
  numvals<-length(R.global.bounds[[1]]$relevant.study.specific.input.values.lower)

  for(xx in 1:numvals){
    all.lower.values.na<-1
    all.upper.values.na<-1
      for(tt in 1:length(R.global.bounds)){
      if(!is.na(R.global.bounds[[tt]]$relevant.study.specific.input.values.lower[xx]))all.lower.values.na<-0
      if(!is.na(R.global.bounds[[tt]]$relevant.study.specific.input.values.upper[xx]))all.upper.values.na<-0
      }
    if(all.lower.values.na==1||all.upper.values.na==1){
      error.message<-
        paste0("FAILED: one of the extreme quantile estimates is NA probably because of a cluster of values at one end of the range of possible values. Try setting a narrower range of quantile values via the <quantiles.for.estimation> argument")
      stop(error.message, call. = FALSE)
    }
    
  }
      
 
  final.quantile.value.vector.lower<-rep(NA,numvals)
  final.quantile.value.vector.upper<-rep(NA,numvals)

 
  for(vv in 1:numvals){
    for(ss in 1:numstudies){
      if(!is.na(R.global.bounds[[ss]]$relevant.study.specific.input.values.lower[vv]) )
      {
        final.quantile.value.vector.lower[vv]<-R.global.bounds[[ss]]$relevant.study.specific.input.values.lower[vv]
      } 
      
      
      if(!is.na(R.global.bounds[[ss]]$relevant.study.specific.input.values.upper[vv]) )
      {
        final.quantile.value.vector.upper[vv]<-R.global.bounds[[ss]]$relevant.study.specific.input.values.upper[vv]
      } 
    }
  }

    final.quantile.vector<-rep(NA,numvals)
  
  for(ww in 1:numvals)
  {
    final.quantile.vector[ww]<-signif(mean(final.quantile.value.vector.lower[ww],final.quantile.value.vector.upper[ww]),6)
    
      }
 
   
  final.quantile.df<-data.frame(cbind(evaluation.quantiles,final.quantile.vector))
  
  final.quantile.df

  #CLEAN UP UNWANTED RESIDUAL OBJECTS FROM THE RUNNING OF ds.extractQuantiles
  
  if(extract.rm.residual.objects)
  {
    #UNLESS THE <rm.residual.objects> IS FALSE,
    #CLEAR UP ANY UNWANTED RESIDUAL OBJECTS
    
    rm.names.eQ<-c("global.bounds.df")
    
    #make transmittable via parser
    rm.names.eQ.transmit <- paste(rm.names.eQ,collapse=",")
    
    calltext.rm.eQ <- call("rmDS", rm.names.eQ.transmit)
    
    rm.output.eQ <- DSI::datashield.aggregate(datasources, calltext.rm.eQ)
    
  }
  
  #CALL CLIENTSIDE FUNCTION ds.dmtC2S TO RETURN final.quantile.df TO SERVERSIDE
  dsBaseClient::ds.dmtC2S(dfdata=final.quantile.df,newobj="final.quantile.df",datasources = datasources.in.current.function)

  cat("\n\n\n"," FINAL RANKING PROCEDURES COMPLETE:
  PRIMARY RANKING OUTPUT IS IN DATA FRAME",extract.summary.output.ranks.df,
"
  WHICH IS SORTED BY",extract.ranks.sort.by," AND HAS BEEN
  WRITTEN TO THE SERVERSIDE. VALUES OF V2BR CORRESPONDING TO
  KEY QUANTILES HAVE BEEN WRITTEN TO BOTH CLIENTSIDE AND
  SERVERSIDE AS final.quantile.df\n\n\n\n")
  

  return(final.quantile.df)
}

##########################################
#ds.extractQuantiles
datashield/dsBaseClient documentation built on May 16, 2023, 10:19 p.m.