ds.histogram: Generates a histogram plot

View source: R/ds.histogram.R

ds.histogramR Documentation

Generates a histogram plot

Description

ds.histogram function plots a non-disclosive histogram in the client-side.

Usage

ds.histogram(
  x = NULL,
  type = "split",
  num.breaks = 10,
  method = "smallCellsRule",
  k = 3,
  noise = 0.25,
  vertical.axis = "Frequency",
  datasources = NULL
)

Arguments

x

a character string specifying the name of a numerical vector.

type

a character string that represents the type of graph to display. The type argument can be set as 'combine' or 'split'. Default 'split'. For more information see Details.

num.breaks

a numeric specifying the number of breaks of the histogram. Default value is 10.

method

a character string that defines which histogram will be created. The method argument can be set as 'smallCellsRule', 'deterministic' or 'probabilistic'. Default 'smallCellsRule'. For more information see Details.

k

the number of the nearest neighbours for which their centroid is calculated. Default k value is 3. For more information see Details.

noise

the percentage of the initial variance that is used as the variance of the embedded noise if the argument method is set to 'probabilistic'. Default noise value is 0.25. For more information see Details.

vertical.axis

a character string that defines what is shown in the vertical axis of the plot. The vertical.axis argument can be set as 'Frequency' or 'Density'. Default 'Frequency'. For more information see Details.

datasources

a list of DSConnection-class objects obtained after login. If the datasources argument is not specified the default set of connections will be used: see datashield.connections_default.

Details

ds.histogram function allows the user to plot distinct histograms (one for each study) or a combined histogram that merges the single plots.

In the argument type can be specified two types of graphics to display:

'combine'

: a histogram that merges the single plot is displayed.

'split'

: each histogram is plotted separately.

In the argument method can be specified 3 different histograms to be created:

'smallCellsRule'

: the histogram of the actual variable is created but bins with low counts are removed.

'deterministic'

: the histogram of the scaled centroids of each k nearest neighbours of the original variable where the value of k is set by the user.

'probabilistic'

: the histogram shows the original distribution disturbed by the addition of random stochastic noise. The added noise follows a normal distribution with zero mean and variance equal to a percentage of the initial variance of the input variable. This percentage is specified by the user in the argument noise.

In the k argument the user can choose any value for k equal to or greater than the pre-specified threshold used as a disclosure control for this method and lower than the number of observations minus the value of this threshold. By default the value of k is set to be equal to 3 (we suggest k to be equal to, or bigger than, 3). Note that the function fails if the user uses the default value but the study has set a bigger threshold. The value of k is used only if the argument method is set to 'deterministic'. Any value of k is ignored if the argument method is set to 'probabilistic' or 'smallCellsRule'.

In the noise argument the percentage of the initial variance that is used as the variance of the embedded noise if the argument method is set to 'probabilistic'. Any value of noise is ignored if the argument method is set to 'deterministic' or 'smallCellsRule'. The user can choose any value for noise equal to or greater than the pre-specified threshold 'nfilter.noise'. By default the value of noise is set to be equal to 0.25.

In the argument vertical.axis can be specified two types of histograms:

'Frequency'

: the histogram of the frequencies is returned.

'Density'

: the histogram of the densities is returned.

Server function called: histogramDS2

Value

one or more histogram objects and plots depending on the argument type

Author(s)

DataSHIELD Development Team

Examples

## Not run: 

## Version 6, for version 5 see the Wiki
  # Connecting to the Opal servers

  require('DSI')
  require('DSOpal')
  require('dsBaseClient')

  builder <- DSI::newDSLoginBuilder()
  builder$append(server = "study1", 
                 url = "http://192.168.56.100:8080/", 
                 user = "administrator", password = "datashield_test&", 
                 table = "CNSIM.CNSIM1", driver = "OpalDriver")
  builder$append(server = "study2", 
                 url = "http://192.168.56.100:8080/", 
                 user = "administrator", password = "datashield_test&", 
                 table = "CNSIM.CNSIM2", driver = "OpalDriver")
  builder$append(server = "study3",
                 url = "http://192.168.56.100:8080/", 
                 user = "administrator", password = "datashield_test&", 
                 table = "CNSIM.CNSIM3", driver = "OpalDriver")
  logindata <- builder$build()
  
  # Log onto the remote Opal training servers
  connections <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D") 
  
  # Compute the histogram
  # Example 1: generate a histogram for each study separately 
  ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
              type = "split",
              datasources = connections) #all studies are used

  # Example 2: generate a combined histogram with the default small cells counts
               suppression rule
  ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
               method = 'smallCellsRule',
               type = 'combine',
               datasources = connections[1]) #only the first study is used (study1)

  # Example 3: if a variable is of type factor the function returns an error
  ds.histogram(x = 'D$PM_BMI_CATEGORICAL',
               datasources = connections)

  # Example 4: generate a combined histogram with the deterministic method for k=50
  ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
               k = 50, 
               method = 'deterministic',
               type = 'combine',
               datasources = connections[2])#only the second study is used (study2)


  # Example 5: create a histogram and the probability density on the plot
  hist <- ds.histogram(x = 'D$PM_BMI_CONTINUOUS',
                       method = 'probabilistic', type='combine',
                       num.breaks = 30, 
                       vertical.axis = 'Density',
                       datasources = connections)
  lines(hist$mids, hist$density)

  # clear the Datashield R sessions and logout
  datashield.logout(connections)
  
## End(Not run)



datashield/dsBaseClient documentation built on Nov. 16, 2024, 2:07 p.m.