capacity_logreg_testing: Testing procedures for estimation of channel capacity

View source: R/capacity_logreg_testing.R

capacity_logreg_testing    R Documentation

Testing procedures for estimation of channel capacity

Description

Diagnostic procedures that quantify the uncertainty of channel capacity estimates obtained with the SLEMI approach. Two main procedures are implemented: the bootstrap, which repeats the estimation on a fraction of the data, and the overfitting test, which divides the data into two parts, training and testing. Each procedure is repeated a specified number of times to obtain a distribution of the estimator. It is recommended to run the estimation by calling capacity_logreg_main.
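The two resampling schemes described above can be sketched in base R. This is only an illustration on a made-up data frame (toy), not the SLEMI implementation. It assumes rows are drawn without replacement and without stratification by signal, which the documentation does not state.

```r
# Illustrative sketch only: not the code used inside SLEMI.
set.seed(1234)  # plays the role of TestingSeed

# Hypothetical toy data with one signal and one response column
toy <- data.frame(
  signal   = factor(rep(c("0", "1"), each = 50)),
  response = c(rnorm(50, mean = 0), rnorm(50, mean = 2))
)

boot_prob <- 0.8
partition_trainfrac <- 0.6
n <- nrow(toy)

# Bootstrap-style repetition: keep a random fraction boot_prob of the rows
# (assumption: sampling without replacement)
boot_rows <- sample(n, size = round(boot_prob * n))
boot_data <- toy[boot_rows, ]

# Overfitting test: split the rows into disjoint training and testing parts
train_rows <- sample(n, size = round(partition_trainfrac * n))
train_data <- toy[train_rows, ]
test_data  <- toy[-train_rows, ]
```

Repeating either step boot_num or traintest_num times, and estimating capacity on each draw, would give the distribution of estimates that the testing procedures report.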

Usage

capacity_logreg_testing(
  data,
  signal = "signal",
  response = "response",
  side_variables = NULL,
  cc_maxit = 100,
  lr_maxit = 1000,
  MaxNWts = 5000,
  formula_string = NULL,
  TestingSeed = 1234,
  testing_cores = 1,
  boot_num = 10,
  boot_prob = 0.8,
  sidevar_num = 10,
  traintest_num = 10,
  partition_trainfrac = 0.6
)

Arguments

data

must be a data.frame object. It must not contain NA values.

signal

is a character object with the name of the column of data to be treated as the channel's input.

response

is a character vector with the names of the columns of data to be treated as the channel's output.

side_variables

(optional) is a character vector naming the side-variable columns of data. If NULL, no side variables are included.

cc_maxit

is the number of iterations of the iterative optimisation algorithm used to estimate channel capacity. Default is 100.

lr_maxit

is the maximum number of iterations of the fitting algorithm of logistic regression. Default is 1000.

MaxNWts

is the maximum acceptable number of weights in the logistic regression algorithm. Default is 5000.

formula_string

(optional) is a character object containing the formula syntax to use in the logistic regression model. If NULL, a standard additive model of the response variables is assumed. Only for advanced users.

TestingSeed

is the seed for the random number generator used in the testing procedures. Default is 1234.

testing_cores

is the number of cores to be used in parallel computing (via the doParallel package). Default is 1.

boot_num

is the number of bootstrap tests to be performed. Default is 10, but it is recommended to use at least 50 for reliable estimates.

boot_prob

is the fraction of the original data to be used in each bootstrap repetition. Default is 0.8.

sidevar_num

is the number of re-shuffling tests of side variables to be performed. Default is 10, but it is recommended to use at least 50 for reliable estimates.

traintest_num

is the number of overfitting tests to be performed. Default is 10, but it is recommended to use at least 50 for reliable estimates.

partition_trainfrac

is the fraction of data to be used as a training dataset. Default is 0.6.

Details

If side variables are included in the analysis (side_variables is not NULL), two additional procedures are carried out: a reshuffling test and a reshuffling test with bootstrap, both based on permuting the values of the side variables within the dataset. The additional parameters lr_maxit and MaxNWts have the same meaning as in the multinom function from the nnet package. An alternative model formula (via the formula_string argument) should be provided if the data are not well described by the standard logistic regression model (recommended only for advanced users).
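The reshuffling idea and the role of the two nnet parameters can be illustrated as follows. This is a minimal sketch on made-up data, not the package's internal code: the column side and the model formula are hypothetical, and it assumes lr_maxit corresponds to the maxit argument that multinom forwards to nnet.

```r
library(nnet)  # recommended package distributed with R

set.seed(1234)

# Hypothetical toy data: one signal, one response, one side variable
toy <- data.frame(
  signal   = factor(rep(c("low", "high"), each = 40)),
  response = c(rnorm(40, mean = 0), rnorm(40, mean = 2)),
  side     = factor(rep(c("a", "b"), times = 40))
)

# Reshuffling: permute the side-variable values across rows, which breaks
# any association between the side variable and the other columns
shuffled <- toy
shuffled$side <- sample(shuffled$side)

# Logistic regression of signal on response and side variable;
# maxit and MaxNWts are passed through multinom to nnet
fit <- multinom(signal ~ response + side, data = shuffled,
                maxit = 1000, MaxNWts = 5000, trace = FALSE)
```

Comparing estimates obtained on the original and on the reshuffled data indicates how much the side variables contribute; the sketch only shows the permutation and the model fit, not the capacity estimation itself.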

Value

a list with four elements:

  • output$bootstrap - results of the bootstrap tests

  • output$resamplingMorph - results of the reshuffling tests of side variables (performed only if side_variables is not NULL)

  • output$traintest - results of the overfitting (training/testing) tests

  • output$bootResampMorph - results of the reshuffling tests with bootstrap (performed only if side_variables is not NULL)

Each of the above is a list in which every element is the output of a single repetition of the channel capacity algorithm.
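Because each element is a list of per-repetition outputs, a spread of capacity estimates can be obtained by collecting one number per repetition. The sketch below uses a mock object with made-up values and assumes each repetition stores its capacity estimate in an element named cc; that name is an assumption, so inspect the real output with str() before relying on it.

```r
# Mock object mimicking the assumed structure; the values are made up
# purely for illustration and are not results of any analysis.
mock_output <- list(
  bootstrap = list(list(cc = 1.02), list(cc = 0.98), list(cc = 1.05))
)

# Collect the (assumed) cc element from every repetition
extract_cc <- function(repetitions) {
  vapply(repetitions, function(one_rep) one_rep$cc, numeric(1))
}

boot_cc <- extract_cc(mock_output$bootstrap)

# Summarise the distribution of the estimates
boot_summary <- c(mean = mean(boot_cc), sd = sd(boot_cc))
```

With the default boot_num = 10 such a summary is rough; the argument descriptions recommend at least 50 repetitions.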

References

[1] Jetka T, Nienaltowski K, Winarski T, Blonski S, Komorowski M, Information-theoretic analysis of multivariate single-cell signaling responses using SLEMI, PLoS Comput Biol, 15(7): e1007132, 2019, https://doi.org/10.1371/journal.pcbi.1007132.

Examples

## Set boot_num and traintest_num to larger values (at least 50)
## for more reliable testing.
library(SLEMI)
tempdata <- data_example1
outputCLR1_testing <- capacity_logreg_testing(
  data = tempdata,
  signal = "signal", response = "response", cc_maxit = 10,
  TestingSeed = 11111, boot_num = 1, boot_prob = 0.8, testing_cores = 1,
  traintest_num = 1, partition_trainfrac = 0.6
)


SLEMI documentation built on Nov. 20, 2023, 1:06 a.m.