mi_logreg_testing: Testing procedures for estimation of mutual information

View source: R/mi_logreg_testing.R

Description

Diagnostic procedures that quantify the uncertainty of mutual information estimates obtained with the SLEMI approach. Two main procedures are implemented: a bootstrap test, which repeats the estimation on a random fraction of the data, and an overfitting test, which divides the data into two parts, a training part and a testing part. Each procedure is repeated a specified number of times to obtain a distribution of the estimator. It is recommended to call this function from mi_logreg_main.R.
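The two resampling schemes can be sketched in base R. This is a minimal illustration under stated assumptions, not SLEMI's internal code: the helpers `bootstrap_indices` and `traintest_split` are hypothetical, and drawing each bootstrap subsample without replacement is an assumption made only for this sketch.

```r
# Hypothetical helpers illustrating how rows could be resampled for the
# bootstrap and overfitting (train/test) tests; they are not part of SLEMI.

# Draw `boot_num` subsamples, each holding a fraction `boot_prob` of the rows.
# Sampling without replacement is an assumption of this sketch.
bootstrap_indices <- function(n_rows, boot_num = 10, boot_prob = 0.8) {
  size <- floor(boot_prob * n_rows)
  lapply(seq_len(boot_num), function(i) sample.int(n_rows, size))
}

# Split the rows `traintest_num` times into a training part (a fraction
# `partition_trainfrac` of the rows) and a testing part (all remaining rows).
traintest_split <- function(n_rows, traintest_num = 10, partition_trainfrac = 0.6) {
  n_train <- floor(partition_trainfrac * n_rows)
  lapply(seq_len(traintest_num), function(i) {
    train <- sample.int(n_rows, n_train)
    list(train = train, test = setdiff(seq_len(n_rows), train))
  })
}

set.seed(1234)  # plays the role of TestingSeed
boots  <- bootstrap_indices(100, boot_num = 3, boot_prob = 0.8)
splits <- traintest_split(100, traintest_num = 3, partition_trainfrac = 0.6)
lengths(boots)             # each subsample has 80 rows
length(splits[[1]]$train)  # 60 training rows
length(splits[[1]]$test)   # 40 testing rows
```

In the actual function, the estimator would be refitted on each index set, which is what produces the distribution of estimates described above.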

Usage

mi_logreg_testing(
  data,
  signal = "signal",
  response = "response",
  side_variables = NULL,
  pinput = NULL,
  lr_maxit = 1000,
  MaxNWts = 5000,
  formula_string = NULL,
  TestingSeed = 1234,
  testing_cores = 1,
  boot_num = 10,
  boot_prob = 0.8,
  sidevar_num = 10,
  traintest_num = 10,
  partition_trainfrac = 0.6
)

Arguments

data

must be a data.frame object; it cannot contain NA values.

signal

is a character object with the name(s) of the column(s) of data to be treated as the channel's input.

response

is a character vector with the names of the columns of data to be treated as the channel's output.

side_variables

(optional) is a character vector that indicates the side variables' columns of data; if NULL, no side variables are included.

pinput

is a numeric vector with prior probabilities of the input values. A uniform distribution is assumed by default (pinput=NULL).

lr_maxit

is the maximum number of iterations of the logistic regression fitting algorithm. Default is 1000.

MaxNWts

is the maximum allowed number of weights in the logistic regression algorithm. Default is 5000.

formula_string

(optional) is a character object containing the formula to be used in the logistic regression model. If NULL, a standard additive model of the response variables is assumed. Only for advanced users.

TestingSeed

is the seed for the random number generator used in the testing procedures.

testing_cores

is the number of cores to be used in parallel computing (via the doParallel package).

boot_num

is the number of bootstrap tests to be performed. Default is 10, but it is recommended to use at least 50 for reliable estimates.

boot_prob

is the proportion of the initial data size to be used in each bootstrap sample.

sidevar_num

is the number of re-shuffling tests of side variables to be performed. Default is 10, but it is recommended to use at least 50 for reliable estimates.

traintest_num

is the number of overfitting tests to be performed. Default is 10, but it is recommended to use at least 50 for reliable estimates.

partition_trainfrac

is the fraction of data to be used as the training dataset in the overfitting test.
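The NULL defaults of pinput and formula_string can be illustrated with a short base R sketch. The helpers `default_pinput` and `default_formula` are hypothetical, and building the additive model with `reformulate()` (signal regressed on each response column) is our reading of "a standard additive model of response variables", not necessarily how SLEMI constructs it internally.

```r
# Hypothetical helper: uniform prior over the distinct input (signal) values,
# mirroring the documented default pinput = NULL.
default_pinput <- function(signal_values) {
  levels_signal <- sort(unique(signal_values))
  setNames(rep(1 / length(levels_signal), length(levels_signal)),
           as.character(levels_signal))
}

# Hypothetical helper: additive model in which the signal is explained by
# each response column, e.g. signal ~ response1 + response2 (assumed form).
default_formula <- function(signal = "signal",
                            response = c("response1", "response2")) {
  reformulate(termlabels = response, response = signal)
}

toy_signal <- c(0, 0, 1, 1, 10, 10)
default_pinput(toy_signal)  # 1/3 for each of the levels "0", "1", "10"
default_formula()           # signal ~ response1 + response2
```

A user-supplied formula_string would replace the additive formula, for example to add interaction terms between response variables.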

Details

If side variables are included in the analysis (side_variables is not NULL), two additional procedures are carried out: a reshuffling test and a reshuffling test with bootstrap, both based on permuting the values of the side variables within the dataset. The parameters lr_maxit and MaxNWts correspond to the maxit and MaxNWts arguments of the multinom function from the nnet package. An alternative model formula (via the formula_string argument) should be provided if the default logistic regression model does not describe the data adequately (recommended only for advanced users).
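The reshuffling idea can be sketched as follows: permuting side-variable values across rows keeps their marginal distribution but breaks their association with the signal and response. This is a minimal base R illustration, not SLEMI's code; the helper `reshuffle_side_variables` and the column name `cell_size` are hypothetical, and permuting all side variables with one shared row order is an assumption.

```r
# Hypothetical helper: permute the side-variable columns within the dataset.
# All side variables share one row permutation (an assumption), so their
# joint distribution is preserved while their link to signal/response is broken.
reshuffle_side_variables <- function(data, side_variables) {
  perm <- sample.int(nrow(data))
  data[side_variables] <- data[perm, side_variables, drop = FALSE]
  data
}

set.seed(1234)
toy <- data.frame(signal    = rep(c(0, 1), each = 5),
                  response  = seq_len(10),
                  cell_size = seq(10, 100, by = 10))
shuffled <- reshuffle_side_variables(toy, "cell_size")
# signal and response are unchanged; cell_size holds the same values in a new order
shuffled
```

Re-estimating mutual information on such permuted copies gives a reference distribution against which the contribution of the side variables can be judged.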

Value

A list with the following elements:

  • output$bootstrap - bootstrap test

  • output$traintest - overfitting test

  • output$reshuffling_sideVar - (if side_variables is not NULL) re-shuffling test

  • output$bootstrap_Reshuffling_sideVar - (if side_variables is not NULL) re-shuffling test with a bootstrap

Each of the above is a list, where an element is a standard output of a single mi_logreg_algorithm run.
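Because each element holds the output of a single mi_logreg_algorithm run, the repetitions of a test can be summarised as an empirical distribution of the estimator. The sketch below assumes, without confirmation from this page, that each run stores its estimate in an element named `mi`; the helper `summarise_runs` is hypothetical.

```r
# Hypothetical helper: collect one numeric field from every run of a test
# (e.g. output$bootstrap) and summarise its empirical distribution.
# The default field name "mi" is an assumption about the run structure.
summarise_runs <- function(runs, field = "mi") {
  values <- vapply(runs, function(run) as.numeric(run[[field]]), numeric(1))
  c(mean = mean(values),
    sd   = sd(values),
    quantile(values, probs = c(0.025, 0.975)))
}

# Usage, assuming `output` was returned by mi_logreg_testing():
# summarise_runs(output$bootstrap)
# summarise_runs(output$traintest)
```

At least two runs are needed for the standard deviation to be defined; with boot_num=1, sd is NA.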

References

[1] Jetka T, Nienaltowski K, Winarski T, Blonski S, Komorowski M, Information-theoretic analysis of multivariate single-cell signaling responses using SLEMI, PLoS Comput Biol, 15(7): e1007132, 2019, https://doi.org/10.1371/journal.pcbi.1007132.

Examples

## Compute the uncertainty of the mutual information estimator using 1 core
## Set boot_num and traintest_num to larger values for more reliable testing
library(SLEMI)
tempdata=data_example1
output=mi_logreg_testing(data=tempdata,
                   signal = "signal",
                   response = "response",
                   testing_cores = 1, boot_num=1, traintest_num=1)

SLEMI documentation built on Nov. 20, 2023, 1:06 a.m.