dichotomize: Dichotomize Continuous Data Set With Labels



Description

dichotomize converts a matrix containing continuous measurements into a binary matrix.

optimizeThreshold determines optimal thresholds for dichotomization.

Usage

dichotomize(X, thresh)
optimizeThreshold(X, L, lambda.freqs, verbose=FALSE)

Arguments

X

data matrix (columns correspond to variables, rows to samples).

thresh

vector of thresholds, one for each variable (column).

L

factor containing the class labels, one for each sample (row).

lambda.freqs

shrinkage parameter for class frequencies (if not specified it is estimated).

verbose

report shrinkage intensity and other information.

Details

dichotomize assigns 0 if a matrix entry is lower than the given column-specific threshold; otherwise it assigns 1.
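The rule above can be sketched in base R as follows. This is only an illustration of the thresholding rule as described, not the package source; the name dichotomize_sketch and the small matrix X_small are made up for this example.

```r
# Base-R sketch of column-wise thresholding (hypothetical helper,
# not the implementation used by binda).
dichotomize_sketch <- function(X, thresh) {
  stopifnot(is.matrix(X), length(thresh) == ncol(X))
  # compare each column with its own threshold: TRUE if entry >= threshold
  B <- sweep(X, 2, thresh, FUN = ">=")
  # turn the logical matrix into 0/1 integers, keeping dim and dimnames
  storage.mode(B) <- "integer"
  B
}

# two variables (columns), four samples (rows)
X_small <- matrix(c(1.75, 0.4,
                    2.00, 0.4,
                    1.00, 0.5,
                    0.50, 0.6), nrow = 4, byrow = TRUE,
                  dimnames = list(NULL, c("V4", "V5")))

Xb_sketch <- dichotomize_sketch(X_small, thresh = c(1.75, 0.5))
Xb_sketch
#      V4 V5
# [1,]  1  0
# [2,]  1  0
# [3,]  0  1
# [4,]  0  1
```

Entries equal to the threshold are mapped to 1, because only values strictly lower than the threshold are mapped to 0.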

optimizeThreshold uses (approximate) mutual information to determine the optimal thresholds. Specifically, the threshold for each variable is chosen to maximize the mutual information between the response (the class labels) and that variable. The same criterion is also used in binda.ranking. For a detailed description of the dichotomization procedure, see Gibb and Strimmer (2015).
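To make the criterion concrete, here is a rough base-R sketch for a single variable. It is a simplification under stated assumptions, not what optimizeThreshold does internally: it uses the plain plug-in mutual information with empirical frequencies (no shrinkage, no approximation) and tries every observed value as a candidate threshold. The helper names mi_plugin_sketch and best_threshold_sketch are hypothetical.

```r
# Plug-in mutual information (in nats) between two discrete vectors.
# Hypothetical helper for illustration only.
mi_plugin_sketch <- function(a, b) {
  p  <- unclass(table(a, b)) / length(a)  # joint relative frequencies
  pa <- rowSums(p)                        # marginal of a
  pb <- colSums(p)                        # marginal of b
  nz <- p > 0                             # skip empty cells (0 * log 0 = 0)
  sum(p[nz] * log(p[nz] / outer(pa, pb)[nz]))
}

# Try each observed value of x as a threshold and keep the one whose
# dichotomized version (x >= t) shares the most information with the labels.
best_threshold_sketch <- function(x, labels) {
  candidates <- sort(unique(x))
  mi <- vapply(candidates,
               function(t) mi_plugin_sketch(x >= t, labels),
               numeric(1))
  candidates[which.max(mi)]  # ties are resolved in favour of the smallest candidate
}

x      <- c(1.75, 2, 1, 0.5)
labels <- factor(c("Treatment", "Treatment", "Control", "Control"))
best_threshold_sketch(x, labels)
# [1] 1.75
```

With the threshold 1.75, the dichotomized variable separates the two classes perfectly, so the mutual information reaches its maximum of log(2) for two equally sized classes.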

Class frequencies are estimated using freqs.shrink.
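As a rough illustration of what shrinking class frequencies means, the sketch below pulls the empirical frequencies towards the uniform distribution with a fixed intensity. The helper shrink_freqs_sketch, the counts, and the value lambda = 0.2 are assumptions made for this example; freqs.shrink itself can estimate the intensity from the data when lambda.freqs is not specified, which is not reproduced here.

```r
# Shrink empirical class frequencies towards the uniform distribution.
# Hypothetical helper with a fixed, user-chosen lambda in [0, 1].
shrink_freqs_sketch <- function(counts, lambda) {
  stopifnot(lambda >= 0, lambda <= 1)
  u      <- counts / sum(counts)                       # empirical frequencies
  target <- rep(1 / length(counts), length(counts))    # uniform target
  lambda * target + (1 - lambda) * u
}

counts <- c(Control = 30, Treatment = 10)
shrunk <- shrink_freqs_sketch(counts, lambda = 0.2)
shrunk
#   Control Treatment 
#       0.7       0.3
```

With lambda = 0 the empirical frequencies are returned unchanged, and with lambda = 1 all classes receive the same frequency.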

Value

dichotomize returns a binary matrix.

optimizeThreshold returns a vector containing the variable thresholds.

Author(s)

Sebastian Gibb and Korbinian Strimmer (https://strimmerlab.github.io).

References

Gibb, S., and K. Strimmer. 2015. Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis. Bioinformatics 31:3156-3162. <DOI:10.1093/bioinformatics/btv334>

See Also

binda.ranking, freqs.shrink, mi.plugin, is.binaryMatrix.

Examples

# load binda library
library("binda")

# example data with 6 variables (in columns) and 4 samples (in rows)
X = matrix(c(1, 1, 1, 1.75, 0.4,    0,
             1, 1, 2,    2, 0.4, 0.09,
             1, 0, 1,    1, 0.5,  0.1,
             1, 0, 1,  0.5, 0.6,  0.1), nrow=4, byrow=TRUE)
colnames(X) = paste0("V", 1:ncol(X))

# class labels
L = factor(c("Treatment", "Treatment", "Control", "Control") )
rownames(X) = paste0(L, rep(1:2, times=2))

X
#          V1 V2 V3   V4  V5   V6
#Treatment1  1  1  1 1.75 0.4 0.00
#Treatment2  1  1  2 2.00 0.4 0.09
#Control1    1  0  1 1.00 0.5 0.10
#Control2    1  0  1 0.50 0.6 0.10

# find optimal thresholds (one for each variable)
thr = optimizeThreshold(X, L)
thr
#  V1   V2   V3   V4   V5   V6 
#1.00 1.00 2.00 1.75 0.50 0.10

# convert into binary matrix
# if value is lower than threshold -> 0 otherwise -> 1
Xb = dichotomize(X, thr)
is.binaryMatrix(Xb) # TRUE
Xb
#          V1 V2 V3 V4 V5 V6
#Treatment1  1  1  0  1  0  0
#Treatment2  1  1  1  1  0  0
#Control1    1  0  0  0  1  1
#Control2    1  0  0  0  1  1
#attr(,"thresh")
#  V1   V2   V3   V4   V5   V6 
#1.00 1.00 2.00 1.75 0.50 0.10

Example output

Loading required package: entropy
           V1 V2 V3   V4  V5   V6
Treatment1  1  1  1 1.75 0.4 0.00
Treatment2  1  1  2 2.00 0.4 0.09
Control1    1  0  1 1.00 0.5 0.10
Control2    1  0  1 0.50 0.6 0.10
  V1   V2   V3   V4   V5   V6 
1.00 1.00 2.00 1.75 0.50 0.10 
[1] TRUE
           V1 V2 V3 V4 V5 V6
Treatment1  1  1  0  1  0  0
Treatment2  1  1  1  1  0  0
Control1    1  0  0  0  1  1
Control2    1  0  0  0  1  1
attr(,"thresh")
  V1   V2   V3   V4   V5   V6 
1.00 1.00 2.00 1.75 0.50 0.10 

binda documentation built on Nov. 21, 2021, 1:07 a.m.
