binarize_array: Fast Adaptive Binarization


Description

Performs fast adaptive binarization of numeric arrays, with options for filtering out rows that show insufficient variation.

Usage

binarize.array(x, min.filter = NA, var.filter = 0, fc.filter = 0,
               na.filter = FALSE, log.base = NA, use.gap = FALSE)

Arguments

x

Numeric data input array used to generate binary output array. Each row of the array represents a different variable.

min.filter

Minimum-value filter: rows of x with no value greater than min.filter will have all values set to 0.

var.filter

Variation filter: the proportion of lowest-variance rows of x to have all values set to 0.

fc.filter

Fold-change filter: rows of x with maximum fold-change less than fc.filter will have all values set to 0.

na.filter

NA filter: if TRUE, all rows of x with _any_ NAs will have all values set to 0. NB: even with na.filter=FALSE, any NA values in x will be passed through to the output as NA.

log.base

Base of the logarithm to use for calculating fold-changes in rows of x. Unless log.base=NA, input data x is assumed to be log-transformed.

use.gap

Boolean indicating whether to use the gap statistic to identify rows that are not adequately described by a binary representation. If TRUE, execution will be _much_ slower.
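The filter arguments above can be read as row-wise tests. The following base-R sketch shows one possible reading of them. The helper name flag_filtered_rows is hypothetical and is not part of ArrayBin, and the tie-handling, rounding and fold-change formulas are assumptions rather than the package's confirmed behaviour.

```r
# Hypothetical helper (not part of ArrayBin): returns TRUE for each row of x
# that the described filters would set to all zeros.
# Assumes every row has at least one non-NA value.
flag_filtered_rows <- function(x, min.filter = NA, var.filter = 0,
                               fc.filter = 0, na.filter = FALSE,
                               log.base = NA) {
  flagged <- rep(FALSE, nrow(x))
  row_max <- apply(x, 1, max, na.rm = TRUE)
  row_min <- apply(x, 1, min, na.rm = TRUE)

  # Minimum-value filter: no value in the row exceeds min.filter.
  if (!is.na(min.filter)) {
    flagged <- flagged | (row_max <= min.filter)
  }

  # Variation filter: flag the given proportion of lowest-variance rows
  # (rounding down is an assumption).
  if (var.filter > 0) {
    row_var <- apply(x, 1, var, na.rm = TRUE)
    n_drop <- floor(var.filter * nrow(x))
    if (n_drop > 0) {
      flagged[order(row_var)[seq_len(n_drop)]] <- TRUE
    }
  }

  # Fold-change filter: assumed to be max/min for raw data (log.base = NA,
  # positive values only) and log.base^(max - min) for log-transformed data.
  if (fc.filter > 0) {
    fold_change <- if (is.na(log.base)) {
      row_max / row_min
    } else {
      log.base^(row_max - row_min)
    }
    flagged <- flagged | (!is.na(fold_change) & fold_change < fc.filter)
  }

  # NA filter: any NA in the row.
  if (na.filter) {
    flagged <- flagged | apply(x, 1, anyNA)
  }

  flagged
}

x.toy <- rbind(c(1, 1, 1, 1),
               c(1, 2, 4, 8),
               c(0.1, 0.3, 0.1, 0.3),
               c(1, NA, 5, 9))
flag_filtered_rows(x.toy, min.filter = 0.5)
flag_filtered_rows(x.toy, na.filter = TRUE)
```

Rows flagged in this way correspond to the rows that the documentation says will have all values set to 0 in the output.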

Details

This function implements an adaptive method for binarizing gene expression data on a per-probe basis, which we found to be more effective than other commonly used approaches. The method can also be applied to DNA methylation microarray data, which has implications for cross-platform integration, and it can reduce batch effects in the data.
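To make the idea of per-row ("per-probe") binarization concrete, the sketch below splits a single row into a 'low' and a 'high' group by choosing the cut point that minimises the within-group sum of squares, which is an exact two-cluster split in one dimension. This is only an illustration of cluster-based binarization in general: the function binarize_row_sketch is hypothetical, and the clustering procedure that ArrayBin actually uses is not described on this page.

```r
# Hypothetical illustration (not the ArrayBin algorithm): binarize one row
# by the two-group split that minimises within-group sum of squares.
binarize_row_sketch <- function(v) {
  out <- rep(NA_real_, length(v))   # NA inputs stay NA in the output
  ok <- !is.na(v)
  s <- sort(v[ok])
  n <- length(s)

  # A row with fewer than two distinct values cannot be split: call it 'low'.
  if (length(unique(s)) < 2) {
    out[ok] <- 0
    return(out)
  }

  # Within-group sum of squares for every split of the sorted values.
  wss <- sapply(seq_len(n - 1), function(i) {
    lo <- s[1:i]
    hi <- s[(i + 1):n]
    sum((lo - mean(lo))^2) + sum((hi - mean(hi))^2)
  })

  # Threshold halfway between the two values either side of the best split.
  best <- which.min(wss)
  threshold <- (s[best] + s[best + 1]) / 2
  out[ok] <- as.numeric(v[ok] > threshold)
  out
}

binarize_row_sketch(c(0.1, 0.2, 0.15, 5, 5.2, NA))

# Applying the sketch to every row of a matrix keeps the input dimensions.
x.demo <- rbind(c(0.1, 0.2, 5, 5.2), c(10, 1, 1.5, 9))
x.demo.bin <- t(apply(x.demo, 1, binarize_row_sketch))
```

Because each row gets its own threshold, rows measured on very different scales are still mapped onto the same 0/1 representation.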

Value

Binarized representation of x. That is, a numeric array of the same dimensions as input x, containing values 0 (representing a 'low' value of the corresponding variable) and 1 (representing a 'high' value of the corresponding variable).
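A quick way to check that a result matches this description is to compare its dimensions with the input and confirm that it contains only 0, 1 and (for NA inputs) NA. The helper below is a minimal base-R sketch; the name is_valid_binarization is hypothetical and is not provided by ArrayBin.

```r
# Hypothetical helper (not part of ArrayBin): checks that x.bin has the same
# dimensions as x and contains only the values 0, 1 or NA.
is_valid_binarization <- function(x, x.bin) {
  identical(dim(x), dim(x.bin)) && all(x.bin %in% c(0, 1, NA))
}

x.in   <- matrix(c(0.2, 0.9, NA, 0.4, 0.1, 0.8), nrow = 2)
x.good <- matrix(c(0, 1, NA, 0, 0, 1), nrow = 2)
x.bad  <- matrix(c(0, 2, NA, 0, 0, 1), nrow = 2)

is_valid_binarization(x.in, x.good)
is_valid_binarization(x.in, x.bad)
```

With the package installed, the same check could be applied to the output of binarize.array, for example is_valid_binarization(x.cont, x.bin) after running the examples.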

Author(s)

Ed Curry e.curry@imperial.ac.uk

Examples

## create a numeric array
x.cont <- array(runif(60),dim=c(10,6))
## Not run: x.cont

## find binary representation of array
x.bin <- binarize.array(x.cont)
## Not run: x.bin

## use gap statistic to filter insufficiently variable rows
x.gap <- binarize.array(x.cont,use.gap=TRUE)
## Not run: x.gap

Example output

Loading required package: SAGx
Loading required package: multtest
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

applying cluster-based binarization to 10 rows of data. This may take some time... 
applying cluster-based binarization to 10 rows of data. This may take some time... 
using gap-statistic to determine cluster number. if this takes too long, try setting 'use.gap=FALSE' 

ArrayBin documentation built on May 1, 2019, 10:20 p.m.