binarize_array: Fast Adaptive Binarization
In ArrayBin: Binarization of numeric data arrays


Performs fast adaptive binarization of numeric arrays, with options for filtering out rows that show insufficient variation.

binarize.array(x, min.filter=NA, var.filter=0, fc.filter=0, na.filter=FALSE, log.base=NA, use.gap=FALSE)

`x`	Numeric data input array used to generate binary output array. Each row of the array represents a different variable.
`min.filter`	Minimum-value filter: rows of `x` with no value greater than `min.filter` will have all values set to `0`.
`var.filter`	Variation filter: the proportion of lowest-variance rows of `x` to have all values set to `0`.
`fc.filter`	Fold-change filter: rows of `x` with maximum fold-change less than `fc.filter` will have all values set to `0`.
`na.filter`	NA filter: all rows of `x` with _any_ NAs will have all values set to `0`. NB: even with `na.filter=FALSE` any `NA` values will be passed through with output value `NA`.
`log.base`	Base of the logarithm to use when calculating fold-changes in rows of `x`. Unless `log.base=NA`, the input data `x` is assumed to be log-transformed.
`use.gap`	Boolean indicating whether to use gap statistic to identify rows insufficiently converted to binary representation. If `TRUE`, execution will be _much_ slower.
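The row filters described above can be illustrated in base R. The following is a minimal sketch of the documented rules for `min.filter` and `var.filter` only; `rows_to_zero` is a hypothetical helper and not part of ArrayBin, and the rounding of the `var.filter` proportion (rounded down here) is an assumption rather than the package's confirmed behaviour.

```r
# Illustrative only: flag the rows that the documented filters would zero out.
# `rows_to_zero` is a hypothetical helper, not an ArrayBin function.
rows_to_zero <- function(x, min.filter = NA, var.filter = 0) {
  zero <- rep(FALSE, nrow(x))
  if (!is.na(min.filter)) {
    # min.filter: rows with no value greater than the threshold
    row.max <- apply(x, 1, max, na.rm = TRUE)
    zero <- zero | !(row.max > min.filter)
  }
  if (var.filter > 0) {
    # var.filter: the given proportion of lowest-variance rows
    # (rounding the row count down is an assumption of this sketch)
    n.zero <- floor(var.filter * nrow(x))
    row.var <- apply(x, 1, var, na.rm = TRUE)
    zero[order(row.var)[seq_len(n.zero)]] <- TRUE
  }
  zero
}

x <- rbind(c(1, 1.1, 0.9, 1),    # low values, low variance
           c(1, 9, 2, 8),
           c(5, 5.2, 20, 19),
           c(3, 3, 3, 12))
rows_to_zero(x, min.filter = 2)     # flags row 1 only
rows_to_zero(x, var.filter = 0.25)  # flags the single lowest-variance row
```

Returning a logical mask rather than a modified array keeps the sketch separate from the binarization step itself.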

Implements an adaptive method for binarizing gene expression data on a per-probe basis, which the authors report to be more effective than other commonly used approaches. The method can also be applied to DNA methylation microarray data, which has implications for cross-platform integration, and it can reduce batch effects in the data.
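To make the idea of per-row (per-probe) binarization concrete, here is a rough base-R sketch that splits each row into 'low' and 'high' groups with two-cluster k-means. It is an illustration under stated assumptions, not the algorithm ArrayBin uses: `binarize_row_sketch` is a hypothetical name, and treating rows with fewer than three non-missing values or no variation as all 'low' is a choice made only for this example.

```r
# Hypothetical helper: split one row into 'low' (0) and 'high' (1) using
# two-cluster k-means. Illustrative only; not the ArrayBin algorithm.
binarize_row_sketch <- function(v) {
  out <- rep(NA_real_, length(v))   # NA inputs stay NA in the output
  ok <- !is.na(v)
  vals <- v[ok]
  if (length(vals) < 3 || length(unique(vals)) < 2) {
    out[ok] <- 0                    # too little data or variation: all 'low'
    return(out)
  }
  # start the two centres at the row extremes so the result is deterministic
  km <- kmeans(vals, centers = c(min(vals), max(vals)))
  high <- which.max(km$centers[, 1])
  out[ok] <- as.numeric(km$cluster == high)
  out
}

x.demo <- rbind(c(0.10, 0.20, 0.15, 0.90, 0.95, 0.85),
                c(5.0, NA, 5.2, 1.1, 0.9, 1.0))
# apply() returns one column per row of input, so transpose back
x.sketch <- t(apply(x.demo, 1, binarize_row_sketch))
x.sketch
```

Each row is thresholded independently, so rows measured on very different scales still receive their own low/high split.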

Binarized representation of x. That is, a numeric array of the same dimensions as the input x, containing values 0 (representing a 'low' value of the corresponding variable) and 1 (representing a 'high' value of the corresponding variable).
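The properties of the return value can be checked with a small base-R helper. This is a sketch only: `is_binarized` is a hypothetical function, not part of ArrayBin, and it allows `NA` because the documentation for `na.filter` states that `NA` inputs are passed through as `NA`.

```r
# Hypothetical check: same dimensions as the input, and only 0, 1 or NA.
is_binarized <- function(x.bin, x) {
  identical(dim(x.bin), dim(x)) && all(x.bin %in% c(0, 1, NA))
}

# Hand-built arrays standing in for an input and its binarized output
x.in  <- array(c(0.2, 0.9, NA, 0.8, 0.1, 0.7), dim = c(2, 3))
x.out <- array(c(0,   1,   NA, 1,   0,   1),   dim = c(2, 3))
ok.shape <- is_binarized(x.out, x.in)          # TRUE
bad.vals <- is_binarized(x.in, x.in)           # FALSE: values are not 0/1
```

The same check could be applied to the result of `binarize.array` and the array passed to it.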

Ed Curry e.curry@imperial.ac.uk

## create a numeric array
x.cont <- array(runif(60),dim=c(10,6))
## Not run: x.cont

## find binary representation of array
x.bin <- binarize.array(x.cont)
## Not run: x.bin

## use gap statistic to filter insufficiently variable rows
x.gap <- binarize.array(x.cont,use.gap=TRUE)
## Not run: x.gap

applying cluster-based binarization to 10 rows of data. This may take some time... 
applying cluster-based binarization to 10 rows of data. This may take some time... 
using gap-statistic to determine cluster number. if this takes too long, try setting 'use.gap=FALSE'