chip.deconv: High-resolution model-based deconvolution of normalized...
In MeDiChI: MeDiChI ChIP-chip deconvolution library


Deconvolves a subset of data on one chromosome, including multiple replicates, and running multiple bootstraps, if desired. To deconvolve an entire data set across multiple chromosomes, see 'deconv.entire.genome'.

chip.deconv(data, where = NA, center = NA, 
window = 30000, fit.res = 10, max.steps = 200, 
post.proc.factor = 2, min.npeaks = 0, max.npeaks = 99999,
selection.method = "bic", quant.cutoff = "q0.85", 
n.boot = 1, boot.sample.opt = c("residual","resample","case","wild",
"position","replicate")[1], max.peak = NA, boot.vary.res = F,
kernel = NA, tile.distance = NA, verbose = T, trace = F, ...)


plot.chip.deconv(x, boot.results = c("scaled.prob", "prob", "scale",
"conf=95", "NONE")[1], where = NA, center = NA, window = NULL, verbose = F,
plot.genes = F, org = NA, hi.res = NA, quants = c(0.95, 0.5, 0.05), 
smooth = T, ... )



print.chip.deconv(x, ...)

coef.chip.deconv(object, ...)

`data`	Input data matrix, connection, or filename. See Details.
`center`	Central chromosomal coordinate for the subset of data to deconvolve, or to be plotted. See Details.
`where`	The chromosome of the subset of data to deconvolve. See Details.
`window`	The window size (in base pairs) of the subset of data to deconvolve; see Details. For 'plot.chip.deconv', limits the window size that is plotted; the default ('NULL') is to use the 'window' that was input to 'chip.deconv()'.
`fit.res`	Desired deconvolution resolution (base-pairs).
`kernel`	Required deconvolution kernel. See Details.
`max.steps`	Limit on the number of LARS steps taken. Prefer higher values (above 200), because the fit from the highest step is used to estimate the noise for BIC.
`post.proc.factor`	Post-processing filter for combining deconvolution coefficients, in units of (n*fit.res).
`min.npeaks`	Optionally limit the minimum number of coefficients. Default is '0' – no lower limit.
`max.npeaks`	Optionally limit the maximum number of coefficients. Default is '99999' – no upper limit.
`selection.method`	Use argmin(method) to choose optimal model, one of c('bic','aic'). Default is 'bic'.
`quant.cutoff`	Intensity or quantile cutoff for data to be processed. Limits the locations of potential sites to those near probes that are above this cutoff. See Details. Default is 'q0.85' – use the 0.85 quantile (85th percentile).

`n.boot`	Number of bootstrap iterations to perform. If '0', no bootstraps are performed. The usage above sets the default to '1'.
`boot.sample.opt`	Bootstrap resampling option. See Details. Default is 'residual' (the first element of the option vector in the usage above).
`max.peak`	Any coefficient with an intensity above this threshold is set to this value. Default is 'NA' – no cutoff.
`boot.vary.res`	If TRUE, vary the resolution (around 'fit.res') during bootstraps. May result in more realistic solutions.

`tile.distance`	Distance (in base pairs) between adjacent probes on the array. If 'NA' (the default) then this is computed from the data.

`verbose`	If TRUE, print out status messages.
`trace`	If TRUE, print out LARS progress.

`x`	Object output from 'chip.deconv()'
`object`	Object output from 'chip.deconv()'
`boot.results`	Plot bootstrap distributions, either 'prob' (posterior probability), 'scaled.prob' (intensity-scaled posterior probability), 'conf=95' (95% confidence intervals), or 'NONE' (no bootstrap results plotted). Default is 'scaled.prob'.

`plot.genes`	If TRUE, include gene positions in the plot. Currently only supported for Halobacterium and S. cerevisiae.
`org`	Organism used for 'plot.genes', one of either 'halo' or 'yeast'.
`quants`	Quantiles of model fits from bootstraps to include in plot (see Figure 3 in paper). Default is to plot 5th, 50th, and 95th quantiles.
`hi.res`	Compute high-resolution model fits at this resolution. Default is 'fit.res' as provided to 'chip.deconv()'.
`smooth`	If TRUE, plot kernel-smoothed bootstrap distributions, rather than simple counts.

`...`	Additional parameters passed to 'lars' or 'plot' or 'read.table' (if the 'data' parameter is a file name).

'chip.deconv' is used to deconvolve a limited subset of the entire tiling array data set. Usually this subset is limited to a single chromosome (the 'where' parameter) and a range of coordinates on the chromosome between 'center - window / 2' and 'center + window / 2'. The method will identify potential binding sites (coefficients) across this range at a resolution given by 'fit.res', up to a maximum of n.coeffs = window / fit.res coefficients.
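As a minimal sketch of the arithmetic described above, the coordinate range and the upper bound on the number of coefficients can be worked out as follows. The variable names are illustrative only and are not part of the package.

```r
# Illustrative values; these names are not MeDiChI internals
center  <- 650000
window  <- 20000
fit.res <- 30

# Coordinate range covered by the deconvolution window
range.start <- center - window / 2
range.end   <- center + window / 2

# Upper bound on the number of candidate coefficients
n.coeffs <- window / fit.res

c(start = range.start, end = range.end, n.coeffs = n.coeffs)
```

Halving 'window' or doubling 'fit.res' halves this bound, which is the trade-off the next paragraph refers to.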

It is not recommended (due to memory constraints) to attempt to fit more than a few thousand potential coefficients using this function. This number may be decreased by either decreasing the 'window' size or increasing the 'fit.res'. To deconvolve larger windows at high resolution, use 'deconv.entire.genome'.

The input 'data' can be formatted as one of:
(a) a 2-column matrix or data-frame containing probe coordinates (column 1) and intensities (column 2), with optional rownames containing the chromosome identifier for each probe, or
(b) a 3-column data frame containing probe chromosome identifiers (column 1), coordinates (column 2) and intensities (column 3), or
(c) a file name or connection pointing to either a GFF file or a 3-column tab-delimited file formatted as in (b).

Replicates are included in the input data simply as additional rows with the same probe coordinate.
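The in-memory formats (a) and (b) could be built along the following lines. This is a sketch with made-up values; the column names are assumptions chosen for readability, since the description above specifies only the column order.

```r
# Toy probe data (made-up values, for illustration only).
# The last three rows repeat the first three coordinates, i.e. one replicate.
coords      <- c(1000, 1500, 2000, 1000, 1500, 2000)
intensities <- c(1.2, 3.4, 1.1, 1.0, 3.9, 0.9)

# Format (a): 2-column matrix, with chromosome identifiers as optional rownames.
# A matrix (unlike a data frame) allows repeated rownames.
data.a <- cbind(coords, intensities)
rownames(data.a) <- rep("Chr", nrow(data.a))

# Format (b): 3-column data frame (chromosome, coordinate, intensity)
data.b <- data.frame(chr = "Chr", coord = coords, intensity = intensities)
```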

Note that 'intensities' in the above description refers to relative (potentially normalized) intensities, exponentiated log-ratios (e.g. versus a reference) or some such quantity. The data should not be logged, as this will not conform with the MeDiChI peak model (see the reference below for details).
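If the available values are log-ratios, they can be returned to a linear scale before use. The sketch below assumes base-2 logs, which is an assumption about the upstream normalization rather than something this page states; use the base that matches your data.

```r
# Hypothetical log2 ratios (e.g. IP versus reference); made-up values
log2.ratio <- c(-0.5, 0, 1, 2.3)

# Undo the log transform so that intensities are on a linear scale
intensity <- 2^log2.ratio
```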

If the entire input 'data' matrix is to be processed using this function (only recommended if a subset of the entire array has been pre-selected; see above), or if the data only cover one chromosome, then the chromosome identifiers are not required.

If the input 'data' contains the entire array (including multiple chromosomes) then only the contiguous subset of probes within the window described above and with the matching chromosome identifier will be deconvolved.

'quant.cutoff' can be either a character string starting with "q", e.g. "q0.85", to represent a quantile cutoff, or a numeric value representing an absolute intensity cutoff. Only positions within +/- one 'tile.distance' of any probe whose intensity exceeds this cutoff are considered to contain potential binding sites. Raising the cutoff reduces the number of candidate positions, so this parameter may be used to decrease the runtime of the function.
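One way to picture the two forms of 'quant.cutoff' is a small helper that turns either form into an absolute threshold. 'resolve.cutoff' is a hypothetical function written for this illustration; it is not part of MeDiChI and may not match how the package parses the argument internally.

```r
# Sketch only: resolve a 'quant.cutoff'-style value to an absolute threshold.
# 'resolve.cutoff' is a hypothetical helper, not a MeDiChI function.
resolve.cutoff <- function(quant.cutoff, intensities) {
  if (is.character(quant.cutoff) && substr(quant.cutoff, 1, 1) == "q") {
    # "q0.85" -> 0.85 -> the corresponding quantile of the data
    prob <- as.numeric(substring(quant.cutoff, 2))
    return(unname(quantile(intensities, probs = prob, na.rm = TRUE)))
  }
  # Otherwise treat the value as an absolute intensity cutoff
  as.numeric(quant.cutoff)
}

# Made-up intensities
intensities <- c(0.8, 1.0, 1.1, 1.3, 2.5, 4.0, 0.9, 1.2, 6.1, 1.0)
cutoff <- resolve.cutoff("q0.85", intensities)
above  <- intensities > cutoff   # probes that would seed candidate sites
```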

'boot.sample.opt' can be one of 'wild' – wild resampling (see http://en.wikipedia.org/wiki/Bootstrapping_(statistics)#Wild_bootstrap); 'residual' – wild resampling, only run on residuals for estimation of coefficient p-values; 'case' – case resampling; 'position' – resample the central positions of the probes; 'replicate' – resample probe intensities from the range given by their replicates (if replicates exist); or 'resample' – case sampling of intensities only for estimation of coefficient p-values. The usage above selects 'residual' by default.
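For readers unfamiliar with the terms, the generic difference between case resampling and wild resampling can be sketched in base R. This uses a simple linear model as a stand-in and is not the MeDiChI implementation; the Rademacher (random sign) weights are one common choice for the wild bootstrap and are an assumption here.

```r
set.seed(1)  # reproducible toy example

# Toy data and a stand-in fit (a simple linear model, NOT the MeDiChI model)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.3)
model <- lm(y ~ x)
y.hat <- unname(fitted(model))
resid <- unname(residuals(model))

# Case resampling: draw (x, y) pairs with replacement
idx    <- sample(seq_along(y), replace = TRUE)
case.x <- x[idx]
case.y <- y[idx]

# Wild resampling: keep x fixed and flip the sign of each residual at random
signs  <- sample(c(-1, 1), length(resid), replace = TRUE)
wild.y <- y.hat + signs * resid
```

In both cases the model would then be refit to the resampled data, and the spread of the refit coefficients across iterations gives the bootstrap distribution.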

'kernel' is a 2-column matrix providing position (column 1) and intensity (column 2) of the deconvolution kernel (profile model) to be used. This may be computed using 'generate.binding.profile' and parameters for this model may be learned from the data using 'fit.peak.profile'.
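To show only the expected 2-column layout, the sketch below builds a toy triangular profile. The shape and the 500 bp half-width are invented for illustration and are not the MeDiChI peak model; a real kernel should come from 'generate.binding.profile' or the kernels shipped with the package data.

```r
# Toy triangular profile, for illustrating the 2-column layout only.
# A real kernel should come from the package's profile model.
half.width <- 500                                 # hypothetical half-width (bp)
position   <- seq(-half.width, half.width, by = 10)
intensity  <- 1 - abs(position) / half.width      # 1 at the center, 0 at the edges
toy.kernel <- cbind(position, intensity)
```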

A list of class 'chip.deconv', for which 'plot', 'print', and 'coef' functions exist, and containing the following elements (repeated 'n.boot' times for each bootstrap run):

`data`	The input data subset (selected using 'where', 'window' and 'center'; see Details).
`fit`	Best-fit values at the locations of each probe in the input data.
`kernel`	Kernel used for deconvolution (provided by 'kernel' parameter).
`coeffs`	Non-zero coefficients (coordinate and intensity) for the chosen best-fit model.
`out.info`	Statistics on the best-fit solution including the LARS step number, BIC, RSS, etc.
`args`	All parameters input to 'chip.deconv', used for plotting and future reference.

David J Reiss, Institute for Systems Biology

Maintainer: <dreiss@systemsbiology.org>

Reiss, DJ and Facciotti, MT and Baliga, NS. (2007). "Model-based deconvolution of genome-wide DNA binding", Bioinformatics; doi: 10.1093/bioinformatics/btm592.

http://baliga.systemsbiology.net/medichi

deconv.entire.genome, fit.peak.profile, generate.fake.data, generate.binding.profile, MeDiChI-data, lars, quadprog, Matrix

## see 'help(MeDiChI)', or...
## Run the demo yourself:

data( "halo.lowres", package="MeDiChI" )

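## Deconvolve a 20 kb window centered at 650000 on 'Chr' at 30 bp resolution,
## with 10 case-resampling bootstraps
fit <- chip.deconv( data.halo.lowres, where="Chr", fit.res=30,
               center=650000, window=20000, max.steps=100, n.boot=10,
               kernel=kernel.halo.lowres, verbose=TRUE, boot.sample.opt="case" )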

plot( fit, plot.genes=TRUE, cex=0.5, cex.lab=0.8, cex.axis=0.8 )