
View source: R/pcKeepCompDetect.R

This function tries to obtain the minimum number of components needed in an
FFT filter to achieve, or get as close as possible to, a given correlation
value. You usually don't need to call this function directly, since
`filterFFT` uses it by default.

```
pcKeepCompDetect(data, pc.min = 0.01, pc.max = 0.1, max.iter = 20,
  verbose = FALSE, cor.target = 0.98, cor.tol = 0.001, smpl.num = 25,
  smpl.min.size = 2^10, smpl.max.size = 2^14)
```

| Argument | Description |
|---|---|
| `data` | Numeric vector to be filtered |
| `pc.min, pc.max` | Range of allowed values for |
| `max.iter` | Maximum number of iterations |
| `verbose` | Extra information (debug) |
| `cor.target` | Target correlation between the filtered and the original profiles. A value around 0.99 is recommended for Next Generation Sequencing data and around 0.7 for Tiling Arrays. |
| `cor.tol` | Tolerance allowed between the obtained correlation and the target one. |
| `smpl.num` | If |
| `smpl.min.size, smpl.max.size` | Minimum and maximum size of the samples. This is used for selection and sub-selection of ranges with meaningful values (i.e., different from 0 and NA). Powers of 2 are recommended but not mandatory. |
| `...` | Parameters to be passed to |

This function predicts a suitable `pcKeepComp` value for the `filterFFT`
function. This is the recommended amount of components (in percentage) to
keep in the `filterFFT` function to obtain a correlation of, or close to,
`cor.target`.
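
To make the idea of keeping only part of the FFT components concrete, here is a minimal base-R sketch. The helper `fft_keep` is a hypothetical name and is not nucleR's implementation. It assumes the amount of components is given as a fraction between 0 and 1, as the defaults `pc.min = 0.01` and `pc.max = 0.1` suggest, and that the lowest frequencies are the ones kept.

```r
# Hypothetical helper (NOT nucleR's code): low-pass FFT filter that keeps
# roughly a fraction `pc` of the frequency components and zeroes the rest.
fft_keep <- function(x, pc) {
  n <- length(x)
  spec <- fft(x)
  # Number of non-zero frequencies kept on each side of the spectrum
  keep <- max(1, round(n * pc / 2))
  # Zero the high-frequency middle part, keeping the spectrum symmetric
  if (keep + 2 <= n - keep) spec[(keep + 2):(n - keep)] <- 0
  # R's inverse FFT is unnormalized, so divide by n
  Re(fft(spec, inverse = TRUE)) / n
}

# Synthetic profile (assumption: a slow wave plus Gaussian noise)
set.seed(1)
n <- 1024
signal <- sin(2 * pi * (1:n) / 128)
noisy <- signal + rnorm(n, sd = 0.5)

smooth <- fft_keep(noisy, pc = 0.05)

# Correlation between the filtered and the original profile:
# this is the quantity that `cor.target` refers to
print(cor(noisy, smooth))
# Keeping more components moves the filtered profile closer to the original
print(cor(noisy, fft_keep(noisy, pc = 0.5)))
```

Because the correlation grows as more components are kept, a search over the fraction can home in on a target correlation.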

The search starts from the two given values `pc.min` and `pc.max` and uses
linear interpolation to quickly reach a value whose correlation between the
filtered and the original profiles is near `cor.target`, within the
specified tolerance `cor.tol`.
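
A search of this kind could be sketched as follows. This is an illustrative bracketing search by linear interpolation, not the package's actual code: `search_pc` and `toy_cor` are hypothetical names, and `toy_cor` is a made-up monotone curve standing in for the correlation obtained after filtering with a given amount of components.

```r
# Hypothetical sketch (NOT nucleR's code): find the amount of components
# whose correlation is within `cor.tol` of `cor.target`, by repeated linear
# interpolation between two bracketing points.
search_pc <- function(cor_fun, pc.min = 0.01, pc.max = 0.1,
                      cor.target = 0.98, cor.tol = 0.001, max.iter = 20) {
  lo <- pc.min
  hi <- pc.max
  c_lo <- cor_fun(lo)
  c_hi <- cor_fun(hi)
  # Target outside the allowed range: return the closest bound
  if (c_lo >= cor.target) return(lo)
  if (c_hi <= cor.target) return(hi)
  pc <- hi
  for (i in seq_len(max.iter)) {
    # Point where the straight line through both ends hits the target
    pc <- lo + (cor.target - c_lo) * (hi - lo) / (c_hi - c_lo)
    c_pc <- cor_fun(pc)
    if (abs(c_pc - cor.target) <= cor.tol) break
    # Keep the target bracketed between `lo` and `hi`
    if (c_pc < cor.target) {
      lo <- pc
      c_lo <- c_pc
    } else {
      hi <- pc
      c_hi <- c_pc
    }
  }
  pc
}

# Made-up correlation curve, increasing with the amount of components
toy_cor <- function(pc) 1 - 0.0005 / pc

pc_found <- search_pc(toy_cor)
print(pc_found)
print(toy_cor(pc_found))
```

In practice `cor_fun` would filter the sampled data and return the correlation with the original, so each iteration costs one filtering pass.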

To allow quick detection without an exhaustive search, this function uses
a subset of the data, randomly sampling regions with meaningful coverage
values (i.e., different from 0 or NA) that are larger than `smpl.min.size`.
If it is not possible to obtain `smpl.max.size` from a region (this could
be due to flanking 0's, for example), at least `smpl.min.size` will be used
to check the correlation. The mean correlation across all sampled regions
is used to test the performance of the `pcKeepComp` parameter.
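
The sampling step described above might look roughly like this in base R. All names here (`meaningful_runs`, `sample_windows`, `smoother`) are hypothetical, the way windows are drawn is an assumption, and the moving average is only a stand-in for the FFT filter.

```r
# Hypothetical sketch (NOT nucleR's code): locate runs of meaningful values
# (neither 0 nor NA) that are at least `min.size` long.
meaningful_runs <- function(x, min.size) {
  ok <- !is.na(x) & x != 0
  r <- rle(ok)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  sel <- r$values & r$lengths >= min.size
  data.frame(start = starts[sel], end = ends[sel])
}

# Draw `num` random windows: `max.size` long when the run allows it,
# otherwise `min.size` long.
sample_windows <- function(x, num, min.size, max.size) {
  runs <- meaningful_runs(x, min.size)
  if (nrow(runs) == 0) return(list())
  picks <- sample(nrow(runs), num, replace = TRUE)
  lapply(picks, function(i) {
    len <- runs$end[i] - runs$start[i] + 1
    size <- if (len >= max.size) max.size else min.size
    offset <- sample.int(len - size + 1, 1) - 1
    first <- runs$start[i] + offset
    x[first:(first + size - 1)]
  })
}

# Synthetic coverage with flanking zeros and an NA gap (assumption)
set.seed(42)
x <- c(rep(0, 100), rpois(3000, 20) + 1, rep(NA, 50),
       rpois(500, 20) + 1, rep(0, 100))
wins <- sample_windows(x, num = 5, min.size = 2^8, max.size = 2^10)
print(lengths(wins))

# Mean correlation across windows, using a moving average as a stand-in filter
smoother <- function(w) as.numeric(stats::filter(w, rep(1 / 5, 5), circular = TRUE))
mean_cor <- mean(vapply(wins, function(w) cor(w, smoother(w)), numeric(1)))
print(mean_cor)
```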

If the number of meaningful bases in `data` is less than
`smpl.min.size * (smpl.num/2)`, the whole `data` vector will be used instead
of sampling.
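
As a quick check of that threshold with the default arguments, the short sketch below evaluates the expression and wraps the rule in a hypothetical helper (`use_whole_vector` is not part of nucleR).

```r
# Default argument values taken from the usage section
smpl.num <- 25
smpl.min.size <- 2^10

# Threshold below which sampling is skipped
threshold <- smpl.min.size * (smpl.num / 2)
print(threshold)  # 1024 * 12.5 = 12800

# Hypothetical helper: TRUE when the whole vector would be used
use_whole_vector <- function(data, limit = threshold) {
  sum(!is.na(data) & data != 0) < limit
}
print(use_whole_vector(c(0, NA, 1, 2)))
```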

Fitted `pcKeepComp` value

Oscar Flores, David Rosell

```
# Load the package and the dataset
library(nucleR)
data(nucleosome_htseq)
data <- as.vector(coverage.rpm(nucleosome_htseq)[[1]])

# Get recommended pcKeepComp value
pckeepcomp <- pcKeepCompDetect(data, cor.target=0.99)
print(pckeepcomp)

# Call filterFFT with the detected value
f1 <- filterFFT(data, pcKeepComp=pckeepcomp)

# Alternatively, let filterFFT run the detection itself
f2 <- filterFFT(data, pcKeepComp="auto", cor.target=0.99)

# Plot the first 2000 positions of the original and both filtered profiles
library(ggplot2)
i <- 1:2000
plot_data <- rbind(
    data.frame(x=i, y=data[i], coverage="original"),
    data.frame(x=i, y=f1[i], coverage="two calls"),
    data.frame(x=i, y=f2[i], coverage="one call")
)
qplot(x=x, y=y, color=coverage, data=plot_data, geom="line",
      xlab="position", ylab="coverage")
```

nucleR documentation built on Nov. 1, 2018, 2:23 a.m.
