testCpG: function to cluster sequences based on their CpG and GC...

Description

Diagnostic function: GC content and CpG content are clustered using 2D Gaussian mixture models (Mclust). FALSE is returned if more than max.clust (default=1) subgroups are found using the Bayesian information criterion (BIC). If do.plot=TRUE, the results are visualized.
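The clustering step described above can be sketched with mclust directly. This is a minimal illustration, not the cobindR implementation: the data are synthetic, the variable names (gc, fit, result) are chosen for this example, and the one-cluster threshold follows the Value section below.

```r
# Sketch only: assumes the 'mclust' package is installed.
library(mclust)

set.seed(1)
# Synthetic GC / CpG values for two made-up groups of 50 sequences each.
gc <- cbind(
  GC  = c(rnorm(50, mean = 0.40, sd = 0.03), rnorm(50, mean = 0.65, sd = 0.03)),
  CpG = c(rnorm(50, mean = 0.20, sd = 0.05), rnorm(50, mean = 0.80, sd = 0.05))
)

max.clust <- 4
# Fit Gaussian mixtures with 1..max.clust components; Mclust keeps the
# model with the best BIC.
fit <- Mclust(gc, G = 1:max.clust)

# Number of components in the selected model.
n.groups <- fit$G

# Assumed decision rule: TRUE only if a single group explains the data.
result <- n.groups <= 1
```

Here fit$classification holds the cluster label of each sequence, which is the kind of information a color-coded scatterplot would use.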

Usage

## S4 method for signature 'cobindr'
testCpG(x, max.clust = 4, do.plot = F, n.cpu = NA)

Arguments

x

an object of the class "cobindr", which will hold all necessary information about the sequences and the hits.

max.clust

integer giving the maximum number of clusters used to separate the data.

do.plot

logical flag; if do.plot=TRUE, a scatterplot of the GC and CpG content of each sequence is produced, with the clusters color-coded.

n.cpu

number of CPUs to be used for parallelization. The default value is NA, in which case the number of available CPUs is determined and then used.

Value

result

logical flag; FALSE is returned if more than one subgroup is found using the Bayesian information criterion (BIC)

gc

matrix with rows corresponding to sequences and columns corresponding to GC and CpG content
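To illustrate the shape of such a matrix, the following base R sketch computes one row per sequence. The helper gc_cpg_content is hypothetical and not part of cobindR. It assumes CpG content is measured as the observed/expected CpG ratio, which may differ from the definition the package uses, and it assumes every sequence has at least two bases.

```r
# Hypothetical helper, not part of cobindR.
gc_cpg_content <- function(seqs) {
  seqs <- toupper(seqs)
  res <- vapply(seqs, function(s) {
    n <- nchar(s)
    bases <- strsplit(s, "")[[1]]
    n_c <- sum(bases == "C")
    n_g <- sum(bases == "G")
    # All dinucleotides at positions (1,2), (2,3), ..., (n-1,n).
    dinucs <- substring(s, 1:(n - 1), 2:n)
    n_cpg <- sum(dinucs == "CG")
    # Assumed definition: observed CpG count over the count expected
    # from the C and G frequencies.
    expected <- n_c * n_g / n
    c(GC = (n_c + n_g) / n,
      CpG = if (expected > 0) n_cpg / expected else 0)
  }, FUN.VALUE = c(GC = 0, CpG = 0), USE.NAMES = FALSE)
  # vapply returns a 2 x length(seqs) matrix; transpose to one row per sequence.
  t(res)
}

m <- gc_cpg_content(c("ACGT", "CGCG", "AAAA"))
```

The result m has three rows and the columns GC and CpG, matching the layout described above.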

Author(s)

Robert Lehmann <r.lehmann@biologie.hu-berlin.de>

References

The method uses clustering functions from the package "mclust" (http://www.stat.washington.edu/mclust/).

See Also

plot.gc

Examples

cfg <- cobindRConfiguration()
sequence_type(cfg) <- 'fasta'
sequence_source(cfg) <- system.file('extdata/example.fasta', package='cobindR')
# avoid complaint of validation mechanism
pfm_path(cfg) <- system.file('extdata/pfms', package='cobindR')
pairs(cfg) <- ''
runObj <- cobindr(cfg)
testCpG(runObj, max.clust = 2, do.plot = TRUE)

cobindR documentation built on April 28, 2020, 6:40 p.m.