computeOptimal: compute Optimal Parameters

View source: R/computeOptimal.R

computeOptimal    R Documentation

compute Optimal Parameters

Description

ChIPanalyser contains a set of functions, some of which require two parameters known as lambdaPWM and boundMolecules. These two parameters are not always known. computeOptimal will estimate these values by maximising the correlation and minimising the Mean Squared Error between a predicted ChIP-seq-like profile and a real ChIP-seq profile for the given loci.

Usage

computeOptimal(genomicProfiles,DNASequenceSet, ChIPScore,chromatinState = NULL,
    parameterOptions = NULL, optimalMethod = "all",rank=FALSE,returnAll=TRUE,
    peakMethod="moving_kernel",cores=1)

Arguments

genomicProfiles

genomicProfiles is a genomicProfiles object containing at least a Position Frequency Matrix or a Position Weight Matrix. It is strongly advised to customize this object to increase the goodness of fit of the model when compared to real ChIP-seq data.

DNASequenceSet

DNASequenceSet is a DNAStringSet or a BSgenome of the full sequence of the organism of interest.

ChIPScore

ChIPScore is a named list containing ChIP-seq enrichments for each locus of interest. This profile should be normalised to base pair level. In other words, there should be an enrichment score for each base pair of a given locus.

chromatinState

chromatinState is a GRanges object containing either accessible sites or DNA affinity scores.

parameterOptions

parameterOptions is a parameterOptions object. If this object is not provided (parameterOptions = NULL), a new object will be created internally. However, it is strongly advised to tailor this object to maximise the goodness of fit of the model when compared to ChIP-seq data.

optimalMethod

optimalMethod is a character string that determines which method is used to select the optimal parameters. optimalMethod can be one of the following: pearson, spearman, kendall, ks, fscore, geometric, MSE, or all. Default is set to all.
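As a rough illustration of what some of these metrics measure, the sketch below compares two toy profiles using base R only. The vectors observed and predicted are made-up values, not ChIPanalyser output, and the code is not the package's internal implementation.

```r
# Toy per-base-pair profiles; values are illustrative assumptions.
observed  <- c(0.1, 0.4, 1.2, 2.5, 1.1, 0.3, 0.1)
predicted <- c(0.2, 0.5, 1.0, 2.2, 1.3, 0.4, 0.1)

# Correlation-based metrics: higher values indicate a better fit.
pearson  <- cor(observed, predicted, method = "pearson")
spearman <- cor(observed, predicted, method = "spearman")
kendall  <- cor(observed, predicted, method = "kendall")

# Mean Squared Error: lower values indicate a better fit.
mse <- mean((observed - predicted)^2)

print(c(pearson = pearson, spearman = spearman,
        kendall = kendall, MSE = mse))
```

Correlation metrics reward profiles with a similar shape, whereas MSE also penalises differences in absolute signal height, which is one reason the two kinds of metric can favour different parameter combinations.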

rank

rank is a logical value indicating whether optimal parameters should be based on rank (the parameter combination occurring most often as the best over all regions) or on average score (the combination of parameters performing best on average over all selected regions). DEFAULT = FALSE
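The difference between the two selection strategies can be sketched with a small score matrix in base R. The matrix scores and the names comboA, comboB and comboC are hypothetical and only serve the illustration; this is not how computeOptimal stores its results internally.

```r
# Hypothetical goodness-of-fit scores (higher is better):
# rows are parameter combinations, columns are loci.
scores <- matrix(
    c(0.90, 0.20, 0.85,
      0.70, 0.75, 0.70,
      0.10, 0.95, 0.15),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("comboA", "comboB", "comboC"),
                    c("locus1", "locus2", "locus3"))
)

# Average-based selection: best mean score over all loci.
byAverage <- names(which.max(rowMeans(scores)))

# Rank-based selection: find the best combination for each locus,
# then keep the combination that wins most often.
bestPerLocus <- rownames(scores)[apply(scores, 2, which.max)]
byRank <- names(which.max(table(bestPerLocus)))

print(c(average = byAverage, rank = byRank))
```

In this toy case the two strategies disagree: comboB has the best average, while comboA is the best combination at two of the three loci.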

returnAll

returnAll is a logical value indicating whether all internal objects should be returned. DEFAULT = TRUE. Internal objects are the following: occupancy scores, ChIP-like profiles, goodness of fit metrics and optimal parameters. If set to FALSE, computeOptimal will only return the optimal parameters.

peakMethod

peakMethod is a character string of one of the following: c("moving_kernel","truncated_kernel","exact"). If set to moving_kernel, the peaks will be approximated using Rcpp (Default). If set to truncated_kernel, the peaks will also be approximated, but this method does not require Rcpp. If set to exact, the peaks will not be approximated.

cores

cores is the number of cores that will be used to compute the optimal set of parameters.

Details

To backward infer the values of lambdaPWM and boundMolecules, use computeOptimal. Note that this function requires ChIP-seq data as input, supplied through the ChIPScore argument. ChIPScore should be the output of the processingChIP function.
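The general idea of backward inference over a grid of candidate values can be sketched in base R as follows. The function toyProfile and its arguments lambda and bound are hypothetical stand-ins chosen for this illustration; they are not the ChIPanalyser model, and the loop below is only a minimal sketch of a grid search that minimises the Mean Squared Error.

```r
# Toy stand-in for a predicted profile; NOT the ChIPanalyser model.
toyProfile <- function(x, lambda, bound) {
    bound * exp(-abs(x) / lambda)
}

x <- seq(-10, 10, by = 1)

# "Observed" signal generated from known toy parameters,
# so the grid search has a known answer to recover.
observed <- toyProfile(x, lambda = 2, bound = 100)

# Candidate parameter combinations to evaluate.
grid <- expand.grid(lambda = c(1, 2, 3, 4), bound = c(10, 100, 1000))

# Score every combination by its MSE against the observed signal.
grid$mse <- mapply(function(lambda, bound) {
    mean((toyProfile(x, lambda, bound) - observed)^2)
}, grid$lambda, grid$bound)

# The combination with the lowest MSE is taken as optimal.
best <- grid[which.min(grid$mse), ]
print(best)
```

Because the toy signal is noise-free, the search recovers the generating values exactly; with real ChIP-seq data the best combination is only the closest match among the candidates supplied.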

Value

computeOptimal returns a list containing, in order, the optimal set of parameters (lambdaPWM and boundMolecules), the optimal matrix (a matrix containing accuracy estimates that depend on the method chosen), and finally the chosen method. If the method that was chosen was "all", then each element of this list will contain the optimal set of parameters and the optimal matrices for all of the aforementioned methods (see optimalMethod).

Author(s)

Patrick C. N. Martin <pm16057@essex.ac.uk>

References

Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.

Examples


#Data extraction
data(ChIPanalyserData)
# path to Position Frequency Matrix
PFM <- file.path(system.file("extdata",package="ChIPanalyser"),"BEAF-32.pfm")
#As an example of genome, this example will run on the Drosophila genome

if(!require("BSgenome.Dmelanogaster.UCSC.dm6", character.only = TRUE)){
    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("BSgenome.Dmelanogaster.UCSC.dm6")
    }
library(BSgenome.Dmelanogaster.UCSC.dm6)
DNASequenceSet <- getSeq(BSgenome.Dmelanogaster.UCSC.dm6)
chip<-processingChIP(chip,top)
#Building data objects
GPP <- genomicProfiles(PFM=PFM,PFMFormat="JASPAR",BPFrequency=DNASequenceSet)
OPP <- parameterOptions()
#Computing Optimal set of Parameters
optimalParam <- computeOptimal(genomicProfiles = GPP,
    DNASequenceSet = DNASequenceSet,
    ChIPScore = chip,
    chromatinState = Access,
    parameterOptions = OPP,
    optimalMethod = "all",
    peakMethod="moving_kernel")


patrickCNMartin/ChIPanalyser documentation built on Nov. 24, 2022, 12:02 a.m.