testTargetPredictions: Test the performances of predicting gene targets based on the...

In RiviereQuentin/Wimtrap: Integrative tools to predict the location of transcription factor binding sites

View source: R/Wimtrap.R

testTargetPredictions

R Documentation

Test the performances of predicting gene targets based on the location of potential TFBS identified by Wimtrap

Description

This function determines the optimal threshold to apply to the TFBS prediction score output by Wimtrap in order to infer the potential gene targets of transcription factors. The performance of gene target prediction is then assessed for each transcription factor, using the whole set of potential TFBS on a given chromosome (unbalanced dataset).

Usage

testTargetPredictions(
  TFBSdata,
  TFBSmodel,
  chrTest = 1,
  tss,
  ChIPpeaks = NULL,
  ChIPpeaks_length = 400,
  targets = NULL
)

Arguments

`TFBSdata`	A named character vector as output by the `getTFBSdata()` function, defining the local paths to files encoding the results of pattern-matching and genomic feature extraction for the training TFs and/or studied TFs.
`TFBSmodel`	An `xgb.Booster` object as output by the function `buildTFBSmodel()`.
`chrTest`	An integer specifying the number of the chromosome that will be considered to assess the performances at predicting the TF gene targets. Default = 1.
`tss`	A list of `GRanges` objects as output by `importGenomicData()`, or the local path to a BED file defining the transcription start site (TSS), name and orientation of each protein-coding transcript of the organism.
`ChIPpeaks`	A named character vector defining the local paths to BED files encoding the location of ChIP-peaks. The vector is named according to the transcription factors described by the corresponding files. Caution: the names of `ChIPpeaks` must match those of `TFBSdata`. Default is `NULL` and
`ChIPpeaks_length`	An integer setting a fixed length for the ChIP-peaks, that are defined as the intervals of `ChIPpeaks_length` bp that are centered on the regions encoded in the `ChIPpeaks` files. Default value = 400.
`targets`	A named character vector defining the local paths to text files encoding the manually curated transcriptional targets of each transcription factor. The vector is named according to the transcription factors described by the corresponding files. Caution: the names of `targets` must match those of `TFBSdata`. Default = `NULL` (transcriptional targets are predicted from ChIP-peaks).
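As an illustration of the `ChIPpeaks` and `ChIPpeaks_length` arguments, the base R sketch below checks that the names of the peak files match those of `TFBSdata` and re-centers toy peaks to a fixed length. The file paths, the `peaks` data frame and the helper `center_peaks()` are hypothetical and not part of Wimtrap; the rounding used for the midpoint is an assumption, since the way the package treats odd widths or chromosome ends is not described here.

```r
# Hypothetical named vectors; in practice TFBSdata is output by getTFBSdata().
TFBSdata  <- c(PIF3 = "path/to/PIF3_data.tsv", TOC1 = "path/to/TOC1_data.tsv")
ChIPpeaks <- c(TOC1 = "path/to/TOC1_peaks.bed")

# Every ChIP-peak file must be named after a TF that is present in TFBSdata.
unmatched <- setdiff(names(ChIPpeaks), names(TFBSdata))
if (length(unmatched) > 0) {
  stop("No TFBSdata entry for: ", paste(unmatched, collapse = ", "))
}

# Hypothetical helper: replace each region by an interval of `width` bp
# placed around the midpoint of the original region (1-based coordinates).
center_peaks <- function(peaks, width = 400) {
  mid <- (peaks$start + peaks$end) %/% 2
  peaks$start <- pmax(1, mid - width %/% 2 + 1)
  peaks$end   <- peaks$start + width - 1
  peaks
}

# Toy peaks of different lengths, all resized to 400 bp.
peaks <- data.frame(chr = c("1", "1"), start = c(1000, 5200), end = c(1100, 6000))
centered <- center_peaks(peaks, width = 400)
```

With real data, the `resize()` function of the GenomicRanges package (with `fix = "center"`) performs a comparable operation on `GRanges` objects.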

Details

Each gene is first scored with the highest prediction score among the TFBSs predicted by Wimtrap that are associated with it. Each gene is then labelled as positive or negative. The positive genes are those whose TSS is the closest to an occurrence, on a ChIP-peak, of the primary motif of the cognate TF. This makes it possible to draw a ROC curve based on a balanced dataset obtained from all the chromosomes but one, and to identify the best threshold to apply to the prediction score in order to predict TF gene targets. Finally, the performance is assessed for each TF based on the whole dataset of predicted TFBS on the left-out chromosome.
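The scoring and threshold selection described above can be sketched in base R on toy data. This is only an illustration under stated assumptions: the `sites` data frame and the `labels` vector are made up, and the criterion used to pick the threshold (maximizing Youden's J, i.e. true positive rate minus false positive rate) is an assumption, since the criterion applied by Wimtrap is not specified here.

```r
# Toy predicted TFBSs: each site is associated with a gene and has a score.
sites <- data.frame(
  gene  = c("g1", "g1", "g2", "g3", "g3", "g4"),
  score = c(0.62, 0.91, 0.55, 0.70, 0.58, 0.95)
)

# Step 1: score each gene with the highest score among its associated TFBSs.
gene_scores <- vapply(split(sites$score, sites$gene), max, numeric(1))

# Step 2: toy labels (TRUE = positive gene, e.g. supported by a ChIP-peak).
labels <- c(g1 = TRUE, g2 = FALSE, g3 = FALSE, g4 = TRUE)[names(gene_scores)]

# Step 3: scan the candidate thresholds and keep the one that maximizes
# Youden's J = TPR - FPR (assumed criterion, not necessarily Wimtrap's).
candidates <- sort(unique(gene_scores))
youden <- vapply(candidates, function(th) {
  predicted <- gene_scores >= th
  tpr <- sum(predicted & labels) / sum(labels)
  fpr <- sum(predicted & !labels) / sum(!labels)
  tpr - fpr
}, numeric(1))
best_threshold <- candidates[which.max(youden)]
```

Genes whose score is greater than or equal to `best_threshold` would then be called potential targets.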

Value

A data.frame that gives, for each TF considered, the performance of the prediction of the transcriptional targets encoded on the test chromosome, taking into consideration all the TFBSs predicted by Wimtrap (prediction score >= 0.5) on that chromosome. Because the dataset is highly imbalanced, the performance is expressed in terms of recall, precision, accuracy and F-score. In addition, the last column gives the optimal threshold obtained when including all the input TFs.
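For reference, the four metrics can be computed from true and predicted labels as in the base R sketch below. The `truth` and `predicted` vectors are toy data, and the F-score is taken to be the balanced F1 score, which is an assumption; the column names of the data.frame actually returned by `testTargetPredictions()` may differ.

```r
# Toy labels and predictions for ten genes of a test chromosome (made-up data).
truth     <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
predicted <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Confusion matrix counts.
tp <- sum(predicted & truth)    # targets correctly predicted
fp <- sum(predicted & !truth)   # non-targets predicted as targets
fn <- sum(!predicted & truth)   # targets that were missed
tn <- sum(!predicted & !truth)  # non-targets correctly rejected

recall    <- tp / (tp + fn)
precision <- tp / (tp + fp)
accuracy  <- (tp + tn) / length(truth)
f_score   <- 2 * precision * recall / (precision + recall)  # F1 (assumed)

performances <- data.frame(recall, precision, accuracy, f_score)
```

On an imbalanced dataset, accuracy is dominated by the true negatives, which is why recall, precision and the F-score are reported alongside it.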

Examples

genomic_data.ex <- c(CE = system.file("extdata/conserved_elements_example.bed", package = "Wimtrap"),
                      DGF = system.file("extdata/DGF_example.bed", package = "Wimtrap"),
                      DHS = system.file("extdata/DHS_example.bed", package = "Wimtrap"),
                      X5UTR = system.file("extdata/x5utr_example.bed", package = "Wimtrap"),
                      CDS = system.file("extdata/cds_example.bed", package = "Wimtrap"),
                      Intron = system.file("extdata/intron_example.bed", package = "Wimtrap"),
                      X3UTR = system.file("extdata/x3utr_example.bed", package = "Wimtrap")
                     )
imported_genomic_data.ex <- importGenomicData(biomart = FALSE,
                                              genomic_data = genomic_data.ex,
                                              tss = system.file("extdata/tss_example.bed", package = "Wimtrap"),
                                              tts = system.file("extdata/tts_example.bed", package = "Wimtrap"))
TFBSdata.ex <- getTFBSdata(pfm = system.file("extdata/pfm_example.pfm", package = "Wimtrap"),
                           TFnames = c("PIF3", "TOC1"),
                           organism = NULL,
                           genome_sequence = system.file("extdata/genome_example.fa", package = "Wimtrap"),
                           imported_genomic_data = imported_genomic_data.ex)
TFBSmodel.ex <- buildTFBSmodel(TFBSdata = TFBSdata.ex,
                               ChIPpeaks = c(PIF3 = system.file("extdata/PIF3_example.bed", package = "Wimtrap"),
                                             TOC1 = system.file("extdata/TOC1_example.bed", package = "Wimtrap")),
                               TFs_validation = "PIF3")
## Determine the optimal score threshold
targetPerformances <- testTargetPredictions(
  TFBSdata = TFBSdata.ex["TOC1"],
  TFBSmodel = TFBSmodel.ex,
  tss = imported_genomic_data.ex,
  ChIPpeaks = c(TOC1 = system.file("extdata/TOC1_example.bed", package = "Wimtrap")))
optimal_threshold <- targetPerformances$threshold[1]
## Predict the binding sites of PIF3 using the optimal threshold
PIF3BS.predictions <- predictTFBS(TFBSmodel.ex,
                                  TFBSdata.ex,
                                  studiedTFs = "PIF3",
                                  score_threshold = optimal_threshold)
## To get the transcripts whose expression is potentially regulated by PIF3, do as follows:
PIF3_regulated.predictions <- unique(as.character(PIF3BS.predictions$transcript))
## To consider only the gene model (the part of the transcript ID before the first "."),
## do as follows:
PIF3_regulated.predictions <- unique(sub("[.].*$", "", PIF3_regulated.predictions))

RiviereQuentin/Wimtrap documentation built on June 29, 2024, 7:17 p.m.
