testTargetPredictions | R Documentation |
This function determines the optimal threshold to apply to the TFBS prediction score output by Wimtrap in order to infer the potential gene targets of transcription factors. The performance of gene-target prediction is then assessed for each transcription factor, using the whole set of potential TFBSs on a given chromosome (unbalanced dataset).
testTargetPredictions(
TFBSdata,
TFBSmodel,
chrTest = 1,
tss,
ChIPpeaks = NULL,
ChIPpeaks_length = 400,
targets = NULL
)
TFBSdata |
A named character vector as output by the |
TFBSmodel |
A |
chrTest |
An integer specifying the number of the chromosome used to assess the performance of TF gene target prediction. Default = 1. |
tss |
A list of |
ChIPpeaks |
A named character vector defining the local paths to BED files encoding the
location of ChIP-peaks. The vector is named according to the transcription factors that are described
by the files indicated. Caution: the names of the |
ChIPpeaks_length |
An integer setting a fixed length for the ChIP-peaks, which are defined as the intervals of
|
targets |
A named character vector defining the local paths to text files encoding the
manually curated transcriptional targets of each transcription factor. The vector is named according to the transcription factors
that are described by the files indicated. Caution: the names of the |
Each gene is first scored with the highest prediction score among the TFBSs associated with it and predicted by Wimtrap. Each gene is then labelled as positive or negative. The positive genes are those whose TSS is the closest to an occurrence, on a ChIP-peak, of the cognate TF-primary motif. This makes it possible to draw a ROC curve based on a balanced dataset obtained from all the chromosomes but one, and to identify the best threshold to set on the prediction score in order to predict TF gene targets. Finally, the performances are assessed for each TF based on the whole dataset of predicted TFBSs on the left-out chromosome.
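The gene-level scoring and threshold search described above can be sketched in base R as follows. This is only an illustration on toy data, not the package's internal code: the data frame `tfbs`, the vector `positives`, and the use of Youden's J (TPR minus FPR) as the criterion for the "best" threshold are all assumptions made for the example; the documentation does not state which criterion Wimtrap applies.

```r
# Toy TFBS-level predictions, one row per predicted binding site
# (hypothetical data, not produced by Wimtrap)
tfbs <- data.frame(
  gene  = c("G1", "G1", "G2", "G3", "G3", "G4", "G5", "G6"),
  score = c(0.91, 0.55, 0.62, 0.88, 0.70, 0.51, 0.97, 0.58),
  stringsAsFactors = FALSE
)

# Genes labelled as positive (hypothetical labels)
positives <- c("G1", "G3", "G5")

# Step 1: score each gene with the highest score among its TFBSs
gene_scores <- aggregate(score ~ gene, data = tfbs, FUN = max)
gene_scores$positive <- gene_scores$gene %in% positives

# Step 2: compute one ROC point (TPR, FPR) per candidate threshold
thresholds <- sort(unique(gene_scores$score))
roc <- do.call(rbind, lapply(thresholds, function(th) {
  predicted <- gene_scores$score >= th
  data.frame(
    threshold = th,
    tpr = sum(predicted & gene_scores$positive) / sum(gene_scores$positive),
    fpr = sum(predicted & !gene_scores$positive) / sum(!gene_scores$positive)
  )
}))

# Step 3: keep the threshold maximising Youden's J = TPR - FPR
# (assumed criterion, chosen here for illustration only)
best_threshold <- roc$threshold[which.max(roc$tpr - roc$fpr)]
print(roc)
print(best_threshold)
```

On this toy set the three positive genes all score above the three negative ones, so the selected threshold is the lowest positive gene score.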
A data.frame
that gives, for each TF considered, the performance of the prediction of the transcriptional
targets encoded on the test chromosome, taking into account all the TFBSs predicted by Wimtrap (prediction score >= 0.5)
on that chromosome.
Because the dataset is highly imbalanced, performance is expressed in terms of recall, precision, accuracy and F-score.
The last column gives the optimal threshold obtained when including all the input TFs.
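For reference, the four reported metrics follow from the confusion matrix in the usual way. The sketch below computes them on made-up gene-level scores and labels; `scores`, `is_target` and `threshold` are hypothetical inputs chosen for the example and do not correspond to objects returned by the function.

```r
# Hypothetical gene-level scores and true labels on the test chromosome
scores    <- c(0.95, 0.90, 0.72, 0.66, 0.61, 0.58, 0.55, 0.52, 0.51, 0.50)
is_target <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
threshold <- 0.65  # hypothetical score threshold

# Genes predicted as targets at this threshold
predicted <- scores >= threshold

# Confusion matrix counts
tp <- sum(predicted & is_target)    # true positives
fp <- sum(predicted & !is_target)   # false positives
fn <- sum(!predicted & is_target)   # false negatives
tn <- sum(!predicted & !is_target)  # true negatives

recall    <- tp / (tp + fn)
precision <- tp / (tp + fp)
accuracy  <- (tp + tn) / (tp + fp + fn + tn)
f_score   <- 2 * precision * recall / (precision + recall)

print(c(recall = recall, precision = precision,
        accuracy = accuracy, f_score = f_score))
```

With few true targets among many candidate genes, accuracy is dominated by the true negatives, which is why recall, precision and F-score are worth reading alongside it.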
plotPredictions()
to visualize the results for a given potential target gene.
genomic_data.ex <- c(CE = system.file("extdata/conserved_elements_example.bed", package = "Wimtrap"),
                     DGF = system.file("extdata/DGF_example.bed", package = "Wimtrap"),
                     DHS = system.file("extdata/DHS_example.bed", package = "Wimtrap"),
                     X5UTR = system.file("extdata/x5utr_example.bed", package = "Wimtrap"),
                     CDS = system.file("extdata/cds_example.bed", package = "Wimtrap"),
                     Intron = system.file("extdata/intron_example.bed", package = "Wimtrap"),
                     X3UTR = system.file("extdata/x3utr_example.bed", package = "Wimtrap")
)
imported_genomic_data.ex <- importGenomicData(biomart = FALSE,
                                              genomic_data = genomic_data.ex,
                                              tss = system.file("extdata/tss_example.bed", package = "Wimtrap"),
                                              tts = system.file("extdata/tts_example.bed", package = "Wimtrap"))
TFBSdata.ex <- getTFBSdata(pfm = system.file("extdata/pfm_example.pfm", package = "Wimtrap"),
                           TFnames = c("PIF3", "TOC1"),
                           organism = NULL,
                           genome_sequence = system.file("extdata/genome_example.fa", package = "Wimtrap"),
                           imported_genomic_data = imported_genomic_data.ex)
TFBSmodel.ex <- buildTFBSmodel(TFBSdata = TFBSdata.ex,
                               ChIPpeaks = c(PIF3 = system.file("extdata/PIF3_example.bed", package = "Wimtrap"),
                                             TOC1 = system.file("extdata/TOC1_example.bed", package = "Wimtrap")),
                               TFs_validation = "PIF3")
## Determine the optimal score threshold
targetPerformances <- testTargetPredictions(
  TFBSdata = TFBSdata.ex["TOC1"],
  TFBSmodel = TFBSmodel.ex,
  tss = imported_genomic_data.ex,
  ChIPpeaks = c(TOC1 = system.file("extdata/TOC1_example.bed", package = "Wimtrap")))
optimal_threshold <- targetPerformances$threshold[1]
PIF3BS.predictions <- predictTFBS(TFBSmodel.ex,
                                  TFBSdata.ex,
                                  studiedTFs = "PIF3",
                                  score_threshold = optimal_threshold)
## To get the transcripts whose expression is potentially regulated by PIF3, do as follows:
PIF3_regulated.predictions <- unique(as.character(PIF3BS.predictions$transcript))
### If you want to consider only the gene model
### (i.e. drop the transcript suffix after the dot), then do as follows:
PIF3_regulated.predictions <- unique(sub("[.].*$", "", PIF3_regulated.predictions))