predictTFBS: Predicts transcription factor binding sites

View source: R/Wimtrap.R

predictTFBSR Documentation

Predicts transcription factor binding sites

Description

Predicts the binding sites of studied transcription factors in a studied condition by applying a predictive model on potential binding sites located by a prior pattern-matching analysis and annotated with genomic features extracted at their location.

Usage

predictTFBS(
  TFBSmodel,
  TFBSdata,
  studiedTFs = NULL,
  show_annotations = FALSE,
  score_threshold = 0.86
)

Arguments

TFBSmodel

A xgb.Booster object as output by the function buildTFBSmodel().

TFBSdata

A named character vector as output by the getTFBSdata() function, defining the local paths to files encoding for the results of pattern-matching and genomic feature extraction for the training TFs and/or studied TFs.

studiedTFs

A character vector setting the names of the studied transcription factors (for which the location of the binding sites has to be predicted). This names have to match with one of the names of the TFBSdata argument.

show_annotations

A logical. Default = FALSE. If TRUE, the annotation of the potential binding sites with the genomic features extracted from their genomic regions will be output.

score_threshold

A numeric (comprised between 0 and 1). Sets the minimum prediction score output by the TFBSmodel to predict a potential binding site as a binding site of a studied transcription factor in the studied condition. Higher the prediction score, higher is the specificity and lower the sensitivity. Default = 0.5.

Details

Remark: the features included in the datasets encoded in the files given by TFBSdata have to correspond to those integrated by the predictive model set by TFBSmodel.

Value

A data.table listing the predicted binding sites. The 'TF' column annotates the potential binding sites with their cognate transcription factor. Additionally, the data.table describes, for the potential binding sites, the chromosomic coordinates, the closest transcript (relatively to the transcript start site) and the prediction score. Optionally, the data.table might also include the genomic features used to make the predictions.

See Also

importGenomicData() for importing genomic data, buildTFBSmodel() to train a predictive model of transcription factor binding sites, and plotPredictions() to vizualize the results for a given potential target gene.

Examples

genomic_data.ex <- c(CE = system.file("extdata/conserved_elements_example.bed", package = "Wimtrap"),
                      DGF = system.file("extdata/DGF_example.bed", package = "Wimtrap"),
                      DHS = system.file("extdata/DHS_example.bed", package = "Wimtrap"),
                      X5UTR = system.file("extdata/x5utr_example.bed", package = "Wimtrap"),
                      CDS = system.file("extdata/cds_example.bed", package = "Wimtrap"),
                      Intron = system.file("extdata/intron_example.bed", package = "Wimtrap"),
                      X3UTR = system.file("extdata/x3utr_example.bed", package = "Wimtrap")
                     )
imported_genomic_data.ex <- importGenomicData(biomart = FALSE,
                                              genomic_data = genomic_data.ex,
                                              tss = system.file("extdata/tss_example.bed", package = "Wimtrap"),
                                              tts = system.file("extdata/tts_example.bed", package = "Wimtrap"))
TFBSdata.ex <- getTFBSdata(pfm = system.file("extdata/pfm_example.pfm", package = "Wimtrap"),
                           TFnames = c("PIF3", "TOC1"),
                           organism = NULL,
                           genome_sequence = system.file("extdata/genome_example.fa", package = "Wimtrap"),
                           imported_genomic_data = imported_genomic_data.ex)
TFBSmodel.ex <- buildTFBSmodel(TFBSdata = TFBSdata.ex,
                               ChIPpeaks = c(PIF3 = system.file("extdata/PIF3_example.bed", package = "Wimtrap"),
                                             TOC1 = system.file("extdata/TOC1_example.bed", package = "Wimtrap")),
                               TFs_validation = "PIF3")
PIF3BS.predictions <- predictTFBS(TFBSmodel.ex,
                                  TFBSdata.ex,
                                  studiedTFs = "PIF3")
##To get the transcripts whose expression is potentially regulated by PIF3 do as follows:
PIF3_regulated.predictions <- as.character(PIF3BS.predictions$transcript[!duplicated(PIF3BS.predictions)])
###If you want to consider only the gene model,
###then do as follows:
PIF3_regulated.predictions <- unlist(strsplit(PIF3_regulated.predictions, "[.]"))[seq(1, 2*length(PIF3_regulated.predictions),2)]
PIF3_regulated.predictions <- PIF3_regulated.predictions[!duplicated(PIF3_regulated.predictions)]

RiviereQuentin/Wimtrap documentation built on June 29, 2024, 7:17 p.m.