getTFBSdata: Performs pattern-matching and extraction of genomic features...

View source: R/Wimtrap.R

getTFBSdata R Documentation

Performs pattern-matching and extraction of genomic features at match location

Description

getTFBSdata writes files containing datasets that characterize the genomic context around motif occurrences along the genome for the considered transcription factors (training and/or studied TFs).

Usage

getTFBSdata(
  pfm = NULL,
  TFnames = NULL,
  organism = NULL,
  genome_sequence = "getGenome",
  imported_genomic_data,
  matches = NULL,
  strand_as_feature = FALSE,
  pval_threshold = 0.001,
  short_window = 20,
  medium_window = 400,
  long_window = 1000
)

Arguments

pfm

Path to a file containing the position frequency or weight matrices (PFMs or PWMs) of the motifs recognized by the considered transcription factors (training and/or studied TFs). The file format is inferred from the file extension: raw pfm (".pfm"), jaspar (".jaspar"), meme (".meme"), transfac (".transfac"), homer (".motif") or cis-bp (".txt"). pfm can be set to NULL (default value) if you provide the results of pattern-matching obtained from an external source (see the argument matches).
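The extension-to-format rule described above can be illustrated with a small sketch. The helper guess_motif_format() is hypothetical and is not part of Wimtrap; it only mirrors the list of extensions given for pfm and makes no claim about how the package parses the files internally.

```r
# Hypothetical helper (not a Wimtrap function): maps the extension of a
# motif file to the format names listed in the description of `pfm`.
guess_motif_format <- function(path) {
  formats <- c(pfm = "raw pfm", jaspar = "jaspar", meme = "meme",
               transfac = "transfac", motif = "homer", txt = "cis-bp")
  ext <- tolower(tools::file_ext(path))
  if (!ext %in% names(formats)) {
    stop("Unrecognized motif file extension: '", ext, "'")
  }
  unname(formats[ext])
}

# Illustrative file names only; the files do not need to exist.
guess_motif_format("motifs/my_TFs.jaspar")
guess_motif_format("PIF3.motif")
```

Checking the extension before calling getTFBSdata may help catch a misnamed motif file early, for instance a cis-bp matrix saved with an extension other than ".txt".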

TFnames

Names of the considered transcription factors, chosen among those described in the pfm file.

organism

Binomial name of the organism. Can be set to NULL if you provide the genome sequence (see the argument genome_sequence).

genome_sequence

"getGenome" (default) or the local path to a FASTA file containing the genomic sequence of the organism. With the default value, the genomic sequence is downloaded automatically from ENSEMBL or ENSEMBL GENOMES, provided that organism is set.

imported_genomic_data

An object returned by importGenomicData() that includes chromatin-state data specific to the training or studied condition.

matches

NULL (default) or a named list of GRanges objects. Each GRanges object relates to a given transcription factor and defines the genomic locations of the matches with its primary motif. Each GRanges object must also contain a metadata column named 'matchLogPval' that gives the p-value of the matches. The list passed through matches has to be named according to the transcription factors considered, and these names have to be consistent with those provided through the ChIP-peaks argument. With the default value, pattern-matching is performed by the function implemented in the Wimtrap package.
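As an illustration of the structure expected by matches, here is a minimal sketch that builds a named list of GRanges objects with a 'matchLogPval' metadata column. It assumes the Bioconductor package GenomicRanges is installed. The chromosome names, coordinates and score values are made up, and the scale of the values (log-transformed or not) is an assumption suggested only by the column name.

```r
library(GenomicRanges)  # Bioconductor; also attaches IRanges and S4Vectors

# Made-up match locations for two transcription factors.
pif3_matches <- GRanges(
  seqnames = c("1", "1", "2"),
  ranges = IRanges(start = c(1050, 20310, 512), width = 6),
  strand = c("+", "-", "+"),
  matchLogPval = c(-4.2, -3.5, -5.1)  # illustrative values, assumed scale
)
toc1_matches <- GRanges(
  seqnames = c("3", "5"),
  ranges = IRanges(start = c(7800, 150), width = 8),
  strand = c("-", "+"),
  matchLogPval = c(-3.9, -4.4)        # illustrative values, assumed scale
)

# The list names must be the names of the transcription factors.
matches.ex <- list(PIF3 = pif3_matches, TOC1 = toc1_matches)
names(matches.ex)
```

Such a list would be passed as matches = matches.ex, with pfm left at NULL.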

strand_as_feature

A logical. Should the orientation of the matches relative to the direction of transcription of the closest transcript be considered as a feature? Default is FALSE.

pval_threshold

P-value threshold to identify the matches with the primary motif of the transcription factors. Default is set to 0.001.

short_window

An integer (20 by default). Sets the length of the short-range window, centered on the potential binding sites, over which the genomic features are extracted.

medium_window

An integer (400 by default). Sets the length of the medium-range window, centered on the potential binding sites, over which the genomic features are extracted.

long_window

An integer (1000 by default). Sets the length of the long-range window, centered on the potential binding sites, over which the genomic features are extracted.
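To make the three window arguments concrete, the sketch below computes windows of the default lengths centered on a single potential binding site. The site coordinates are made up, and the centering convention (midpoint of the match, integer division) is an assumption; Wimtrap may handle boundaries differently.

```r
# Made-up coordinates of one potential binding site (1-based, inclusive).
site_start <- 10000
site_end   <- 10005

# Default window lengths of getTFBSdata.
widths <- c(short = 20, medium = 400, long = 1000)

# Assumed convention: center each window on the midpoint of the match.
center    <- (site_start + site_end) %/% 2
win_start <- center - widths %/% 2
win_end   <- win_start + widths - 1

windows <- data.frame(window = names(widths),
                      start  = unname(win_start),
                      end    = unname(win_end))
windows
```

Each row spans exactly the requested number of bases, so enlarging long_window, for example, extends the region over which the long-range features are extracted around every site.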

Value

A vector giving the local paths to the tab-delimited files that contain the results of pattern-matching and genomic feature extraction for each of the transcription factors considered. The first 5 fields of these files describe the location of the potential binding sites identified by pattern-matching. The following fields contain the raw score and/or p-value of the matches, the genomic features extracted at the location of the matches on the short-, medium- and long-range centered windows and, for the training TFs, the label ('1' = "positive" = "ChIP-validated in the considered condition" or '0' = "negative").
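Because the returned value is a vector of paths to tab-delimited files, the results can be loaded with base R. The sketch below is self-contained: it writes a tiny stand-in file to a temporary location and reads it back. The helper read_tfbs_files() is hypothetical, the column names are placeholders rather than the actual Wimtrap headers, and the presence of a header line is an assumption.

```r
# Hypothetical helper (not a Wimtrap function): reads each tab-delimited
# file listed in `paths` into a data frame, assuming a header line.
read_tfbs_files <- function(paths) {
  lapply(paths, function(p) {
    read.delim(p, header = TRUE, stringsAsFactors = FALSE)
  })
}

# Stand-in for one output file; placeholder columns and values.
demo_path <- tempfile(fileext = ".tsv")
demo <- data.frame(field1 = c("1", "1"),
                   field2 = c(1050, 20310),
                   label  = c(1, 0))
write.table(demo, demo_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Mimics the named vector of paths returned for the considered TFs.
paths  <- c(PIF3 = demo_path)
tables <- read_tfbs_files(paths)
str(tables$PIF3)
```

With a real run, paths would simply be the vector returned by getTFBSdata.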

See Also

importGenomicData() for importing genomic data and buildTFBSmodel() to train a predictive model of transcription factor binding sites.

Examples

genomic_data.ex <- c(CE = system.file("extdata/conserved_elements_example.bed", package = "Wimtrap"),
                      DGF = system.file("extdata/DGF_example.bed", package = "Wimtrap"),
                      DHS = system.file("extdata/DHS_example.bed", package = "Wimtrap"),
                      X5UTR = system.file("extdata/x5utr_example.bed", package = "Wimtrap"),
                      CDS = system.file("extdata/cds_example.bed", package = "Wimtrap"),
                      Intron = system.file("extdata/intron_example.bed", package = "Wimtrap"),
                      X3UTR = system.file("extdata/x3utr_example.bed", package = "Wimtrap")
                     )
imported_genomic_data.ex <- importGenomicData(biomart = FALSE,
                                              genomic_data = genomic_data.ex,
                                              tss = system.file("extdata/tss_example.bed", package = "Wimtrap"),
                                              tts = system.file("extdata/tts_example.bed", package = "Wimtrap"))
TFBSdata.ex <- getTFBSdata(pfm = system.file("extdata/pfm_example.pfm", package = "Wimtrap"),
                           TFnames = c("PIF3", "TOC1"),
                           organism = NULL,
                           genome_sequence = system.file("extdata/genome_example.fa", package = "Wimtrap"),
                           imported_genomic_data = imported_genomic_data.ex)

RiviereQuentin/Wimtrap documentation built on June 29, 2024, 7:17 p.m.