importGenomicData: Imports genomic data into the R session

View source: R/Wimtrap.R

importGenomicData    R Documentation

Imports genomic data into the R session

Description

Imports the genomic data used to define contextual features around matches to the primary motifs of transcription factors. Data on gene structures can optionally be downloaded automatically from Biomart.

Usage

importGenomicData(
  organism = NULL,
  genomic_data,
  biomart = TRUE,
  tss = NULL,
  tts = NULL,
  promoter_length = 2000,
  downstream_length = 1000,
  proximal_length = 500
)

Arguments

organism

Binomial name of the organism. Can be set to NULL if you provide the locations of the transcription start sites (TSS), the transcription termination sites (TTS) and the structures of the protein-coding genes of the organism (see the arguments tss, tts and genomic_data).

genomic_data

A named character vector giving the local paths to BED files that describe genomic features. Each element has to be named after the feature described by its file. All the data related to the chromatin state have to come from the same condition. The expected content of a BED file depends on the type of feature: if 'numeric', the score field of the file is either filled in or empty (if empty, the score is automatically set to '1'); if 'categorical', the score field of the file is empty and its name field holds the names of the feature categories. If you want to input the location of gene structures, name the file paths with exactly the following names: 'ProximalPromoter', 'Promoter', 'X5UTR', 'X3UTR', 'CDS', 'Intron', 'Downstream'. If you use these names, each potential binding site is annotated only with the gene structure that its center overlaps. If you do not use these names, the data related to gene structure are extracted in the same way as for the other features (average of the signal over windows of 3 different lengths centered on the potential binding sites). 'X5UTR' stands for 5' untranslated region, 'X3UTR' for 3' untranslated region, 'CDS' for coding sequence and 'Downstream' for the regions downstream of the transcription termination site.
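The two kinds of BED input described above can be sketched with toy files. This is only an illustration in base R: the feature names 'DHS' and 'ChromatinState', the coordinates, the helper write_bed() and the encoding of an empty field as an empty string are all assumptions, not something specified by Wimtrap.

```r
# Hypothetical toy data written to a temporary directory.
bed_dir <- tempfile("bed_example_")
dir.create(bed_dir)

# 'Numeric' feature: the score field (column 5) carries the signal.
numeric_bed <- data.frame(
  chrom = c("1", "1"),
  start = c(1000L, 5000L),
  end   = c(1200L, 5300L),
  name  = c(".", "."),
  score = c(0.8, 2.5)
)

# 'Categorical' feature: the name field (column 4) carries the category
# and the score field is left empty (assumed here to be an empty string).
categorical_bed <- data.frame(
  chrom = c("1", "1"),
  start = c(900L, 4800L),
  end   = c(1500L, 5600L),
  name  = c("open", "repressed"),
  score = c("", "")
)

# Hypothetical helper: write a headerless, tab-separated file and return its path.
write_bed <- function(x, path) {
  write.table(x, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  path
}

# Named character vector: each name is the feature described by the file.
genomic_data.sketch <- c(
  DHS = write_bed(numeric_bed, file.path(bed_dir, "dhs.bed")),
  ChromatinState = write_bed(categorical_bed, file.path(bed_dir, "states.bed"))
)
```

A vector built this way has the shape expected by the genomic_data argument; whether these toy files are accepted as-is by importGenomicData has not been checked.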

biomart

Logical. Should the locations of the transcription start sites (TSS), the transcription termination sites (TTS) and the structures of the protein-coding genes of the organism be downloaded automatically through biomart? Default is TRUE.

tss

NULL (by default) or local path to a BED file defining the transcription start site (TSS), name and orientation of each protein-coding transcript of the organism. With the default value, this information is downloaded automatically from biomart.

tts

NULL (by default) or local path to a BED file defining the transcription termination site (TTS), name and orientation of each protein-coding transcript of the organism. With the default value, this information is downloaded automatically from biomart.
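As a rough sketch of what a custom tss (or tts) file might look like, the snippet below writes a six-column BED file holding a position, a transcript name and a strand per line. The transcript names and coordinates are hypothetical, and the exact column layout expected by Wimtrap is an assumption based on the standard BED convention (0-based start, exclusive end).

```r
# Hypothetical transcripts: one single-base interval per TSS.
tss_bed <- data.frame(
  chrom  = c("1", "1"),
  start  = c(3630L, 8736L),   # 0-based start, as in standard BED
  end    = c(3631L, 8737L),   # exclusive end
  name   = c("transcript_A", "transcript_B"),
  score  = c(".", "."),
  strand = c("+", "-")        # orientation of the transcript
)

# Write a headerless, tab-separated file to a temporary location.
tss_path <- tempfile(fileext = ".bed")
write.table(tss_bed, file = tss_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The resulting path (tss_path) is the kind of value the tss argument takes; a tts file would follow the same pattern with termination sites.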

promoter_length

An integer setting the length of the promoters. By default, the promoter is defined as the region spanning the 2000bp upstream of the transcription start site (TSS).

downstream_length

An integer setting the length of the downstream regions. By default, the downstream region is defined as the region spanning the 1000bp downstream of the transcription termination site (TTS).

proximal_length

An integer setting the length of the proximal promoters. By default, the proximal promoter is defined as the region spanning the 500bp upstream of the transcription start site (TSS).
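To make the three length arguments concrete, the sketch below computes the flanking intervals by hand for a few toy positions. It is not the package's implementation: the helper flank(), the 1-based inclusive coordinates and the choice to exclude the TSS or TTS base itself from the region are assumptions made for illustration.

```r
# Hypothetical helper: interval of `length` bp upstream or downstream of `pos`,
# taking the strand into account (1-based, inclusive coordinates assumed).
flank <- function(pos, strand, length, side = c("upstream", "downstream")) {
  side <- match.arg(side)
  # Upstream on '+' and downstream on '-' both lie at lower coordinates.
  towards_lower <- (strand == "+") == (side == "upstream")
  data.frame(
    start = ifelse(towards_lower, pos - length, pos + 1),
    end   = ifelse(towards_lower, pos - 1, pos + length)
  )
}

# Default lengths from the Usage section, applied to toy positions.
promoter_plus   <- flank(10000, "+", 2000, "upstream")    # 8000..9999
proximal_minus  <- flank(10000, "-", 500,  "upstream")    # 10001..10500
downstream_plus <- flank(15000, "+", 1000, "downstream")  # 15001..16000
```

On the minus strand, "upstream" points towards higher coordinates, which is why the orientation of each transcript is needed alongside its TSS and TTS.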

Value

A named list of GRanges objects. Each component of the list is named according to the nature of the genomic data. The last two components describe, respectively, the positions of the transcription termination site (TTS) and the transcription start site (TSS) of each protein-coding transcript of the organism considered.
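The layout of the returned list can be navigated by position, as sketched below. A plain list of placeholder strings stands in for the real GRanges objects so that the snippet runs without the package, and the component names used here (including 'TTS' and 'TSS') are hypothetical.

```r
# Stand-in for the returned list: placeholders replace the GRanges objects.
imported.sketch <- list(
  DHS = "GRanges of DHS",
  CDS = "GRanges of CDS",
  TTS = "GRanges of TTS",
  TSS = "GRanges of TSS"
)

n <- length(imported.sketch)

# The last two components are, in order, the TTS and the TSS.
tts_component <- imported.sketch[[n - 1]]
tss_component <- imported.sketch[[n]]

# Everything before them holds the genomic features themselves.
feature_components <- imported.sketch[seq_len(n - 2)]
```

Accessing the TTS and TSS by position rather than by name avoids relying on names that this page does not state.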

Examples

## Without automatic download from Biomart of data related to gene structure
genomic_data.ex <- c(CE = system.file("extdata/conserved_elements_example.bed", package = "Wimtrap"),
                      DGF = system.file("extdata/DGF_example.bed", package = "Wimtrap"),
                      DHS = system.file("extdata/DHS_example.bed", package = "Wimtrap"),
                      X5UTR = system.file("extdata/x5utr_example.bed", package = "Wimtrap"),
                      CDS = system.file("extdata/cds_example.bed", package = "Wimtrap"),
                      Intron = system.file("extdata/intron_example.bed", package = "Wimtrap"),
                      X3UTR = system.file("extdata/x3utr_example.bed", package = "Wimtrap")
                     )
imported_genomic_data.ex <- importGenomicData(biomart = FALSE,
                                              genomic_data = genomic_data.ex,
                                              tss = system.file("extdata/tss_example.bed", package = "Wimtrap"),
                                              tts = system.file("extdata/tts_example.bed", package = "Wimtrap"))

## With automatic download from Biomart of data related to gene structure
genomic_data.ex <- c(CE = system.file("extdata/conserved_elements_example.bed", package = "Wimtrap"),
                     DGF = system.file("extdata/DGF_example.bed", package = "Wimtrap"),
                     DHS = system.file("extdata/DHS_example.bed", package = "Wimtrap")
                     )
imported_genomic_data.ex <- importGenomicData(organism = "Arabidopsis thaliana",
                                              biomart = TRUE,
                                              genomic_data = genomic_data.ex)

RiviereQuentin/Wimtrap documentation built on June 29, 2024, 7:17 p.m.