carepat | R Documentation |
Predicts the location of transcription factor binding sites (= cis-acting regulatory elements) in various conditions
for Arabidopsis thaliana, Solanum lycopersicum, Oryza sativa and Zea mays. The function integrates 7 pre-built general models, each built from a more
or less extensive set of genomic data and trained on different organisms/conditions. Almost all of these models integrate the degree of chromatin
opening (DHS: DNaseI hypersensitive sites) and the results of digital genomic footprinting (DGF: digital genomic footprints) in
the conditions that can be studied using carepat. These genomic data have high predictive potential (see Details).
carepat(
  organism = c("Arabidopsis thaliana", "Solanum lycopersicum", "Oryza sativa",
    "Zea mays"),
  condition = c("seedlings", "flowers", "roots", "roots_non_hairs", "seed_coats",
    "seedlings_dark7d", "seedlings_dark7dLD24h", "seedlings_dark7dlight3h",
    "seedlings_dark7dlight30min", "seedlings_heatshock", "ripening_fruits",
    "immature_fruits"),
  TFnames = NULL,
  pfm = NULL,
  show_annotations = FALSE,
  score_threshold = 0.5
)
organism |
"Arabidopsis thaliana", "Solanum lycopersicum", "Oryza sativa" or "Zea mays" |
condition |
Character indicating the studied condition. For Arabidopsis thaliana: "seedlings", "flowers", "roots", "roots_non_hairs", "seed_coats", "seedlings_dark7d", "seedlings_dark7dLD24h", "seedlings_dark7dlight3h", "seedlings_dark7dlight30min" or "seedlings_heatshock". For Solanum lycopersicum: "ripening_fruits" or "immature_fruits". For Oryza sativa: "seedlings" or "roots". For Zea mays: "seedlings". |
TFnames |
Character vector giving the name(s) of the studied transcription factors. These names must follow the
AGI (Arabidopsis)/Solyc (tomato) nomenclature to allow retrieval of the motifs from the PlantTFDB
database. Otherwise, if you input the motifs from a local file through |
pfm |
Path to a file including the position frequency or weight matrices (PFMs or PWMs) of the motifs recognized
by the considered transcription factors (training and/or studied TFs). This file can be in different formats, determined based
on the file extension: raw pfm (".pfm"), jaspar (".jaspar"), meme (".meme"), transfac (".transfac"), homer (".motif") or
cis-bp (".txt"). |
show_annotations |
A logical. Default = FALSE |
score_threshold |
A numeric between 0 and 1. Sets the minimum prediction score output by the
|
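Because the format of the motif file is determined from its extension, it can be useful to check the extension before calling carepat. The following is a minimal sketch in base R. The helper `guess_motif_format` is hypothetical and not part of carepat; its mapping simply mirrors the extensions listed for the pfm argument, and the file names are placeholders.

```r
# Hypothetical helper (NOT part of carepat): maps the extension of a motif
# file to the format name listed for the `pfm` argument.
guess_motif_format <- function(path) {
  formats <- c(pfm = "raw pfm", jaspar = "jaspar", meme = "meme",
               transfac = "transfac", motif = "homer", txt = "cis-bp")
  ext <- tolower(tools::file_ext(path))
  if (!ext %in% names(formats)) {
    stop("Unrecognized motif file extension: '", ext, "'")
  }
  unname(formats[ext])
}

# Placeholder file names; only the extension is inspected.
guess_motif_format("my_motifs.jaspar")
guess_motif_format("my_motifs.motif")
```

Note that a cis-bp file is recognized only by the generic ".txt" extension, so a text file in another format would need to be renamed to the matching extension first.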
The following table details, for each organism-condition pair that can be studied using carepat, the model
that is used: the training organism-condition from which it was obtained and the genomic features it includes.
studied = training organism | studied condition | training condition | genomic features |
Arabidopsis thaliana | whole seedlings ("seedlings") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + Full layer 5 |
Arabidopsis thaliana | flowers in stages 4-5 ("flowers") | flowers in stages 4-5 ("flowers") | Layers 1, 2, 3, 4 + DHS, Cme |
Arabidopsis thaliana | seedling roots ("roots") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS |
Arabidopsis thaliana | non-hair part of seedling roots ("roots_non_hairs") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS |
Arabidopsis thaliana | seed coats, 4 days after anthesis ("seed_coats") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS |
Arabidopsis thaliana | heat-shocked seedlings ("seedlings_heatshock") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS |
Arabidopsis thaliana | dark-grown seedlings ("seedlings_dark7d") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS |
Arabidopsis thaliana | dark-grown seedlings exposed to 30 min of light ("seedlings_dark7dlight30min") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS |
Arabidopsis thaliana | dark-grown seedlings exposed to 3h of light ("seedlings_dark7dlight3h") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS |
Arabidopsis thaliana | dark-grown seedlings exposed to a long day cycle ("seedlings_dark7dLD24h") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS |
Solanum lycopersicum | ripening fruits ("ripening_fruits") | ripening fruits ("ripening_fruits") | Layers 1, 2, 3, 4 + DHS, Cme, H3K27me3 |
Solanum lycopersicum | immature fruits ("immature_fruits") | ripening fruits ("ripening_fruits") | Layers 1, 2, 3 + DHS, Cme, H3K27me3 |
Oryza sativa | whole seedlings ("seedlings") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS, Cme, H3K36me3, H3K27ac, H3K27me3, H3K4me3, H3K9ac, H4K12ac |
Oryza sativa | seedling roots ("roots") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS, Cme, H3K36me3, H3K27ac, H3K27me3, H3K4me3, H3K9ac, H4K12ac |
Zea mays | whole seedlings ("seedlings") | whole seedlings ("seedlings") | Layers 1, 2, 3, 4 + DHS, Cme |
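Since only certain organism-condition pairs are supported, a quick check before a long run can catch a mistyped condition. The sketch below is one possible way to do this in base R. Both `supported_conditions` and `is_supported` are hypothetical names that are not provided by carepat; the condition strings are copied from the usage signature.

```r
# Supported condition strings per organism, copied from the usage signature.
supported_conditions <- list(
  "Arabidopsis thaliana" = c("seedlings", "flowers", "roots", "roots_non_hairs",
                             "seed_coats", "seedlings_dark7d",
                             "seedlings_dark7dLD24h", "seedlings_dark7dlight3h",
                             "seedlings_dark7dlight30min", "seedlings_heatshock"),
  "Solanum lycopersicum" = c("ripening_fruits", "immature_fruits"),
  "Oryza sativa" = c("seedlings", "roots"),
  "Zea mays" = "seedlings"
)

# Hypothetical helper (NOT part of carepat): TRUE if the pair is listed above.
is_supported <- function(organism, condition) {
  organism %in% names(supported_conditions) &&
    condition %in% supported_conditions[[organism]]
}

is_supported("Oryza sativa", "roots")
is_supported("Zea mays", "roots")
```

Here the first call reports a supported pair and the second does not, because "seedlings" is the only condition listed for Zea mays.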
The different layers of genomic features are composed of the following:
Layer 1: results of pattern-matching (log10 p-value of the score and local density of matches)
Layer 2: phastCons-scored conserved elements (for Arabidopsis and tomato) and conserved non-coding sequences (for Arabidopsis only)
Layer 3: position on the gene (promoter, proximal promoter, 5'untranslated region, coding sequence, intron, 3'untranslated region, downstream region, distance to the transcription start and termination site of the gene)
Layer 4: local signals of digital footprints
Layer 5: local signals of Chromatin state, Cytosine methylation (Cme), Histone 2A.Z positioning (H2AZ), DNA looping (Dloop), Nucleosome positioning (Nuc), Histone 2B monoubiquitination (H2BuB), Monomethylation on lysine 4 of histone 3 (H3K4me1), Dimethylation on lysine 4 of histone 3 (H3K4me2), Trimethylation on lysine 4 of histone 3 (H3K4me3), Dimethylation on lysine 9 of histone 3 (H3K9me2), Monomethylation on lysine 27 of histone 3 (H3K27me1), Trimethylation on lysine 27 of histone 3 (H3K27me3), Trimethylation on lysine 36 of histone 3 (H3K36me3), Acetylation on lysine 9 of histone 3 (H3K9ac), Acetylation on lysine 14 of histone 3 (H3K14ac), Acetylation on lysine 18 of histone 3 (H3K18ac), Acetylation on lysine 27 of histone 3 (H3K27ac), Acetylation on lysine 56 of histone 3 (H3K56ac), Phosphorylation on threonine 3 of histone 3 (H3T3ph), Acetylation on lysine 5 of histone 4 (H4K5ac), Acetylation on lysine 8 of histone 4 (H4K8ac), Acetylation on lysine 12 of histone 4 (H4K12ac), Acetylation on lysine 16 of histone 4 (H4K16ac).
The sources of the data used to train the models and, where applicable, to transfer them to the studied conditions are described in the file "Sources.ods" on the "RivereQuentin/carepat" repository.
A data.table listing the predicted binding sites. The 'TF' column annotates the potential binding sites
with their cognate transcription factor. Additionally, the data.table describes, for each potential
binding site, the chromosomal coordinates, the closest transcript (relative to the transcript start site) and the prediction score.
Optionally, the data.table may also include the genomic features used to make the predictions.
NB: The chromosomal coordinates are expressed according to the following assemblies: TAIR10 (Arabidopsis thaliana),
SL3.0 (Solanum lycopersicum), IRGSP-1.0 (Oryza sativa) and Zm-B73-REFERENCE-NAM-5.0 (Zea mays).
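The returned table can be filtered and ranked like any other table, for example to keep only the most confident sites. The sketch below uses a small mock data.frame in place of real carepat output so that it runs without the package. Every column name other than 'TF' (`chr`, `start`, `end`, `score`) and all values are invented for illustration and may not match the actual output; check `names()` on the real result first.

```r
# Mock table standing in for carepat output. Column names (except 'TF') and
# all values are invented for illustration only.
mock_predictions <- data.frame(
  TF = c("AT2G46830", "AT2G46830", "AT2G46830"),
  chr = c("1", "1", "5"),
  start = c(1000L, 5200L, 300L),
  end = c(1008L, 5208L, 308L),
  score = c(0.55, 0.91, 0.78),
  stringsAsFactors = FALSE
)

# Keep sites scoring at least 0.75, then sort with the highest score first.
confident <- mock_predictions[mock_predictions$score >= 0.75, ]
confident <- confident[order(confident$score, decreasing = TRUE), ]
confident
```

Filtering after the call in this way leaves the full set of predictions available, whereas raising score_threshold restricts what is returned in the first place.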
plotPredictions() to visualize the results for a given potential target gene.
# Predictions of the binding sites of "AT2G46830" in flowers of Arabidopsis
CCA1predictions.flowers <- carepat(organism = "Arabidopsis thaliana",
                                   condition = "flowers",
                                   TFnames = "AT2G46830")
# Predictions of the binding sites of "Solyc00g024680.1" in immature fruits of tomato
DOF24predictions.immature <- carepat(organism = "Solanum lycopersicum",
                                     condition = "immature_fruits",
                                     TFnames = "Solyc00g024680.1")