View source: R/extract_upstream_promotor_seqs.R
extract_upstream_promotor_seqs | R Documentation |
Given a genome assembly file and a corresponding annotation file, users can retrieve the upstream promotor sequences of all genes in a genome.
extract_upstream_promotor_seqs(
organism,
genome_file,
annotation_file,
annotation_format,
file_name = NULL,
promotor_width,
replaceUnstranded = "+"
)
organism |
a character string specifying the scientific name of the organism. |
genome_file |
file path to the genome assembly file. |
annotation_file |
file path to the annotation file of the genome assembly
in |
annotation_format |
format of the annotation file. Options are:
|
file_name |
file path to the output file storing the promotor sequences. |
promotor_width |
width of upstream promotors. This is - |
replaceUnstranded |
logical value indicating whether or not unstranded sequences shall receive a default strand. Default is replaceUnstranded = "+". |
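The idea behind replaceUnstranded can be sketched in base R. This is a minimal illustration, not the package's code: the helper name replace_unstranded is hypothetical, and it assumes that unstranded features are marked with "." (the GFF convention), "*" (the GRanges convention), or NA.

```r
# Hypothetical helper, NOT part of the package: give features without a
# known strand a default strand.
# Assumption: "." (GFF), "*" (GRanges) or NA mark an unstranded feature.
replace_unstranded <- function(strand, default = "+") {
  stopifnot(default %in% c("+", "-"))
  strand <- as.character(strand)
  strand[strand %in% c(".", "*") | is.na(strand)] <- default
  strand
}

replace_unstranded(c("+", ".", "-", "*"))
# [1] "+" "+" "-" "+"
```

A default strand is needed because "upstream" is only defined relative to the direction of transcription.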
This function extracts genomic sequences of a specified promotor_width
upstream of the transcription start sites of all genes annotated in the corresponding
annotation_file. The promotor sequences are then
Hajk-Georg Drost
## Not run:
# download genome assembly of Arabidopsis lyrata
Aly_genome <- biomartr::getGenome(db = "refseq",
organism = "Arabidopsis lyrata",
path = file.path("refseq", "genome"),
gunzip = TRUE)
# download annotation file of genome assembly of Arabidopsis lyrata
Aly_gff <- biomartr::getGFF(db = "refseq",
organism = "Arabidopsis lyrata",
path = file.path("refseq", "annotation"),
gunzip = TRUE)
# retrieve upstream promotor sequences of width 1000 bp
promotor_seqs <- extract_upstream_promotor_seqs(
organism = "Arabidopsis lyrata",
genome_file = Aly_genome,
annotation_file = Aly_gff,
annotation_format = "gff",
promotor_width = 1000)
## End(Not run)
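To make the notion of "upstream of the transcription start site" concrete, the coordinate arithmetic can be sketched in base R on a toy sequence. This is an illustrative sketch under stated assumptions, not the function's actual implementation: the helpers upstream_window, reverse_complement and extract_window are hypothetical names, coordinates are assumed to be 1-based and inclusive (as in GFF), and the feature start (on "+") or end (on "-") is taken as the transcription start site.

```r
# Illustrative sketch only; the package's implementation may differ.
# Assumption: 1-based, inclusive coordinates, as in GFF.
upstream_window <- function(start, end, strand, width, seq_length) {
  plus <- strand == "+"
  # "+" strand: the TSS is the feature start, the promotor lies before it.
  # "-" strand: the TSS is the feature end, the promotor lies after it.
  win_start <- ifelse(plus, start - width, end + 1)
  win_end   <- ifelse(plus, start - 1, end + width)
  # Clip windows that would run past either end of the sequence.
  data.frame(start = pmax(win_start, 1),
             end = pmin(win_end, seq_length),
             strand = strand,
             stringsAsFactors = FALSE)
}

# Reverse complement of a DNA string (A, C, G, T only).
reverse_complement <- function(x) {
  comp <- chartr("ACGTacgt", "TGCAtgca", x)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Cut the window out of the sequence; "-" strand windows are reported
# in the direction of transcription.
extract_window <- function(chrom_seq, start, end, strand) {
  s <- substr(chrom_seq, start, end)
  if (strand == "-") reverse_complement(s) else s
}

chrom <- "ACGTTGCAAGCTTAGC"  # toy 16 bp "chromosome"
win <- upstream_window(start = c(9, 3), end = c(12, 8),
                       strand = c("+", "-"), width = 3,
                       seq_length = nchar(chrom))
promotors <- mapply(extract_window,
                    start = win$start, end = win$end, strand = win$strand,
                    MoreArgs = list(chrom_seq = chrom), USE.NAMES = FALSE)
promotors
# [1] "GCA" "GCT"
```

The first gene lies on the "+" strand, so its window covers positions 6 to 8. The second lies on the "-" strand, so its window covers positions 9 to 11 and is reverse-complemented.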