prepare_anno | R Documentation |
This function downloads the reference fasta file for a specific release of Ensembl or Gencode and then cleans it: only the transcript id is kept and, by default, the transcript version is removed. ERCC92 sequences can optionally be added. Two additional reference files are also generated: one without the alternative chromosomes and one containing only the protein-coding genes.
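To illustrate the cleaning step described above, here is a minimal sketch of reducing a fasta header to a transcript id without its version. The helper name `clean_transcript_id` is hypothetical and this is not the package's own implementation; it assumes Ensembl headers separate fields with a space and Gencode headers with `|`.

```r
# Hypothetical helper (not part of the package): keep only the transcript id
# from a fasta header and, optionally, drop the ".<version>" suffix.
clean_transcript_id <- function(header, remove_version = TRUE) {
  # Drop a leading ">" if the raw header line was passed in.
  id <- sub("^>", "", header)
  # Keep everything before the first space (Ensembl) or "|" (Gencode).
  id <- sub("[ |].*$", "", id)
  # Remove the trailing version number, e.g. ".2".
  if (remove_version) id <- sub("\\.[0-9]+$", "", id)
  id
}

clean_transcript_id(">ENST00000456328.2 cdna chromosome:GRCh38:1:11869:14409:1")
clean_transcript_id("ENST00000456328.2|ENSG00000223972.5|DDX11L1")
```

Both calls reduce the header to `ENST00000456328`; passing `remove_version = FALSE` would keep the `.2` suffix.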
prepare_anno( org, db = "Ensembl", release = NA, ERCC92 = FALSE, force_download = FALSE, gtf = FALSE, outdir = "." )
org |
The organism name. Currently accepted:
* Homo sapiens (Ensembl and Gencode)
* Mus musculus (Ensembl and Gencode)
* Macaca mulatta (Ensembl only)
* Rattus norvegicus (Ensembl only)
* Bos taurus (Ensembl only) |
db |
The database to use: Ensembl or Gencode. Default: "Ensembl" |
release |
The version of the database to use. Must be greater than 100 for Ensembl, 35 for Gencode Homo sapiens and 25 for Gencode Mus musculus. Default: NA |
ERCC92 |
Add the ERCC92 sequences to the reference and to the annotation? Default: FALSE |
force_download |
Re-download raw reference if it is already present? Default: FALSE |
gtf |
Download the annotation corresponding to the fasta in gtf format? Default: FALSE |
outdir |
Directory in which to save the files. Default: "." |
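The constraints on org, db and release listed above can be checked before calling the function. The following is only a sketch: `check_anno_args` is a hypothetical helper, not part of the package, and it reads "greater than" strictly, which may differ from the package's actual validation.

```r
# Hypothetical helper (not part of the package): mirrors the documented
# constraints on org, db and release so that bad combinations fail early.
check_anno_args <- function(org, db = "Ensembl", release = NA) {
  ensembl_orgs <- c("Homo sapiens", "Mus musculus", "Macaca mulatta",
                    "Rattus norvegicus", "Bos taurus")
  # Documented lower bounds for Gencode, per organism.
  gencode_min <- c("Homo sapiens" = 35, "Mus musculus" = 25)

  if (!db %in% c("Ensembl", "Gencode")) return(FALSE)
  if (db == "Ensembl" && !org %in% ensembl_orgs) return(FALSE)
  if (db == "Gencode" && !org %in% names(gencode_min)) return(FALSE)

  # release = NA is the documented default; it is not checked in this sketch.
  if (is.na(release)) return(TRUE)

  min_release <- if (db == "Ensembl") 100 else gencode_min[[org]]
  release > min_release
}

check_anno_args("Homo sapiens", db = "Ensembl", release = 103)
check_anno_args("Bos taurus", db = "Gencode", release = 40)
```

The first call passes the checks; the second fails because Bos taurus is listed as Ensembl only.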
After calling this function, a <prefix>.raw_ref.fa.gz file corresponding to the raw reference is downloaded (if not already present) to outdir (by default, the current working directory). A clean version without alternative chromosomes is written as <prefix>.no_alt_chr.fa.gz, and a <prefix>.protein_coding.fa.gz file containing only the protein_coding genes is also generated. For each of the 3 fa.gz files, a <prefix>.csv file is created; it contains the annotation formatted correctly for the rnaseq packages. Finally, a <prefix>.info file is created, containing metadata about every file and the parameters used.
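As a quick way to see which of these outputs are present after a run, the sketch below builds the documented file names for a given prefix. `expected_outputs` is a hypothetical helper, not part of the package; the csv files are left out because their exact names are not spelled out here.

```r
# Hypothetical helper (not part of the package): builds the documented output
# paths for a prefix and reports which of them exist in outdir.
expected_outputs <- function(prefix, outdir = ".") {
  suffixes <- c("raw_ref.fa.gz", "no_alt_chr.fa.gz",
                "protein_coding.fa.gz", "info")
  paths <- file.path(outdir, paste(prefix, suffixes, sep = "."))
  data.frame(path = paths, exists = file.exists(paths),
             stringsAsFactors = FALSE)
}

# "Hs.Gencode38" is the example prefix used in this documentation.
expected_outputs("Hs.Gencode38")
```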
The <prefix>.info file contains the following columns:
* prefix: The prefix of the file. Must match the filename (e.g. the prefix of Hs.Gencode38.csv is Hs.Gencode38).
* org: The organism name (e.g. Homo sapiens).
* db: The database from which the annotation was downloaded.
* release: The version of the database.
* ERCC92: The value of the ERCC92 argument.
* anno_pkg_version: The anno package version.
* download_date: The date the annotation was downloaded.
* download_url: The URL that was used to download the annotation.
* An md5sum for every file generated, one column per file.
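The md5sum columns make it possible to confirm that a generated file has not changed since it was created. Below is a minimal sketch using `tools::md5sum()` from base R; `verify_md5` is a hypothetical helper, and since the on-disk layout of the .info file is not described here, the recorded checksum is passed in directly rather than read from the file.

```r
# Hypothetical helper (not part of the package): compares a recorded md5
# checksum with the checksum of the file currently on disk.
verify_md5 <- function(path, recorded) {
  actual <- unname(tools::md5sum(path))  # NA if the file cannot be read
  !is.na(actual) && identical(tolower(recorded), actual)
}

# Self-contained demonstration on a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines("id,symbol", tmp)
recorded <- unname(tools::md5sum(tmp))
verify_md5(tmp, recorded)
```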
Returns a list containing all the information in the <prefix>.info file.
## Not run:
prepare_anno(org = "Homo sapiens", db = "Ensembl", release = 103)
## End(Not run)