PrepareAnnotationEnsembl: prepare annotation from ENSEMBL

Description

Prepares the annotation from ENSEMBL through biomaRt.

Usage

PrepareAnnotationEnsembl(mart, annotation_path, splice_matrix = FALSE,
  dbsnp = NULL, transcript_ids = NULL, COSMIC = FALSE,
  local_cache_path = NULL,
  ensembl_to_UCSC_genome_map = DEFAULT_ENSEMBL_UCSC_GENOME_MAP,
  dbsnp_and_cosmic_only = FALSE, ...)

Arguments

mart

The Mart object that selects which version of the ENSEMBL dataset to use. See useMart in the biomaRt package for more detail.

annotation_path

A folder in which to store all the annotations.

splice_matrix

Whether to generate a known exon splice matrix from the annotation. This is not necessary unless you want to analyse junction results. Default is FALSE.

dbsnp

The SNP dataset to use for the SNP annotation. Default is NULL.

transcript_ids

Optionally, retrieve transcript annotation data only for the specified set of transcript IDs. Default is NULL.

COSMIC

Whether to download COSMIC data. Default is FALSE.

local_cache_path

If non-NULL, a directory where previously downloaded resources (such as protein coding sequences and COSMIC data) are cached, so that the function can be re-run without downloading identical data again.

ensembl_to_UCSC_genome_map

A named list of named lists used to look up the UCSC dbkey for a given biomart; only used for downloading dbSNPs. If DEFAULT_ENSEMBL_UCSC_GENOME_MAP does not contain an up-to-date mapping, pass a new mapping such as list("<species>_gene_ensembl" = list("<month>.archive.ensembl.org" = "<ucsc_dbkey>")).

...

Additional arguments; currently unused.
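
As a sketch of the nested structure that ensembl_to_UCSC_genome_map expects, the following builds a custom mapping and looks up a dbkey from it. The pairing of sep2015.archive.ensembl.org with "hg38" and the helper lookup_dbkey are illustrative assumptions, not part of customProDB; verify the genome build for your own archive before relying on it.

```r
# Hypothetical custom mapping: dataset name -> archive host -> UCSC dbkey.
# The "hg38" value is an assumption for illustration; check it for your archive.
custom_map <- list(
  "hsapiens_gene_ensembl" = list(
    "sep2015.archive.ensembl.org" = "hg38"
  )
)

# Hypothetical helper (not part of customProDB): return the dbkey for a
# dataset/host pair, or NULL when the mapping has no matching entry.
lookup_dbkey <- function(genome_map, dataset, host) {
  hosts <- genome_map[[dataset]]
  if (is.null(hosts)) return(NULL)
  hosts[[host]]
}

dbkey <- lookup_dbkey(custom_map, "hsapiens_gene_ensembl",
                      "sep2015.archive.ensembl.org")
```

A list built this way could then be passed as ensembl_to_UCSC_genome_map = custom_map in a call that also sets dbsnp.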

Details

This function automatically prepares all the annotation information needed in the subsequent analysis.

Value

Several .RData files containing the annotations needed for the subsequent analysis.
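
One way to inspect the returned files, sketched under the assumption that annotation_path is the folder that was passed to the function. The names of the individual .RData files are not listed on this page, so the snippet discovers them by extension instead of naming them.

```r
# Assumes annotation_path is the folder given to PrepareAnnotationEnsembl;
# tempdir() mirrors the Examples section and keeps the snippet self-contained.
annotation_path <- tempdir()

# Find every .RData file written to the annotation folder.
rdata_files <- list.files(annotation_path, pattern = "\\.RData$",
                          full.names = TRUE)

# Load the files into a dedicated environment so the objects they contain
# do not overwrite anything in the global workspace.
annotations <- new.env()
for (f in rdata_files) {
  load(f, envir = annotations)
}

# Names of the annotation objects that were loaded (empty if none exist yet).
loaded_objects <- ls(annotations)
```

Loading into a separate environment is a design choice for this sketch; calling load(f) directly would place the objects in the current workspace instead.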

Author(s)

Xiaojing Wang

Examples

ensembl <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL",
                            dataset="hsapiens_gene_ensembl",
                            host="sep2015.archive.ensembl.org")

cache_path <- system.file("extdata", "cache", package="customProDB")
annotation_path <- tempdir()
transcript_ids <- c("ENST00000234420", "ENST00000269305", "ENST00000445888",
                    "ENST00000257430", "ENST00000508376", "ENST00000288602",
                    "ENST00000269571", "ENST00000256078", "ENST00000384871")

PrepareAnnotationEnsembl(mart=ensembl, annotation_path=annotation_path,
    splice_matrix=FALSE, dbsnp=NULL, transcript_ids=transcript_ids,
    COSMIC=FALSE, local_cache_path=cache_path)

## Not run: 
# full annotation tests

test_datasets <- c("hsapiens", "mmusculus", "cfamiliaris", "scerevisiae", "ggorilla")
test_releases <- c("mar2017", "may2009", "may2012", "may2017")
# run the full annotation for every dataset/release combination
for (d in test_datasets) {
  for (r in test_releases) {
    dataset <- paste0(d, "_gene_ensembl")
    host <- paste0(r, ".archive.ensembl.org")
    mart <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL", dataset=dataset, host=host)
    PrepareAnnotationEnsembl(mart, annotation_path, splice_matrix=FALSE,
                             dbsnp=NULL, COSMIC=FALSE,
                             local_cache_path=file.path(annotation_path, "cache"))
  }
}


## End(Not run)
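
The full-annotation example passes a cache folder located under annotation_path. This page does not say whether PrepareAnnotationEnsembl creates that folder itself, so a cautious sketch is to create it before the call. The variable names follow the Examples section; the approach is an assumption, not documented package behaviour.

```r
# Same locations as in the examples: annotations in a temporary folder,
# with the download cache in a "cache" subfolder.
annotation_path <- tempdir()
cache_path <- file.path(annotation_path, "cache")

# Create the cache folder only if it is missing, so that repeated runs
# keep reusing previously downloaded resources.
if (!dir.exists(cache_path)) {
  dir.create(cache_path, recursive = TRUE)
}
```

The resulting cache_path could then be supplied as local_cache_path in each call.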

chambm/customProDB documentation built on May 31, 2019, 12:08 p.m.