PrepareAnnotationRefseq: prepare annotation for Refseq


Description

Prepares the RefSeq annotation through the UCSC Table Browser.

Usage

PrepareAnnotationRefseq(genome = "hg19", CDSfasta, pepfasta, annotation_path,
  dbsnp = NULL, transcript_ids = NULL, splice_matrix = FALSE,
  COSMIC = FALSE, local_cache_path = NULL, ...)

Arguments

genome

specify the UCSC DB identifier (e.g. "hg19")

CDSfasta

path to the fasta file of coding sequence.

pepfasta

path to the fasta file of protein sequence, check 'introduction' for more detail.

annotation_path

specify a folder to store all the annotations.

dbsnp

specify a snp dataset to be used for the SNP annotation, default is NULL. (e.g. "snp135")

transcript_ids

optionally, only retrieve transcript annotation data for the specified set of transcript ids. Default is NULL.

splice_matrix

whether to generate a known exon splice matrix from the annotation. This is not necessary if you do not want to analyse junction results. Default is FALSE.

COSMIC

whether to download COSMIC data, default is FALSE.

local_cache_path

if non-NULL, a directory where previously downloaded resources (such as protein coding sequences and COSMIC data) are cached, so that the function can be re-run without downloading identical data again. Default is NULL.

...

additional arguments

Value

Several .RData files containing the annotations needed for further analysis.
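
The names of the generated files are not listed on this page, so the following is only a minimal sketch for inspecting whatever was written. It assumes the files are saved directly in `annotation_path` with the `.RData` extension. `load_annotations` is a hypothetical helper, not part of customProDB, and the placeholder file stands in for real output.

```r
# Hypothetical helper (not part of customProDB): load every .RData file
# found in a directory into one environment and report what was loaded.
load_annotations <- function(annotation_path) {
  files <- list.files(annotation_path, pattern = "\\.RData$", full.names = TRUE)
  env <- new.env()
  for (f in files) {
    # load() returns the names of the objects it restored
    objs <- load(f, envir = env)
    message(basename(f), ": ", paste(objs, collapse = ", "))
  }
  env
}

# Demonstration with a placeholder file; real output comes from
# PrepareAnnotationRefseq() and will contain different objects.
demo_dir <- file.path(tempdir(), "annotation_demo")
dir.create(demo_dir, showWarnings = FALSE)
placeholder <- data.frame(id = "NM_000179")
save(placeholder, file = file.path(demo_dir, "placeholder.RData"))
ann <- load_annotations(demo_dir)
ls(ann)
```

Loading into a separate environment keeps the annotation objects out of the global workspace, which helps when annotations for several genomes are compared in one session.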

Author(s)

Xiaojing Wang

Examples

library(customProDB)

transcript_ids <- c("NM_001126112", "NM_033360", "NR_073499", "NM_004448",
                    "NM_000179", "NR_029605", "NM_004333", "NM_001127511")
pepfasta <- system.file("extdata", "hg19/hg19_protein.fasta", package="customProDB")
CDSfasta <- system.file("extdata", "hg19/hg19_coding.fasta", package="customProDB")
cache_path <- system.file("extdata", "cache", package="customProDB")
annotation_path <- tempdir()
PrepareAnnotationRefseq(genome='hg19', CDSfasta, pepfasta, annotation_path,
            dbsnp=NULL, transcript_ids=transcript_ids,
            splice_matrix=FALSE, COSMIC=FALSE, local_cache_path=cache_path)

## Not run: 
dbkey <- "hg38"
tempLocalCache <- tempdir()
# UCSC track name for this genome; shown for reference, not used below
refseqTrack <- if (dbkey == "hg38") "refSeqComposite" else "refGene"
codingFastaFilepath <- file.path(tempLocalCache, paste0(dbkey, ".cds.fa"))
proteinFastaFilepath <- file.path(tempLocalCache, paste0(dbkey, ".protein.fa"))

# raise the download timeout to one hour so long downloads are not cut off
options(timeout = 3600)
if (!file.exists(codingFastaFilepath)) {
  codingUrl <- getCodingFastaUrlFromUCSC(dbkey)
  cat("Downloading coding FASTA from:", codingUrl, "\n")
  download.file(codingUrl, codingFastaFilepath, quiet = TRUE, mode = "wb")
}

if (!file.exists(proteinFastaFilepath)) {
  proteinUrl <- getProteinFastaUrlFromUCSC(dbkey)
  cat("Downloading protein FASTA from:", proteinUrl, "\n")
  download.file(proteinUrl, proteinFastaFilepath, quiet = TRUE, mode = "wb")
}

cat("Preparing Refseq annotation files\n")
customProDB::PrepareAnnotationRefseq(dbkey, codingFastaFilepath, proteinFastaFilepath,
                                     annotation_path = ".",
                                     dbsnp = "snp146", COSMIC = FALSE,
                                     local_cache_path = tempLocalCache)

## End(Not run)
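
The example above repeats the same download-if-missing check for each FASTA file. As a sketch only, that check could be factored into a small helper. `download_if_missing` is a hypothetical name, not a customProDB function, and the URL in the demonstration is a placeholder that is never contacted because the destination file already exists.

```r
# Hypothetical helper (not part of customProDB): download `url` to `dest`
# only when `dest` does not exist yet. Returns TRUE if a download ran.
download_if_missing <- function(url, dest, ...) {
  if (file.exists(dest)) {
    return(invisible(FALSE))
  }
  message("Downloading ", url)
  # mode = "wb" avoids line-ending translation on Windows
  status <- download.file(url, dest, quiet = TRUE, mode = "wb", ...)
  if (status != 0) stop("download failed: ", url)
  invisible(TRUE)
}

# Offline demonstration: the destination already exists, so nothing is fetched.
cached <- tempfile(fileext = ".fa")
writeLines(">placeholder", cached)
ran <- download_if_missing("https://example.invalid/unused.fa", cached)
```

In the example above, the two `if (!file.exists(...))` blocks would then reduce to calls such as `download_if_missing(getCodingFastaUrlFromUCSC(dbkey), codingFastaFilepath)`.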

chambm/customProDB documentation built on May 31, 2019, 12:08 p.m.