View source: R/referencePrepare.R
referencePrepare | R Documentation |
Creates a reference file for IntEREst functions, e.g. interest(). The function uses functions of the biomaRt library.
referencePrepare( outFileTranscriptsAnnotation="",
annotateGeneIds=TRUE,
u12IntronsChr=c(), u12IntronsBeg=c(), u12IntronsEnd=c(),
u12IntronsRef, collapseExons=TRUE, sourceBuild="UCSC",
ucscGenome="hg19", ucscTableName="knownGene",
ucscUrl="http://genome-euro.ucsc.edu/cgi-bin/",
biomart="ENSEMBL_MART_ENSEMBL",
biomartDataset="hsapiens_gene_ensembl",
biomartTranscriptIds=NULL, biomartExtraFilters=NULL,
biomartIdPrefix="ensembl_", biomartHost="www.ensembl.org",
biomartPort=80, circSeqs="", miRBaseBuild=NA, taxonomyId=NA,
filePath="", fileFormat=c("auto", "gff3", "gtf"), fileDatSrc=NA,
fileOrganism=NA, fileChrInf=NULL,
fileDbXrefTag=c(), addCollapsedTranscripts=TRUE,
ignore.strand=FALSE )
outFileTranscriptsAnnotation |
If defined, outputs the transcript annotations. |
annotateGeneIds |
Whether to annotate and add the gene ID information. |
collapseExons |
Whether to collapse (i.e. reduce) the exonic regions. TRUE by default. |
sourceBuild |
The source to use to build the reference data, e.g. "UCSC" (the default) or "file" (as in the Examples). |
ucscGenome |
The genome to use. |
ucscTableName |
The UCSC table name to use. See |
ucscUrl |
The UCSC URL address. See |
u12IntronsChr |
A vector of character strings that includes the chromosomal locations of the
U12-type introns. If defined together with |
u12IntronsBeg |
A vector of numbers that defines the begin (or start) coordinates of the U12-type introns. |
u12IntronsEnd |
A vector of numbers that defines the end coordinates of the U12-type introns. |
u12IntronsRef |
A GRanges object that includes the coordinates of the U12-type introns. If defined, it is used to annotate the U12-type introns. |
biomart |
BioMart database name. See |
biomartDataset |
BioMart dataset name; default is "hsapiens_gene_ensembl". See |
biomartTranscriptIds |
Optional parameter to retrieve transcript annotation results only for a defined
set of transcript IDs. See |
biomartExtraFilters |
A list of names; i.e. additional filters to use in the BioMart query. See
|
biomartIdPrefix |
The prefix of the BioMart ID attributes; the default is "ensembl_". See
|
biomartHost |
Host to connect to; the default is "www.ensembl.org". For older GRCh assemblies you can provide the archive website, e.g. for GRCh37 you can use "grch37.ensembl.org". |
biomartPort |
The port to use in the HTTP communication with the host. Default is 80. |
circSeqs |
A character vector that includes chromosomes that should be marked as circular.
See |
miRBaseBuild |
The appropriate build information from mirbase.db to use for microRNAs
(default is NA). See |
taxonomyId |
This parameter can be used to provide taxonomy IDs. It is set to NA by default.
You can check the taxonomy IDs with the |
filePath |
Character string, i.e. the path to the file. Used if |
fileFormat |
The format of the input file: one of "auto" (the default), "gff3" or "gtf". |
fileDatSrc |
Character string describing the source of the data file. Used if
|
fileOrganism |
The genus and species name of the organism. Used if |
fileChrInf |
Data frame that includes information about the chromosomes. The first column
holds the chromosome name and the second column the length of the
chromosome. Used if |
fileDbXrefTag |
A vector of character strings which, if defined, is used as feature
names. Used if |
addCollapsedTranscripts |
Whether to add a column that includes the collapsed transcripts information. Used
if |
ignore.strand |
Whether to consider the strands in the reference. If set |
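As a rough illustration of the argument descriptions above, the following sketch prepares U12-type intron coordinates in both accepted forms (three parallel vectors, or one GRanges object) and a chromosome table in the layout described for fileChrInf. All coordinates, lengths and the column names `chrom` and `length` are made up for illustration; the GRanges object is only built if the GenomicRanges and IRanges packages happen to be installed.

```r
# Hypothetical U12-type intron coordinates (illustrative values only)
u12Coords <- data.frame(
    chr   = c("chr1", "chr1", "chr2"),
    begin = c(1000L, 5000L, 20000L),
    end   = c(1500L, 5800L, 20900L),
    stringsAsFactors = FALSE
)

# Option 1: three parallel vectors, one element per intron
u12IntronsChr <- u12Coords$chr
u12IntronsBeg <- u12Coords$begin
u12IntronsEnd <- u12Coords$end

# The vectors describe the same introns, so their lengths must agree
stopifnot(length(u12IntronsChr) == length(u12IntronsBeg),
          length(u12IntronsBeg) == length(u12IntronsEnd),
          all(u12IntronsBeg <= u12IntronsEnd))

# Option 2: a single GRanges object (skipped if the packages are missing)
u12IntronsRef <- NULL
if (requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {
    u12IntronsRef <- GenomicRanges::GRanges(
        seqnames = u12IntronsChr,
        ranges   = IRanges::IRanges(start = u12IntronsBeg,
                                    end = u12IntronsEnd)
    )
}

# Chromosome table: name in the first column, length in the second.
# The column names are hypothetical; only the column order is described.
fileChrInf <- data.frame(
    chrom  = c("chr1", "chr2"),
    length = c(250000, 240000),
    stringsAsFactors = FALSE
)
stopifnot(is.character(fileChrInf[, 1]), is.numeric(fileChrInf[, 2]))
```

Either the three vectors or the GRanges object would then be passed to referencePrepare(); the sketch itself does not call the function.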
Data frame that includes the coordinates and annotations of the introns and exons of the transcripts, i.e. the reference.
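A minimal sketch of how such a data frame might be inspected. The toy object below is a stand-in, not real output; its column names (`chr`, `begin`, `end`, `int_ex`, `int_type`) mirror those of the `u12` data used in the Examples, and assuming that the returned reference uses the same names is exactly that, an assumption.

```r
# Toy stand-in for a reference data frame (values and column names assumed)
ref <- data.frame(
    chr      = c("chr1", "chr1", "chr1", "chr1", "chr1"),
    begin    = c(100L, 201L, 301L, 401L, 501L),
    end      = c(200L, 300L, 400L, 500L, 600L),
    int_ex   = c("exon", "intron", "exon", "intron", "exon"),
    int_type = c(NA, "U12", NA, "U2", NA),
    stringsAsFactors = FALSE
)

# Count how many rows are exons and how many are introns
featureCounts <- table(ref$int_ex)

# Keep the U12-type introns only; which() also drops any NA comparisons
u12Rows <- ref[which(ref$int_ex == "intron" & ref$int_type == "U12"), ]
```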
Ali Oghabian
# Load the package that provides referencePrepare() and the 'u12' data
library(IntEREst)

# Build test gff3 data from the exons of one transcript in 'u12'
tmpGen <- u12[u12[,"ens_trans_id"]=="ENST00000413811",]
tmpEx <- tmpGen[tmpGen[,"int_ex"]=="exon",]
# Exon records: one row per exon, all pointing to the same parent transcript
exonDat <- cbind(tmpEx[,3], ".",
    tmpEx[,c(7,4,5)], ".", tmpEx[,6], ".", paste("ID=exon",
    tmpEx[,11], "; Parent=ENST00000413811", sep="") )
# Transcript record spanning from the first exon start to the last exon end
trDat <- c(tmpEx[1,3], ".", "mRNA", as.numeric(min(tmpEx[,4])),
    as.numeric(max(tmpEx[,5])), ".", tmpEx[1,6], ".",
    "ID=ENST00000413811")

# Write the gff3 file (version line, transcript, exons) to a temporary folder
outDir <- file.path(tempdir(),"tmpFolder")
dir.create(outDir, showWarnings=FALSE)
outDir <- normalizePath(outDir)
gff3File <- paste(outDir, "gffFile.gff", sep="/")
cat("##gff-version 3\n", file=gff3File, append=FALSE)
cat(paste(paste(trDat, collapse="\t"),"\n", sep=""),
    file=gff3File, append=TRUE)
write.table(exonDat, gff3File,
    row.names=FALSE, col.names=FALSE,
    sep='\t', quote=FALSE, append=TRUE)

# Select the U12-type intron information from the 'u12' data
u12Int <- u12[u12$int_ex=="intron" & u12$int_type=="U12",]

# Test the function: build the reference from the gff3 file
refseqRef <- referencePrepare(sourceBuild="file",
    filePath=gff3File, u12IntronsChr=u12Int[,"chr"],
    u12IntronsBeg=u12Int[,"begin"],
    u12IntronsEnd=u12Int[,"end"], collapseExons=TRUE,
    fileFormat="gff3", annotateGeneIds=FALSE)
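The example above assembles GFF3 records column by column. As a sketch of the underlying layout, the block below writes a toy file with the nine tab-separated GFF3 columns named explicitly and reads it back. The transcript `tx1`, its exons and all coordinates are made up for illustration and do not come from the `u12` data.

```r
# The nine tab-separated GFF3 columns, in order
gff3Cols <- c("seqid", "source", "type", "start", "end",
              "score", "strand", "phase", "attributes")

# Toy transcript with two exons (coordinates and IDs are made up)
toyGff <- data.frame(
    seqid      = "chr1",
    source     = ".",
    type       = c("mRNA", "exon", "exon"),
    start      = c(100L, 100L, 400L),
    end        = c(600L, 200L, 600L),
    score      = ".",
    strand     = "+",
    phase      = ".",
    attributes = c("ID=tx1", "ID=exon1;Parent=tx1", "ID=exon2;Parent=tx1"),
    stringsAsFactors = FALSE
)

# Write the version directive first, then append the records
toyFile <- tempfile(fileext = ".gff3")
cat("##gff-version 3\n", file = toyFile)
write.table(toyGff, toyFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE, append = TRUE)

# Read the records back; comment.char = "#" skips the version directive
readBack <- read.table(toyFile, sep = "\t", comment.char = "#",
                       col.names = gff3Cols, stringsAsFactors = FALSE)
```

A file like `toyFile` could then be given as filePath with sourceBuild="file" and fileFormat="gff3", in the same way as `gff3File` in the example.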