referencePrepare: Creates reference file


View source: R/referencePrepare.R

Description

Creates the reference data used by IntEREst functions such as interest(). The function relies on functions of the biomaRt and GenomicFeatures packages.

Usage

referencePrepare( outFileTranscriptsAnnotation="",
	annotateGeneIds=TRUE, 
	u12IntronsChr=c(), u12IntronsBeg=c(), u12IntronsEnd=c(),
	u12IntronsRef,	collapseExons=TRUE, sourceBuild="UCSC", 
	ucscGenome="hg19", ucscTableName="knownGene",
	ucscUrl="http://genome-euro.ucsc.edu/cgi-bin/",
	biomart="ENSEMBL_MART_ENSEMBL",
	biomartDataset="hsapiens_gene_ensembl",
	biomartTranscriptIds=NULL, biomartExtraFilters=NULL, 
	biomartIdPrefix="ensembl_",	biomartHost="www.ensembl.org",
	biomartPort=80,	circSeqs="", miRBaseBuild=NA, taxonomyId=NA,
	filePath="", fileFormat=c("auto", "gff3", "gtf"), fileDatSrc=NA,
	fileOrganism=NA, fileChrInf=NULL, 
	fileDbXrefTag=c(), addCollapsedTranscripts=TRUE, 
	ignore.strand=FALSE )

Arguments

outFileTranscriptsAnnotation

Path of an output file. If defined, the transcript annotations are written to this file.

annotateGeneIds

Whether to annotate the transcripts with gene IDs and add this information to the reference.

collapseExons

Whether to collapse (i.e. reduce) the exonic regions. TRUE by default.
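To illustrate what collapsing (reducing) exonic regions means, here is a minimal base R sketch that merges overlapping or abutting intervals, similar in spirit to GenomicRanges::reduce(). The helper collapseIntervals() is hypothetical and is not the code used inside referencePrepare().

```r
# Hypothetical helper (illustration only): merge overlapping or abutting
# intervals into non-overlapping blocks, the idea behind "reducing" exons.
collapseIntervals <- function(starts, ends) {
  ord <- order(starts, ends)
  starts <- starts[ord]
  ends <- ends[ord]
  curStart <- starts[1]
  curEnd <- ends[1]
  resStart <- c()
  resEnd <- c()
  for (i in seq_along(starts)[-1]) {
    if (starts[i] <= curEnd + 1) {
      # Overlaps or directly abuts the current block: extend it
      curEnd <- max(curEnd, ends[i])
    } else {
      # Gap found: store the current block and open a new one
      resStart <- c(resStart, curStart)
      resEnd <- c(resEnd, curEnd)
      curStart <- starts[i]
      curEnd <- ends[i]
    }
  }
  data.frame(start = c(resStart, curStart), end = c(resEnd, curEnd))
}

# Exons of two hypothetical transcripts with partially overlapping exons
exons <- data.frame(start = c(100, 150, 400, 301),
                    end   = c(200, 250, 500, 399))
res <- collapseIntervals(exons$start, exons$end)
print(res)
```

With these inputs, the first two exons merge into one block (100 to 250) and the abutting intervals 301 to 399 and 400 to 500 merge into a second block.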

sourceBuild

The source used to build the reference data. Supported values are "UCSC", "biomaRt", and "file" (for GFF3 or GTF files).
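The sketch below groups the arguments that are relevant to each value of sourceBuild into plain R lists, using the defaults and the archive host mentioned on this page. The file path is a hypothetical placeholder; the actual call is left commented out because it needs IntEREst attached and, for "UCSC" and "biomaRt", network access.

```r
# Argument sets for the three supported values of sourceBuild.
# "/path/to/annotation.gtf" is a hypothetical placeholder path.
ucscArgs <- list(sourceBuild = "UCSC",
                 ucscGenome = "hg19",
                 ucscTableName = "knownGene")

biomartArgs <- list(sourceBuild = "biomaRt",
                    biomart = "ENSEMBL_MART_ENSEMBL",
                    biomartDataset = "hsapiens_gene_ensembl",
                    biomartHost = "grch37.ensembl.org")

fileArgs <- list(sourceBuild = "file",
                 filePath = "/path/to/annotation.gtf",
                 fileFormat = "gtf")

# With IntEREst attached, one of these sets could be passed on, e.g.:
# ref <- do.call(referencePrepare, fileArgs)
```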

ucscGenome

The UCSC genome to use; "hg19" by default. See the genome parameter of the makeTxDbFromUCSC function in the GenomicFeatures package for more information.

ucscTableName

The UCSC table name to use. See the tablename parameter of the makeTxDbFromUCSC function in the GenomicFeatures package for more information.

ucscUrl

The UCSC URL address. See the url parameter of the makeTxDbFromUCSC function in the GenomicFeatures package for more information.

u12IntronsChr

A character vector with the chromosome names of the U12-type introns. If defined together with u12IntronsBeg and u12IntronsEnd, these vectors are used to annotate the U12-type introns.

u12IntronsBeg

A numeric vector with the begin (or start) coordinates of the U12-type introns.

u12IntronsEnd

A numeric vector with the end coordinates of the U12-type introns.
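A minimal sketch of preparing the three U12 vectors from a table of intron coordinates. The table and its values are hypothetical, and the consistency check reflects an assumption that the three vectors describe the same introns element by element.

```r
# Hypothetical U12-type intron coordinates (illustrative values only)
u12Tab <- data.frame(chr = c("chr1", "chr1", "chr12"),
                     begin = c(1000, 5000, 20000),
                     end = c(1200, 5600, 20750),
                     stringsAsFactors = FALSE)

u12Chr <- u12Tab$chr
u12Beg <- u12Tab$begin
u12End <- u12Tab$end

# Assumption: the vectors are matched element by element, so they should
# have equal lengths and every begin should not exceed its end
stopifnot(length(u12Chr) == length(u12Beg),
          length(u12Beg) == length(u12End),
          all(u12Beg <= u12End))

# They would then be passed on as:
# referencePrepare(..., u12IntronsChr = u12Chr,
#                  u12IntronsBeg = u12Beg, u12IntronsEnd = u12End)
```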

u12IntronsRef

A GRanges object with the coordinates of the U12-type introns. If defined, it is used to annotate the U12-type introns.
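The same hypothetical coordinates can instead be supplied as a single GRanges object. This sketch uses the GRanges() and IRanges() constructors from the Bioconductor GenomicRanges and IRanges packages; the coordinates are illustrative only.

```r
# GenomicRanges attaches IRanges, which provides IRanges()
library(GenomicRanges)

# Hypothetical U12-type intron coordinates (illustrative values only)
u12Ref <- GRanges(seqnames = c("chr1", "chr1", "chr12"),
                  ranges = IRanges(start = c(1000, 5000, 20000),
                                   end = c(1200, 5600, 20750)))

# It would then be passed on as:
# referencePrepare(..., u12IntronsRef = u12Ref)
```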

biomart

BioMart database name. See the biomart parameter of the makeTxDbFromBiomart function in the GenomicFeatures package for more information.

biomartDataset

BioMart dataset name; "hsapiens_gene_ensembl" by default. See the dataset parameter of the makeTxDbFromBiomart function in the GenomicFeatures package for more information.

biomartTranscriptIds

Optional. Restricts the retrieved transcript annotations to the given set of transcript IDs. See the transcript_ids parameter of the makeTxDbFromBiomart function in the GenomicFeatures package for more information.

biomartExtraFilters

A named list of additional filters to use in the BioMart query. See the filters parameter of the makeTxDbFromBiomart function in the GenomicFeatures package for more information.

biomartIdPrefix

The prefix used in the BioMart attribute names; "ensembl_" by default (as in "ensembl_transcript_id"). See the id_prefix parameter of the makeTxDbFromBiomart function in the GenomicFeatures package for more information.

biomartHost

Host to connect to; "www.ensembl.org" by default. For older GRCh builds you can use the Ensembl archive sites, e.g. "grch37.ensembl.org" for GRCh37.

biomartPort

The port used for HTTP communication with the host; 80 by default.

circSeqs

A character vector of the chromosomes that should be marked as circular. See the circ_seqs parameter of the makeTxDbFromBiomart and makeTxDbFromUCSC functions in the GenomicFeatures package for more information.

miRBaseBuild

A string specifying the appropriate build information from mirbase.db to use for microRNAs; NA by default. See the miRBaseBuild parameter of the makeTxDbFromBiomart and makeTxDbFromUCSC functions in the GenomicFeatures package for more information.

taxonomyId

The taxonomy ID of the organism; NA by default. Available taxonomy IDs can be checked with the available.species() function of the GenomeInfoDb package. For more information, see the taxonomyId parameter of the makeTxDbFromBiomart and makeTxDbFromUCSC functions in the GenomicFeatures package.

filePath

A character string with the path to the input file. Used if sourceBuild is "file".

fileFormat

The format of the input file. "auto", "gff3" and "gtf" are supported.

fileDatSrc

Character string describing the source of the data file. Used if sourceBuild is "file".

fileOrganism

The genus and species name of the organism. Used if sourceBuild is "file".

fileChrInf

A data frame with information about the chromosomes: the first column holds the chromosome names and the second column their lengths. Used if sourceBuild is "file".
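A minimal sketch of a fileChrInf data frame with the chromosome names in the first column and their lengths in the second. The values are the hg19 lengths of chr1 and chr2; the column names chrom and length are an assumption borrowed from the GenomicFeatures chrominfo convention, since this page only specifies the column order.

```r
# Chromosome names (first column) and lengths (second column), hg19
chrInf <- data.frame(chrom = c("chr1", "chr2"),
                     length = c(249250621, 243199373),
                     stringsAsFactors = FALSE)

# It would then be passed on as:
# referencePrepare(sourceBuild = "file", filePath = "...",
#                  fileChrInf = chrInf)
```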

fileDbXrefTag

A vector of character strings which, if defined, is used for the feature names. Used if sourceBuild is "file".

addCollapsedTranscripts

Whether to add a column with the collapsed transcripts information. Used if collapseExons is TRUE.

ignore.strand

Whether to ignore the strands in the reference. If TRUE, the strands are ignored; FALSE by default.

Value

A data frame with the coordinates and annotations of the introns and exons of the transcripts, i.e. the reference.

Author(s)

Ali Oghabian

Examples

	# IntEREst provides the example 'u12' data
	library(IntEREst)

	# Build test GFF3 data from the exons of transcript ENST00000413811
	tmpGen<- u12[u12[,"ens_trans_id"]=="ENST00000413811",]
	tmpEx<-tmpGen[tmpGen[,"int_ex"]=="exon",]
	# One GFF3 line per exon (seqid, source, type, start, end, score,
	# strand, phase, attributes); attributes are separated by ";"
	exonDat<- cbind(tmpEx[,3], ".", 
		tmpEx[,c(7,4,5)], ".", tmpEx[,6], ".",paste("ID=exon", 
		tmpEx[,11], ";Parent=ENST00000413811", sep="") )
	# Parent mRNA line spanning all exons of the transcript
	trDat<- c(tmpEx[1,3], ".", "mRNA", as.numeric(min(tmpEx[,4])), 
		as.numeric(max(tmpEx[,5])), ".", tmpEx[1,6], ".", 
		"ID=ENST00000413811")

	# Temporary output folder (no warning if it already exists)
	outDir<- file.path(tempdir(),"tmpFolder")
	dir.create(outDir, showWarnings=FALSE)
	outDir<- normalizePath(outDir)

	gff3File<- file.path(outDir, "gffFile.gff")

	# Write the GFF3 header, the mRNA line and the exon lines
	cat("##gff-version 3\n",file=gff3File, append=FALSE)
	cat(paste(paste(trDat, collapse="\t"),"\n", sep=""),
		file=gff3File, append=TRUE)

	write.table(exonDat, gff3File,
		row.names=FALSE, col.names=FALSE,
		sep='\t', quote=FALSE, append=TRUE)

	# Selecting U12 introns info from 'u12' data
	u12Int<-u12[u12$int_ex=="intron"&u12$int_type=="U12",]

	# Test the function
	refseqRef<- referencePrepare(sourceBuild="file", 
		filePath=gff3File, u12IntronsChr=u12Int[,"chr"], 
		u12IntronsBeg=u12Int[,"begin"], 
		u12IntronsEnd=u12Int[,"end"], collapseExons=TRUE, 
		fileFormat="gff3", annotateGeneIds=FALSE)

IntEREst documentation built on Nov. 8, 2020, 8:05 p.m.