Description Usage Arguments Value Reference Sequence Region Input Files Location Types Output Separators Examples
These methods perform fast probe-to-gene annotation using either 1) chromosomal locations, or 2) alignment files from software such as "bowtie", "bowtie2" or "gmap".
annotateByAlignment(file1, file2, alignment.columnsIndex, sepFile2, minScore,
  refDownStream, refUpStream, probesetSep, txDb, mapType = "EXONINTRON",
  promotorRange = 1500, extendedRange = 2000, orgDb, orgDb_Columns,
  sep_intra = ";", sep_inter = "//", verbose = FALSE)

annotateByLocation(x, txDb, mapType, promotorRange = 1500,
  extendedRange = 2000, orgDb, orgDb_Columns, sep_intra = ";",
  sep_inter = "\\", verbose = FALSE)
Argument | Description |
file1 | A character vector, the name of the probes' FASTA file. See Input Files. |
file2 | A character vector, the name of the probes-to-reference SAM file. See Input Files. |
alignment.columnsIndex | A numeric vector containing the column indices of the score, probe name, reference name and alignment offset in the alignment file. |
sepFile2 | A character vector, the string column separator un |
minScore | A numeric value giving the minimum allowed alignment score. |
refDownStream | A numeric value giving the number of downstream bp in the reference, see Reference Sequence Region. |
refUpStream | A numeric value giving the number of upstream bp in the reference, see Reference Sequence Region. |
probesetSep | A character vector indicating the probeset separator if probes are organised in sets. |
txDb | A |
mapType | A character vector representing the probe-to-gene mapping type. This must be one of "NO_EXONINTRON", "EXONINTRON" or "EXON". |
promotorRange | An integer vector giving the window size for the genes' promotor site in bp. Default is 1500. |
extendedRange | An integer vector giving the window size for the genes' extended site in bp. Default is 2000. |
orgDb | A |
orgDb_Columns | A character vector (optional), giving which columns to extract from the |
sep_intra | A character vector giving the separator character for gene information, see Output Separators. |
sep_inter | A character vector giving the separator character between genes, see Output Separators. |
verbose | A logical value indicating if messages should be printed. Default is FALSE. |
x | A |
The method annotateByAlignment returns a data.frame object, with one record (or row) for each probe given in x.
With the default organism database, this data.frame contains the following information:
Column | Comment |
probe_name | xxxx |
entrezid | xxxx |
chr | xxxx |
strand | xxxx |
loctype | xxxx |
gene_end | xxxx |
gene_start | xxxx |
gene_symbol | xxxx |
gene_alias | xxxx |
gene_name | xxxx |
If the user supplies their own organism database (orgDb and orgDb_Columns), the function returns a data.frame equivalent to the one above, with the columns gene_symbol, gene_alias and gene_name replaced by the ones provided in orgDb_Columns.
The mapType argument corresponds to the gene reference format used for the alignment. It determines which loctype values can appear in the annotation. Three mapType values are allowed:
"NO_EXONINTRON": if the reference sequence contains one sequence per gene. The available loctype values are: "gene", "promotor", "extended", "intragenic".
"EXONINTRON": if the reference contains one sequence per transcript, including introns. The available loctype values are: "gene", "intron", "exon", "promotor", "extended", "intragenic".
"EXON": if the reference contains one sequence per transcript, without introns. The available loctype values are: "gene", "exon", "promotor", "extended", "intragenic".
The upstream and downstream lengths retrieved in the reference are controlled with the refUpStream and refDownStream parameters (in bp).
The name of the probes' FASTA file is given by the file1 argument. This file is used to retrieve all of the platform's probe names.
When the probes in the platform are arranged in probesets, use the probesetSep argument to define the probeset separator string. For example, using Affymetrix's XXX platform, set probesetSep="at.".
The name of the alignment file is given by the file2 argument. This file must contain columns for the alignment score, the probe's name, the reference's name and the alignment offset (see Alignment format below).
Alignment format: the function must know the alignment format in order to extract the alignment information (score, probe, reference, offset). The default input is the SAM format (see the specification at https://samtools.github.io/hts-specs/SAMv1.pdf); other formats can be described manually using the alignment.columnsIndex and sepFile2 arguments.
The alignment.columnsIndex and sepFile2 arguments allow the user to describe a specific alignment output format. alignment.columnsIndex must give the column indices of the score, the probe's name, the reference's name and the alignment offset, in that order. The column separator is given with the sepFile2 argument.
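As a rough illustration of what these two arguments describe, the base R sketch below pulls the four fields out of a small SAM-like file. The helper read_alignment_columns is hypothetical and not part of the package, the two records are made up, and treating the MAPQ column as the "score" is an assumption made only for this example.

```r
# Hypothetical helper (NOT part of the package): extract score, probe name,
# reference name and offset from a delimited alignment file.
read_alignment_columns <- function(path, columnsIndex, sep = "\t") {
  stopifnot(length(columnsIndex) == 4)
  lines <- readLines(path)
  lines <- lines[!startsWith(lines, "@")]        # drop SAM header lines
  fields <- strsplit(lines, sep, fixed = TRUE)   # split each record into columns
  picked <- lapply(fields, function(f) f[columnsIndex])
  out <- as.data.frame(do.call(rbind, picked), stringsAsFactors = FALSE)
  names(out) <- c("score", "probe", "reference", "offset")
  out$score <- as.numeric(out$score)
  out$offset <- as.numeric(out$offset)
  out
}

# Two made-up SAM records (11 mandatory tab-separated fields) plus a header.
sam_lines <- c(
  "@HD\tVN:1.6",
  "probe_1\t0\tgeneA\t120\t42\t20M\t*\t0\t0\tACGTACGTACGTACGTACGT\t*",
  "probe_2\t16\tgeneB\t7\t30\t20M\t*\t0\t0\tTTGCATTGCATTGCATTGCA\t*"
)
tmp <- tempfile(fileext = ".sam")
writeLines(sam_lines, tmp)

# In SAM, QNAME is column 1, RNAME column 3, POS column 4 and MAPQ column 5.
# Assumption for this sketch: MAPQ plays the role of the alignment score.
aln <- read_alignment_columns(tmp, columnsIndex = c(5, 1, 3, 4))
print(aln)
```

For another tabular aligner output, only the index vector and the separator would change, for example sep = "," for a comma-separated file.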
The location types (column loctype) are pre-defined regions that describe the part of the gene that the probe matches. There are six loctype values (shown in the table below).
loctype | illustration |
"gene" | .............########################............. |
"intron" | ................***.....******..**................ |
"exon" | .............===...=====......==..===............. |
"promotor" | ..........<+++++>................................. |
"extended" | .......<~~~~>........................<~~~~>....... |
"intragenic" | ------>....................................<------ |
The "promotor" and "extended" regions can be adjusted using the promotorRange and extendedRange parameters. The promotor region spans +/- promotorRange bp around the gene's start location. The extended regions are located at both ends of the gene and extend the gene region by extendedRange bp.
To exclude the "promotor" and/or "extended" regions from the annotation, set promotorRange=0 and/or extendedRange=0.
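The windows described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the function region_labels is hypothetical, it considers a single gene on the "+" strand, ignores exons and introns, and returns every matching label instead of choosing one.

```r
# Hypothetical illustration (NOT the package's code): label a single genomic
# position relative to one gene on the "+" strand.
region_labels <- function(pos, gene_start, gene_end,
                          promotorRange = 1500, extendedRange = 2000) {
  labels <- character(0)
  # Inside the gene body.
  if (pos >= gene_start && pos <= gene_end) {
    labels <- c(labels, "gene")
  }
  # Promotor window: +/- promotorRange bp around the gene start.
  if (promotorRange > 0 && abs(pos - gene_start) <= promotorRange) {
    labels <- c(labels, "promotor")
  }
  # Extended windows: extendedRange bp beyond either end of the gene.
  upstream   <- pos >= gene_start - extendedRange && pos < gene_start
  downstream <- pos > gene_end && pos <= gene_end + extendedRange
  if (extendedRange > 0 && (upstream || downstream)) {
    labels <- c(labels, "extended")
  }
  # Nothing matched: the position lies outside all windows.
  if (length(labels) == 0) {
    labels <- "intragenic"
  }
  labels
}

inside   <- region_labels(10500, gene_start = 10000, gene_end = 20000)
flanking <- region_labels(9000,  gene_start = 10000, gene_end = 20000)
excluded <- region_labels(9000,  gene_start = 10000, gene_end = 20000,
                          promotorRange = 0, extendedRange = 0)
print(list(inside = inside, flanking = flanking, excluded = excluded))
```

Setting both ranges to 0 in the last call removes the "promotor" and "extended" windows, so the same position falls back to "intragenic".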
sep_intra controls how multiple entries are concatenated within a single reference (i.e. gene). If sep_intra=";", gene items that have multiple entries are concatenated with ";". For example:
The EGFR gene has six other symbols (ERBB, HER1, mENA, ERBB1, PIG61 and NISBD2), so the "alias" column will be:
EGFR;ERBB;HER1;mENA;ERBB1;PIG61;NISBD2
The HOXA10 gene has four other symbols (HOX1, HOX1.8, HOX1H and PL), so the "alias" column will be:
HOXA10;HOX1;HOX1.8;HOX1H;PL
sep_inter controls how column entries are concatenated when a probe is mapped to multiple references (i.e. genes).
For example, assume that a probe is mapped to both HOXA10 and EGFR. All columns containing gene information are then concatenated with sep_inter="//":
symbol alias
HOXA10//EGFR HOXA10;HOX1;HOX1.8;HOX1H;PL//EGFR;ERBB;HER1;mENA;ERBB1;PIG61;NISBD2
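The two-level concatenation can be reproduced with base R's paste(). The sketch below is only meant to show how the two separators nest; the variable names are illustrative and the package may build these strings differently.

```r
# Alias lists for the two genes from the example above.
aliases <- list(
  HOXA10 = c("HOXA10", "HOX1", "HOX1.8", "HOX1H", "PL"),
  EGFR   = c("EGFR", "ERBB", "HER1", "mENA", "ERBB1", "PIG61", "NISBD2")
)
sep_intra <- ";"
sep_inter <- "//"

# Step 1: join the entries of each gene with sep_intra.
per_gene <- vapply(aliases, paste, character(1), collapse = sep_intra)

# Step 2: join the per-gene strings with sep_inter.
symbol <- paste(names(aliases), collapse = sep_inter)
alias  <- paste(per_gene, collapse = sep_inter)

# Reverse operation: split on sep_inter first, then on sep_intra.
genes_back   <- strsplit(alias, sep_inter, fixed = TRUE)[[1]]
entries_back <- strsplit(genes_back, sep_intra, fixed = TRUE)

print(symbol)
print(alias)
```

Choosing separators that cannot occur inside gene symbols or names keeps the reverse split unambiguous.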
## .. todo
## Not run:
library(GenomicRanges)  # provides GRanges() and, through IRanges, IRanges()

## Example of 3 coordinates of the MGMT gene on chromosome 10
## - coordinates: chr10:27132612-2713562
## - assembly: hg19 (default in annotateByLocation)
start = c(131263500, 131264960, 131265460)
probeID = sprintf("probe_%d", 1:3)

## Using x in GRanges format
gr = GRanges(seqnames = "chr10",
             strand = "+",
             ranges = IRanges(start = start,
                              width = 20),
             ID = probeID)
annot_gr = annotateByLocation(x = gr, mapType = "EXON")

## Using x in data.frame format
## (end = start + 19 covers the same 20 bp as width = 20 above)
df = data.frame(chr = "chr10",
                strand = "+",
                start = start,
                end = start + 19,
                ID = probeID)
annot_df = annotateByLocation(x = df, mapType = "EXON")

## Check if both results are the same
all.equal(annot_gr, annot_df)
print(annot_gr)
## End(Not run)