Unified interface to define a genomic region.
chrom | Character specifying chromosome
pos_start | Integer start position on chromosome
pos_end | Integer end position on chromosome
pos | Integer position on chromosome
hgncid | HGNC gene identifier
ensemblid | ENSEMBL gene identifier
rs | dbSNP rs identifier
surround | Distance around entity to include in region
pos_ge | Position greater-or-equal required
pos_le | Position less-or-equal required
pos_end_ge | End position greater-or-equal required
pos_start_le | Start position less-or-equal required
pos_start_ge | Start position greater-or-equal required
pos_end_le | End position less-or-equal required
tablename | Database table name
dbc | Database connection
The gtxregion() function provides a unified interface for other
functions to define a genomic region (or potentially for a user to
invoke directly). For any valid combination of its optional
arguments, it returns genomic coordinates (chromosome, start and end
positions) as described below, using the database connection dbc
to resolve any queries (such as the coordinates of a named gene).
When accessing this functionality indirectly via higher level
functions (such as regionplot() and coloc()), the functionality
should be almost completely intuitive for most users, and if
necessary can be learned by example from the manual pages and
vignettes for those higher level functions.
It suffices to add that the optional arguments are used according to a
priority order, which is exactly the order of arguments in the
function definition. For example, if chrom, pos_start, pos_end and
hgnc are all provided, hgnc has lower priority and is ignored.
Similarly, if hgnc and pos are provided, pos has lower priority and
is ignored.
It is an intended design feature that pos and rs are lowest in the
priority order. When used in conjunction with higher priority
arguments such as hgnc, a pos or rs argument can be used without
affecting the genomic region specified, which then allows a function
that wraps gtxregion() to use pos or rs for secondary purposes, such
as to highlight a specific position or variant in a visual display.
Thus, regionplot(..., pos = 1234567, surround = 500000) selects a
500kb region around position 1234567 and visually highlights any
variant present at position 1234567, and
regionplot(..., hgnc = 'ABC123', surround = 10000, pos = 1234567)
selects a 10kb region around the ABC123 gene and visually highlights
any variant present at position 1234567.
The remainder of this manual page is more technical documentation,
intended for programmers writing new high level functions that will
work alongside regionplot() and coloc(), and should be read in
combination with the source code.
The gtxregion() function resolves its arguments to genomic
coordinates as follows:
If the arguments chrom, pos_start and pos_end are all provided, these are checked for validity and used to directly specify the return value.
Otherwise, if the argument hgnc is provided, TABLE genes is queried (using dbc and gtxwhere) and a region spanning the gene(s) plus surrounding distance is returned.
Otherwise, if the argument ensg (integer) is provided, TABLE genes is similarly queried.
Otherwise, if the arguments chrom and pos are both provided, these are checked for validity and used plus surrounding distance to directly specify the return value.
Otherwise, if the argument rs is provided, TABLE sites (sites_by_rs) is queried (using gtxwhere) and a region plus surrounding distance is returned.
The methods just described are implemented using
if ... else if ... else if ... logic, so for example if a hgnc
argument is provided then any ensg argument is ignored, etc.
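The priority order can be made concrete with a small sketch. This is not the gtx implementation: resolve_region_sketch is a hypothetical name, the database-backed branches only report which lookup would be attempted, the default surround of 500000 is an assumption, and so is the reading of surround as a distance added on both sides of pos.

```r
# Hypothetical sketch of the priority order; NOT the gtx implementation.
# Database-backed branches only report which lookup would be attempted.
resolve_region_sketch <- function(chrom, pos_start, pos_end, hgnc, ensg,
                                  pos, rs, surround = 500000) {
  if (!missing(chrom) && !missing(pos_start) && !missing(pos_end)) {
    # Highest priority: explicit coordinates, after a basic validity check.
    stopifnot(length(chrom) == 1, pos_start <= pos_end)
    list(method = "coordinates", chrom = as.character(chrom),
         pos_start = as.integer(pos_start), pos_end = as.integer(pos_end))
  } else if (!missing(hgnc)) {
    list(method = "hgnc lookup in TABLE genes")
  } else if (!missing(ensg)) {
    list(method = "ensg lookup in TABLE genes")
  } else if (!missing(chrom) && !missing(pos)) {
    # Assumption: the region extends `surround` bases either side of pos.
    list(method = "position", chrom = as.character(chrom),
         pos_start = as.integer(pos - surround),
         pos_end = as.integer(pos + surround))
  } else if (!missing(rs)) {
    list(method = "rs lookup in TABLE sites")
  } else {
    stop("no valid combination of arguments")
  }
}

# hgnc outranks pos, so pos is free for secondary uses such as highlighting.
resolve_region_sketch(hgnc = "ABC123", pos = 1234567)$method
resolve_region_sketch(chrom = 1, pos = 1234567, surround = 500000)
```

Because each branch is tested only when every earlier branch failed, a wrapper function can pass pos or rs through unchanged without altering the region chosen by a higher priority argument.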
The gtxwhere function provides a standardized and sanitized way
to dynamically construct part of a SQL WHERE clause. This is best
illustrated by the examples below. When more than one argument value
is given, either as multiple values for a single argument, or for more
than one argument, the following logic seems most useful: multiple
values for a single argument are combined or-wise, and multiple
arguments are combined and-wise.
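The combining rule can be sketched in a few lines of R. This is an illustration only, not gtxwhere itself: where_sketch is a hypothetical name, it assumes each argument name is also the column name, it handles only numeric equality tests, and the exact string that gtxwhere produces may be formatted differently.

```r
# Hypothetical sketch of the combining rule; NOT gtxwhere() itself.
# Only numeric values are accepted, which sidesteps quoting and sanitizing.
where_sketch <- function(...) {
  args <- list(...)
  stopifnot(length(args) > 0, !is.null(names(args)), all(names(args) != ""))
  clauses <- vapply(names(args), function(nm) {
    values <- args[[nm]]
    stopifnot(is.numeric(values), length(values) > 0)
    # Values of one argument are alternatives: join them with OR.
    paste0("(", paste(sprintf("%s=%.0f", nm, values), collapse = " OR "), ")")
  }, character(1))
  # Different arguments must all hold: join them with AND.
  paste(clauses, collapse = " AND ")
}

where_sketch(chrom = 1, pos = c(109616403, 109623689))
# "(chrom=1) AND (pos=109616403 OR pos=109623689)"
```

Wrapping each argument's alternatives in parentheses before joining with AND keeps the operator precedence explicit in the generated SQL.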
To use gtxwhere to select chromosome segments (such as genes or
other entities, recombination rate segments, etc) that wholly or
partially overlap a query region, use pos_end_ge=query_start and
pos_start_le=query_end. To select only chromosome segments that lie
wholly within the query region, instead use pos_start_ge=query_start
and pos_end_le=query_end.
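The difference between the two selections is easiest to see on toy data. The sketch below applies the same inequalities to an in-memory data frame rather than a database table; the segment names and coordinates are invented for illustration.

```r
# Toy segments; names and coordinates are invented for illustration only.
segments <- data.frame(
  name      = c("A", "B", "C", "D"),
  pos_start = c(100L, 450L, 600L, 900L),
  pos_end   = c(200L, 550L, 700L, 1200L),
  stringsAsFactors = FALSE
)
query_start <- 500L
query_end   <- 1000L

# Whole or partial overlap: pos_end >= query_start AND pos_start <= query_end
overlapping <- subset(segments, pos_end >= query_start & pos_start <= query_end)

# Wholly within the query: pos_start >= query_start AND pos_end <= query_end
contained <- subset(segments, pos_start >= query_start & pos_end <= query_end)

overlapping$name  # "B" "C" "D"
contained$name    # "C"
```

Segments B and D straddle a boundary of the query region, so they appear only in the overlap selection, while A lies entirely outside and is excluded from both.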
For identifiers that are represented (for efficiency) as integers in
database tables but as strings in “user space”, gtxwhere is the layer
at which string-to-integer checking and conversion should occur.
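A minimal sketch of such a check-and-convert step follows. The helper id_to_integer_sketch is hypothetical and not part of gtx; it assumes an identifier is a fixed prefix followed only by digits, and it assumes the numeric part fits in an R integer. The ENSG value is used purely as an example of the format.

```r
# Hypothetical helper, not part of gtx: convert identifiers such as
# "rs599839" to the integer 599839, stopping on anything malformed.
id_to_integer_sketch <- function(x, prefix) {
  pattern <- paste0("^", prefix, "[0-9]+$")
  bad <- !grepl(pattern, x)
  if (any(bad)) {
    stop("malformed identifier(s): ", paste(x[bad], collapse = ", "))
  }
  # Strip the prefix; as.integer() also discards leading zeros.
  as.integer(sub(paste0("^", prefix), "", x))
}

id_to_integer_sketch("rs599839", prefix = "rs")          # 599839
id_to_integer_sketch("ENSG00000134243", prefix = "ENSG") # 134243
```

Rejecting any value that does not match the strict pattern means that only digits can reach the SQL text, which is one way the conversion step can double as sanitization.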
gtxregion returns a named list with elements ‘chrom’ (character),
‘pos_start’ (integer) and ‘pos_end’ (integer).
gtxwhere returns a character string suitable for inclusion
after the WHERE keyword in a SQL statement.
Toby Johnson Toby.x.Johnson@gsk.com
## Not run:
gtxregion(chrom = 1, pos_start = 109616403, pos_end = 109623689)
# dies without an open ODBC connection
## End(Not run)
gtxwhere(rs = 'rs599839')
gtxwhere(chrom = 1, pos = c(109616403, 109623689))
gtxwhere(chrom = 1, pos_end_ge = 109616403, pos_start_le = 109623689)
## Not run:
gtxwhere()
## End(Not run)