Unified interface to define a genomic region.
chrom        | Character specifying chromosome
pos_start    | Integer start position on chromosome
pos_end      | Integer end position on chromosome
pos          | Integer position on chromosome
hgncid       | HGNC gene identifier
ensemblid    | ENSEMBL gene identifier
rs           | dbSNP rs identifier
surround     | Distance around entity to include in region
pos_ge       | Position greater-or-equal required
pos_le       | Position less-or-equal required
pos_end_ge   | End position greater-or-equal required
pos_start_le | Start position less-or-equal required
pos_start_ge | Start position greater-or-equal required
pos_end_le   | End position less-or-equal required
tablename    | Database table name
dbc          | Database connection
The gtxregion() function provides a unified interface for other
functions to define a genomic region (or potentially for a user to
invoke directly). For any valid combination of its optional
arguments, it returns genomic coordinates (chromosome, start and end
positions) as described below, using the database connection
dbc to resolve any queries (such as the coordinates of a named
gene).
When accessing this functionality indirectly via higher level
functions (such as regionplot() and
coloc()), the functionality should be almost completely
intuitive for most users, and if necessary can be learned by example
from the manual pages and vignettes for those higher level functions.
It suffices to add that the optional arguments are used according to a
priority order, which is exactly the order of arguments in the
function definition. For example, if chrom, pos_start,
pos_end and hgnc are all provided, hgnc has lower
priority and is ignored. Similarly if hgnc and pos are
provided, pos has lower priority and is ignored.
It is an intended design feature that pos and rs are
lowest in the priority order. When used in conjunction with higher
priority arguments such as hgnc, a pos or rs
argument can be used without affecting the genomic region
specified, which then allows a function that wraps gtxregion()
to use pos or rs for secondary purposes, such as to
highlight a specific position or variant in a visual display. Thus,
regionplot(..., pos = 1234567, surround = 500000) selects a
500kb region around position 1234567 and visually highlights any
variant present at position 1234567, and regionplot(..., hgnc =
'ABC123', surround = 10000, pos = 1234567) selects a 10kb region
around the ABC123 gene and visually highlights any variant present at
position 1234567.
The remainder of this manual page is more technical documentation,
intended for programmers writing new high level functions that will
work alongside regionplot() and coloc(),
and should be read in combination with the source code.
The gtxregion() function resolves its arguments to genomic
coordinates as follows:
If the arguments chrom, pos_start and pos_end are
all provided, these are checked for validity and used to directly
specify the return value.
Otherwise, if the argument hgnc is provided, TABLE genes
is queried (using dbc and gtxwhere) and a region
spanning the gene(s) plus surrounding distance is returned.
Otherwise, if the argument ensg (integer) is provided,
TABLE genes is similarly queried.
Otherwise, if the arguments chrom and pos are both
provided, these are checked for validity and used, together with the
surrounding distance, to directly specify the return value.
Otherwise, if the argument rs is provided, TABLE sites
(sites_by_rs) is queried (using gtxwhere) and a region
plus surrounding distance is returned.
The methods just described are implemented using if ... else if
... else if ... logic, so, for example, if an hgnc argument is
provided then any ensg argument is ignored, etc.
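The priority order can be sketched in plain R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: resolve_region_sketch() and its gene_lookup argument are hypothetical, a small in-memory data frame stands in for the TABLE genes query, surround is assumed to be applied on each side, and the ensg and rs branches are omitted for brevity.

```r
# Hypothetical sketch of the if / else if resolution order; not the gtx source.
# gene_lookup stands in for TABLE genes (its columns are assumptions of this sketch).
resolve_region_sketch <- function(chrom, pos_start, pos_end, hgnc, pos,
                                  surround = 0, gene_lookup = NULL) {
  if (!missing(chrom) && !missing(pos_start) && !missing(pos_end)) {
    # Highest priority: explicit coordinates are checked and used directly
    stopifnot(pos_start <= pos_end)
    region <- list(chrom = as.character(chrom),
                   pos_start = as.integer(pos_start),
                   pos_end = as.integer(pos_end))
  } else if (!missing(hgnc)) {
    # Next: span of the named gene(s), widened by surround on each side (assumed)
    g <- gene_lookup[gene_lookup$hgnc %in% hgnc, ]
    stopifnot(nrow(g) > 0, length(unique(g$chrom)) == 1)
    region <- list(chrom = as.character(g$chrom[1]),
                   pos_start = as.integer(min(g$pos_start) - surround),
                   pos_end = as.integer(max(g$pos_end) + surround))
  } else if (!missing(chrom) && !missing(pos)) {
    # Lower priority: a single position, widened by surround on each side (assumed)
    region <- list(chrom = as.character(chrom),
                   pos_start = as.integer(pos - surround),
                   pos_end = as.integer(pos + surround))
  } else {
    stop("no valid combination of arguments")
  }
  region
}

# Made-up gene coordinates, for illustration only
genes <- data.frame(hgnc = "ABC123", chrom = "1",
                    pos_start = 2000000, pos_end = 2010000,
                    stringsAsFactors = FALSE)

# hgnc outranks pos, so supplying pos does not change the region
r <- resolve_region_sketch(hgnc = "ABC123", pos = 1234567,
                           surround = 10000, gene_lookup = genes)
```

Because the hgnc branch is reached before the pos branch, a wrapper function remains free to use pos for a secondary purpose such as highlighting.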
The gtxwhere function provides a standardized and sanitized way
to dynamically construct part of a SQL WHERE clause. This is best
illustrated by the examples below. When more than one argument value
is given, either as multiple values for a single argument or as values
for more than one argument, the following logic is applied: multiple
values for a single argument are combined or-wise, and multiple
arguments are combined and-wise.
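The or-wise and and-wise combination can be sketched as below. This is only an illustration of the logic: where_sketch() is a hypothetical helper, it accepts whole-number values only, and the exact text that gtxwhere produces (column names, quoting, spacing) may well differ.

```r
# Hypothetical sketch: values of one argument are joined with OR,
# and separate arguments are joined with AND. Not the gtxwhere source.
where_sketch <- function(...) {
  args <- list(...)
  stopifnot(length(args) > 0, !is.null(names(args)), all(names(args) != ""))
  clauses <- vapply(names(args), function(nm) {
    vals <- args[[nm]]
    # Sanitize: only finite whole numbers are accepted in this sketch
    stopifnot(is.numeric(vals), all(is.finite(vals)), all(vals == round(vals)))
    one <- paste0(nm, "=", sprintf("%.0f", vals))
    paste0("(", paste(one, collapse = " OR "), ")")
  }, character(1))
  paste(clauses, collapse = " AND ")
}

w <- where_sketch(chrom = 1, pos = c(109616403, 109623689))
# w is "(chrom=1) AND (pos=109616403 OR pos=109623689)"
```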
To use gtxwhere to select chromosome segments (such as genes or
other entities, recombination rate segments, etc) that wholly or
partially overlap a query region, use pos_end_ge=query_start
and pos_start_le=query_end. To select only chromosome
segments that lie wholly within the query region, instead use
pos_start_ge=query_start and pos_end_le=query_end.
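The difference between the two selections can be checked on a few made-up segments. The data frame and its values below are purely illustrative; only the two comparisons mirror the argument pairs described above.

```r
# Made-up segments for illustration; names and coordinates are hypothetical
segs <- data.frame(name = c("left", "inside", "straddle", "right"),
                   pos_start = c(100, 450, 550, 700),
                   pos_end   = c(200, 500, 650, 800),
                   stringsAsFactors = FALSE)
query_start <- 400
query_end <- 600

# pos_end_ge = query_start and pos_start_le = query_end:
# segments that overlap the query region wholly or partially
overlapping <- segs$name[segs$pos_end >= query_start &
                         segs$pos_start <= query_end]

# pos_start_ge = query_start and pos_end_le = query_end:
# segments that lie entirely inside the query region
contained <- segs$name[segs$pos_start >= query_start &
                       segs$pos_end <= query_end]

# overlapping is c("inside", "straddle"); contained is "inside"
```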
For identifiers that are represented (for efficiency) as integers in
database tables but as strings in “user space”, gtxwhere
is the layer at which string-to-integer checking and conversion should
occur.
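As a rough illustration of such checking and conversion, the sketch below turns a dbSNP identifier string into an integer and rejects anything that does not match the expected pattern. The helper rs_to_int_sketch() is hypothetical, and the assumption that rs identifiers are stored as integers is made here only for the sake of the example.

```r
# Hypothetical helper, not part of gtx: check and convert "rs599839" to 599839L
rs_to_int_sketch <- function(rs) {
  rs <- as.character(rs)
  # Accept only "rs" followed by digits with no leading zero
  ok <- grepl("^rs[1-9][0-9]*$", rs)
  if (!all(ok)) {
    stop("malformed rs identifier(s): ", paste(rs[!ok], collapse = ", "))
  }
  # Strip the prefix and convert the remaining digits
  as.integer(sub("^rs", "", rs))
}

rs_int <- rs_to_int_sketch("rs599839")
# rs_int is 599839L
```

Doing the check at this layer means that only validated integers are ever pasted into the SQL text.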
gtxregion returns a named list with elements ‘chrom’
(character), ‘pos_start’ (integer) and ‘pos_end’
(integer).
gtxwhere returns a character string suitable for inclusion
after the WHERE keyword in a SQL statement.
Toby Johnson Toby.x.Johnson@gsk.com
## Not run:
gtxregion(chrom = 1, pos_start = 109616403, pos_end = 109623689)
# dies without an open ODBC connection
## End(Not run)
gtxwhere(rs = 'rs599839')
gtxwhere(chrom = 1, pos = c(109616403, 109623689))
gtxwhere(chrom = 1, pos_end_ge = 109616403, pos_start_le = 109623689)
## Not run:
gtxwhere()
## End(Not run)