Multiple colocalization analyses.
multicoloc(analysis1, analysis2,
           chrom, pos_start, pos_end, pos,
           hgncid, ensemblid, rs, surround = 0,
           hard_clip = FALSE,
           style = 'heatplot',
           thresh_analysis = 0.1, thresh_entity = 0.1,
           dbc = getOption("gtx.dbConnection", NULL))

multicoloc.data(analysis1, analysis2,
                chrom, pos_start, pos_end, pos,
                hgncid, ensemblid, rs, surround = 0,
                hard_clip = FALSE,
                dbc = getOption("gtx.dbConnection", NULL))
analysis1 | The key value(s) for the GWAS analysis or analyses to analyze
analysis2 | The key value for the second GWAS analysis to analyze
chrom | Argument passed to gtxregion()
pos_start | Argument passed to gtxregion()
pos_end | Argument passed to gtxregion()
pos | Argument passed to gtxregion()
hgncid | Argument passed to gtxregion()
ensemblid | Argument passed to gtxregion()
surround | Argument passed to gtxregion()
hard_clip | Logical, see details
style | Character specifying plot style(s)
thresh_analysis | Probability threshold for inclusion in plots
thresh_entity | Probability threshold for inclusion in plots
dbc | Database connection
multicoloc() is an entry point for multiple colocalization analyses. It supports the most common use case: colocalizing association signals from one or more analyses of gene expression or protein levels (specified by analysis1), each of which includes association statistics for multiple entities (genes or proteins), against an association signal from a single analysis (typically a disease or clinical phenotype, specified by analysis2). For this use case, multicoloc() is typically more convenient and (much) faster than looping over multiple calls to coloc().
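As a rough illustration of the use case described above, a call might look like the following sketch. The analysis keys are hypothetical placeholders, the gene symbol and surround value are arbitrary choices, and the call assumes that multicoloc() is exported by the gtx package. The call is guarded so that it only runs when gtx is installed and a database connection has been set via the gtx.dbConnection option.

```r
# Hypothetical analysis keys; substitute keys that exist in your own database.
eqtl_analyses <- c("eqtl_study_a", "eqtl_study_b")  # assumed names, not real keys
gwas_analysis <- "disease_gwas"                      # assumed name, not a real key

# Only attempt the call if gtx is installed and a connection is configured.
if (requireNamespace("gtx", quietly = TRUE) &&
    !is.null(getOption("gtx.dbConnection", NULL))) {
  res <- gtx::multicoloc(analysis1 = eqtl_analyses,
                         analysis2 = gwas_analysis,
                         hgncid = "ABCA1",    # arbitrary example gene symbol
                         surround = 500000,   # arbitrary example value
                         style = "heatplot")
  head(res)
}
```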
multicoloc() offers a choice of two different algorithms for controlling the genomic region from which summary statistics are used for colocalization analyses, controlled by the argument hard_clip. The default, hard_clip=FALSE, uses the full set of available summary statistics for the entity/ies analyzed from each analysis included in analysis1. In this mode, the genomic range arguments chrom, pos_start, pos_end, hgncid etc. are only used (via gtxregion()) to determine the set of entities to be analyzed. Typically, this results in different entities being analyzed for colocalization using different (albeit overlapping) regions of the association signal from analysis2. The alternative, hard_clip=TRUE, uses only summary statistics within the genomic range specified (via the arguments passed to gtxregion()). In this mode, different entities will typically be analyzed for colocalization using the same or similar regions of the association signal from analysis2, depending on how the genomic range overlaps the summary statistics available for each entity. The exact algorithms used in each mode are detailed below. (They can also be visualized using plot style= in a forthcoming release.)
When hard_clip=FALSE, the algorithm used by multicoloc() first determines a “seed region” using the genomic region arguments, as interpreted by gtxregion(). Next, a set of entities is determined from the summary statistics for all analyses included in analysis1, consisting of all entities with summary statistics overlapping this “seed region”. (A better implementation of overlap is forthcoming.) Finally, an “expanded region” is determined that includes all available summary statistics for all of these entities. This “expanded region” is then used for each colocalization analysis, for each entity within each analysis within analysis1, against analysis2.
Notes and Warnings: This algorithm only makes sense if the summary statistics are restricted to localized regions around each entity, such as cis-regions for eQTL analyses. Typically, different entities will be evaluated for colocalization using different regions of summary statistics for analysis2. Because the set of entities is determined by aggregating over all analyses in analysis1, unexpected results may be produced if a given entity has summary statistics at very different genomic positions in different analyses. “Seed regions” specified using only the index variant from a GWAS signal (e.g. using pos or rs with the default surround=0) are not guaranteed to select all entities with summary statistics for cis-regions spanning such a single base pair “seed” region, if some entities are missing summary statistics for the variant in that “seed” region. [This last issue will be fixed in a forthcoming update.]
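The seed region, entity set and expanded region steps can be sketched in base R on toy data. This is only an illustration under stated assumptions, not the package's implementation: the data frame layout, the function name expanded_region() and the rule that "overlap" means "has at least one summary statistic inside the seed region" are all assumptions made for this example.

```r
# Toy summary statistics: one row per (analysis, entity, variant position).
# All values are made up for illustration.
stats <- data.frame(
  analysis = c("eqtl_a", "eqtl_a", "eqtl_a", "eqtl_a",
               "eqtl_b", "eqtl_b", "eqtl_a", "eqtl_a"),
  entity   = c("GENE1", "GENE1", "GENE2", "GENE2",
               "GENE1", "GENE1", "GENE3", "GENE3"),
  pos      = c(100, 300, 250, 600, 150, 350, 900, 1200),
  stringsAsFactors = FALSE
)

# Hypothetical helper: derive the entity set and expanded region from a seed.
expanded_region <- function(stats, seed_start, seed_end) {
  # Entities with at least one statistic inside the seed region,
  # aggregated over all analyses (assumed definition of overlap).
  in_seed  <- stats$pos >= seed_start & stats$pos <= seed_end
  entities <- unique(stats$entity[in_seed])
  if (length(entities) == 0) {
    return(list(entities = entities, pos_start = NA_real_, pos_end = NA_real_))
  }
  # Expanded region: span of all statistics for the selected entities.
  keep <- stats$entity %in% entities
  list(entities  = entities,
       pos_start = min(stats$pos[keep]),
       pos_end   = max(stats$pos[keep]))
}

region <- expanded_region(stats, seed_start = 200, seed_end = 320)

# A single base pair seed with no statistic at that exact position selects
# no entities, which mirrors the caveat about surround=0.
single_bp <- expanded_region(stats, seed_start = 200, seed_end = 200)
```

In this toy example the seed region 200 to 320 selects GENE1 and GENE2, and the expanded region then covers every position at which either gene has a statistic in any analysis, while GENE3 is left out.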
When hard_clip=TRUE, the algorithm used by multicoloc() is simply to select all summary statistics within the genomic region arguments, as interpreted by gtxregion(). The typical use case is to set this genomic region as the extent of the ‘significant’ part of the association signal for analysis2. The hard_clip=TRUE mode is (currently) not the default option, because in initial exploratory analyses it is unusual to precisely specify this region, and because we believe the number of ‘false positive’ colocalizations is reduced by including the whole cis-eQTL region (assuming that the strongest disease signal in the region ‘should’ be aligned with the strongest cis-eQTL signal).
Notes and Warnings: In general, a given entity may have summary statistics that only partially overlap the genomic region specified, which may have unexpected consequences. In a future release it will be possible to automatically subset to entities that overlap the genomic region specified by more than a chosen percentage. When using a hgncid or ensemblid gene identifier to specify the region from which to use summary statistics, the default surround=0 will not include the full cis-eQTL region.
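The hard-clipped selection, and the partial overlap it can produce, can be sketched in base R as follows. This is an illustration on made-up data, not the package's code; the function name hard_clip_stats() and the data frame layout are assumptions made for this example.

```r
# Toy summary statistics for two entities; all values are made up.
stats <- data.frame(
  entity = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE2", "GENE2"),
  pos    = c(100, 300, 500, 450, 700, 900),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only statistics inside the specified region.
hard_clip_stats <- function(stats, pos_start, pos_end) {
  stats[stats$pos >= pos_start & stats$pos <= pos_end, , drop = FALSE]
}

clipped <- hard_clip_stats(stats, pos_start = 250, pos_end = 600)

# Fraction of each entity's statistics that survive the clip, which shows
# how an entity can overlap the region only partially.
n_all     <- table(stats$entity)
n_kept    <- table(factor(clipped$entity, levels = names(n_all)))
frac_kept <- as.numeric(n_kept) / as.numeric(n_all)
names(frac_kept) <- names(n_all)
```

Here GENE1 keeps two of its three statistics and GENE2 keeps only one of three, so the two entities would be compared against analysis2 using quite different amounts of data even though the same region was requested.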
In a future release the output of multicoloc will be a long skinny dataframe with the full colocalization results (all priors, bfs and posteriors, numbers of variants and min and max positions used).
multicoloc returns a data frame containing the results of the colocalization analyses; see coloc.fast for details. The plot is generated as a side effect.
Toby Johnson Toby.x.Johnson@gsk.com