coloc_old: Colocalization analysis
In tobyjohnson/gtx: Genetics ToolboX


Colocalization analysis.

coloc(analysis1, analysis2,
      chrom, pos_start, pos_end,
      hgncid, ensemblid, surround = 500000,
      entity, entity1, entity2,
      style = 'Z', 
      dbc = getOption("gtx.dbConnection", NULL))

`analysis1`	Key value identifying the first analysis (GWAS, eQTL, pQTL, etc.) to use
`analysis2`	Key value identifying the second analysis (GWAS, eQTL, pQTL, etc.) to use
`chrom`	Character string specifying the chromosome
`pos_start`	Start position of the region
`pos_end`	End position of the region
`hgncid`	HGNC identifier of the gene around which to define the region
`ensemblid`	Ensembl gene identifier of the gene around which to define the region
`surround`	Distance around the gene to include in the region (default 500000)
`entity`	Identifier for an entity, used for either analysis when entity1 or entity2 is not given
`entity1`	Identifier for the entity to use from analysis1
`entity2`	Identifier for the entity to use from analysis2
`style`	Character string specifying the plot style: 'Z', 'beta' or 'none' (see Details)
`dbc`	Database connection (default getOption("gtx.dbConnection", NULL))

This high-level function conducts a colocalization analysis across a region of the genome, using summary statistics for association with two traits. The two sets of summary statistics are specified using the analysis1 and analysis2 arguments. Where one or both contain summary statistics for multiple entities (e.g. from eQTL or pQTL analyses), the desired entities must be specified (see below).

Note that when an hgncid or ensemblid gene identifier is used to specify the region from which summary statistics are taken, the default surround=500000 will not include the full cis eQTL region as it is usually defined.

The region of interest can be specified in several ways. It can be supplied as physical coordinates using the arguments chrom, pos_start and pos_end. Alternatively, it can be centered on a gene of interest using either the hgncid or ensemblid argument, with the size of the region around the gene controlled by the surround argument. Note that the primary purpose of the gene-identifying arguments hgncid and ensemblid is to specify the genomic region of interest (and thus the set of variants to analyse). Their secondary purpose is that, if no explicit entity argument is given, the entity for eQTL or pQTL analyses is inferred from hgncid or ensemblid.
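The two ways of specifying the region can be illustrated with a minimal base R sketch. It is hypothetical and not gtx's implementation: the helper resolve_region, the lookup table genes and its columns, and the made-up gene GENEA with invented coordinates are all assumptions for illustration. It also assumes the region is the gene extended by surround on each side, and handles only hgncid (ensemblid would work the same way).

```r
# Hypothetical sketch, not gtx's implementation: resolve the genomic region
# from explicit coordinates, or from a gene extended by `surround` on each side.
# `genes` is an assumed lookup table with columns hgncid, chrom, pos_start, pos_end.
resolve_region <- function(chrom, pos_start, pos_end, hgncid, genes,
                           surround = 500000) {
  if (!missing(chrom) && !missing(pos_start) && !missing(pos_end)) {
    # Explicit physical coordinates are used as given
    return(list(chrom = chrom, pos_start = pos_start, pos_end = pos_end))
  }
  if (!missing(hgncid)) {
    gene <- genes[genes$hgncid == hgncid, , drop = FALSE]
    if (nrow(gene) != 1) stop("hgncid must match exactly one gene")
    # Extend the gene boundaries by `surround` on each side, not below position 1
    return(list(chrom = gene$chrom,
                pos_start = max(1, gene$pos_start - surround),
                pos_end = gene$pos_end + surround))
  }
  stop("specify either chrom, pos_start and pos_end, or hgncid")
}

# Made-up gene coordinates, for illustration only
genes <- data.frame(hgncid = "GENEA", chrom = "1",
                    pos_start = 1000000, pos_end = 1020000,
                    stringsAsFactors = FALSE)

region <- resolve_region(hgncid = "GENEA", genes = genes)
str(region)
```

In this sketch explicit coordinates are checked first, so they take precedence if both forms are supplied; that ordering is a design choice of the sketch, not documented gtx behaviour.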

Entities distinguish genomic features in analyses where a single analysis includes, for each variant, summary statistics for associations with one or more entities. For example, in an eQTL analysis each transcript or gene is an entity, and a typical variant will have summary statistics for associations with multiple transcripts or genes. If either of the analyses specified by analysis1 and analysis2 has results separated by entity, the arguments entity1 and entity2 specify the desired entity from each. If entity1 or entity2 is missing, the argument entity is used in its place. (This mechanism facilitates, e.g., colocalization for the same transcript between two different eQTL datasets.) If entity is also missing, the function attempts to infer a suitable entity from the hgncid or ensemblid argument. (This gives sensible default behaviour, and facilitates the most common use case of centering the genomic region of interest on the entity being analysed in an eQTL or pQTL dataset.)
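The fallback order for entities can be sketched the same way. This is again a hypothetical illustration, not gtx's code: resolve_entities and infer_entity are assumed names, NULL defaults stand in for missing arguments, and the identifiers T1, T2 and GENEA are placeholders.

```r
# Hypothetical sketch of the entity fallback described above, not gtx's code.
# NULL defaults stand in for missing arguments; `infer_entity` is an assumed
# helper that maps a gene identifier to an entity identifier.
resolve_entities <- function(entity1 = NULL, entity2 = NULL, entity = NULL,
                             hgncid = NULL, infer_entity = function(id) id) {
  # If `entity` is missing too, fall back to inferring it from the gene
  if (is.null(entity) && !is.null(hgncid)) {
    entity <- infer_entity(hgncid)
  }
  # A missing entity1 or entity2 is replaced by `entity`
  list(entity1 = if (is.null(entity1)) entity else entity1,
       entity2 = if (is.null(entity2)) entity else entity2)
}

# Same transcript in two eQTL datasets: one `entity` serves both analyses
same <- resolve_entities(entity = "T1")
# An explicit entity1 overrides `entity` for analysis1 only
mixed <- resolve_entities(entity1 = "T1", entity = "T2")
# No entity given: inferred from the gene identifier
inferred <- resolve_entities(hgncid = "GENEA",
                             infer_entity = function(id) paste0(id, "_entity"))
```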

The style argument can be set to ‘Z’ to plot Z statistics for the two analyses, ‘beta’ to plot beta (effect size) statistics for the two analyses, or ‘none’ to suppress plotting altogether.

coloc returns a data frame containing the result of the colocalization analysis; see coloc.fast for details. The plot is generated as a side effect.

Toby Johnson Toby.x.Johnson@gsk.com

tobyjohnson/gtx documentation built on Aug. 30, 2019, 8:07 p.m.