JAM: JAM (Joint Analysis of Marginal statistics)

Description Usage Arguments Value Author(s) See Also Examples

View source: R/JAM.R

Description

Wrapper for JAM (Joint Analysis of Marginal statistics). See the Vignette.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
JAM(
  marginal.betas = NULL,
  n = NULL,
  ns.each.ethnicity = NULL,
  X.ref = NULL,
  cor.ref = NULL,
  mafs.ref = NULL,
  model.space.priors = NULL,
  beta.priors = NULL,
  beta.prior.partitions = NULL,
  g.prior = TRUE,
  tau = NULL,
  xtx.ridge.term = 0,
  enumerate.up.to.dim = 0,
  full.mcmc.sampling = FALSE,
  n.iter = 1e+06,
  n.mil.iter = NULL,
  thinning.interval = NULL,
  seed = NULL,
  extra.arguments = NULL,
  initial.model = NULL,
  save.path = NULL,
  debug = FALSE,
  max.model.dim = -1,
  burnin.fraction = 0.5,
  trait.variance = NULL,
  mrloss.w = 0,
  mrloss.function = "variance",
  mrloss.marginal.by = NULL,
  mrloss.marginal.sy = NULL,
  mafs.if.independent = NULL,
  extra.java.arguments = NULL
)

Arguments

marginal.betas

Vector of (named) marginal effect estimates to re-analyse with JAM under multivariate models. For multi-ethnic "mJAM" please provide a list of vectors, each element of which is a vector of marginal effects for a specific ethnicity over the same variants.

n

The size of the dataset in which the summary statistics (marginal.betas) were calculated

ns.each.ethnicity

For mJAM: A vector of the sizes of each ethnicity dataset in which the summary statistics were calculated.

X.ref

Reference genotype matrix used by JAM to impute the SNP-SNP correlations. If multiple regions are to be analysed this should be a list containing reference genotype matrices for each region. Individual's genotype must be coded as a numeric risk allele count 0/1/2. Non-integer values reflecting imputation may be given. NB: The risk allele coding MUST correspond to that used in marginal.betas. These matrices must each be positive definite and the column names must correspond to the names of the marginal.betas vector.

cor.ref

Alternatively to a reference genotype matrix, a reference correlation matrix AND mafs may be supplied to JAM. NB: The risk allele coding MUST correspond to that used in marginal.betas. These matrices must each be positive definite and the column and row names must correspond to the names of the marginal.betas vector.

mafs.ref

Alternatively to a reference genotype matrix, a reference correlation matrix AND mafs may be supplied to JAM. NB: The risk allele coding MUST correspond to that used in marginal.betas. This must be a named vector with names correspond to the names of the marginal.betas vector.

model.space.priors

Must be specified if model.selection is set to TRUE. Two options are available. 1) A fixed prior is placed on the proportion of causal covariates, and all models with the same number of covariates is equally likely. This is effectively a Poisson prior over the different possible model sizes. A list must be supplied for 'model.space.priors' with an element "Rate", specifying the prior proportion of causal covariates, and an element "Variables" containing the list of covariates included in the model search. 2) The prior proportion of causal covariates is treated as unknown and given a Beta(a, b) hyper-prior, in which case elements "a" and "b" must be included in the 'model.space.priors' list rather than "Rate". Higher values of "b" relative to "a" will encourage sparsity. NOTE: It is easy to specify different model space priors for different collections of covariates by providing a list of lists, each element of which is itself a model.space.prior list asm described above for a particular subset of the covariates.

beta.priors

This allows specifying fixed (potentially informative) priors for the covariate effect priors. A matrix must be passed, with named rows corresponding to parameters, and columns corresponding to the prior mean and variance in that order. When using this option priors must be specified for either just the confounders, which are otherwise given fixed N(0,1e6) priors, or for all covariates.

beta.prior.partitions

Covariate effects under variable selection are ascribed, by default, a common Normal prior, the standard deviation of which is treated as unknown, with a Unif(0.05,2) hyper-prior. This option can be used to partition the covariate effects into different prior groups, each with a seperate hierarchical normal prior. beta.prior.partitions must be a list with as many elements as desired covariate groups. The element for a particular group must in turn be a list containing the following named elements: "Variables" - a list of covariates in the prior group, and "UniformA" and "UniformB" the Uniform hyper parameters for the standard deviation of the normal prior across their effects.

g.prior

Whether to use a g-prior for the beta's, i.e. a multivariate normal with correlation structure proportional to sigma^2*X'X^-1, which is thought to aid variable selection in the presence of strong correlation. By default this is enabled.

tau

Value to use for sparsity parameter tau (under the tau*sigma^2 parameterisation). When using the g-prior, a recommended default is max(n, P^2) where n is the number of individuals, and P is the number of predictors.

xtx.ridge.term

Value to add to the constant of the diagonal of X'X before JAM takes the Cholesky decomposition.

enumerate.up.to.dim

Whether to make posterior inference by exhaustively calculating the posterior support for every possible model up to this dimension. Leaving at 0 to disable and use RJMCMC instead. The current maximum allowed value is 5.

full.mcmc.sampling

By default JAM only samples models, since the parameters can be analytically integrated out. If you would like to force JAM to perform full Reversible Jump MCMC of models and parameters (effects and residual), then set this to TRUE. The posterior summaries can be seen using PrettyResultsTable. Note that this option is forced to false if inference via model enumeration is requested by setting enumerate.up.to.dim>0.

n.iter

Number of iterations to run (default is 1e6)

n.mil.iter

Number of million iterations to run. Can optionally be used instead of n.iter for convenience, which it will overide if specified.

thinning.interval

Every nth iteration to store (i.e. for the Java algorithm to write to a file and then read into R). By default this is the number of iterations divided by 1e4 (so for 1 million iterations every 100th is stored.) Higher values (so smaller posterior sample) can lead to faster runtimes for large numbers of covariates.

seed

Which random number seed to use in the RJMCMC sampler. If none is provided, a random seed is picked between 1 and 2^16.

extra.arguments

A named list of additional arguments for which there are not currently dedicated options for. This can be used to modify various "under the hood" settings, including all prior hyper-parameters, MCMC mixing parameters such as the probabilities of add/delete/swap moves as well as the adaption settings. A list of all settings available for modification can be seen by typing "data(DefaultArguments)" and then "default.arguments", which lists their names and default values.

initial.model

An initial model for the covariates under selection can be specified as a vector of 0s and 1s. If left un-specified the null (empty) model is used.

save.path

By default R2BGLiMS writes the posterior samples to a temporary file that is deleted after they have been read in to R. By specifying a file path this can be kept, along with the temporary files used for the data and arguments, which can be useful sometimes for de-bugging.

debug

Whether to output extra information (such as final adaption proposal SDs) which might help with debugging (default is FALSE).

max.model.dim

The maximum model dimension can be specified, therefore truncating the model space. We do not recommend using this option but it might sometimes be useful for robust-ness checks. When left un-specified there is no restriction on the model size.

burnin.fraction

Initial fraction of the iterations to throw away, e.g. setting to 0 would mean no burn-in. The default of 0.5 corresponds to the first half of iterations being discarded.

trait.variance

NB EXPERIMENTAL: Invokes a slightly different summary statistic likelihood, which requires an estimate of the trait variance to be provided. (Not yet implemented to work with mJAM or enumeration)

mrloss.w

The relative weight of the MR log loss function for pleiotropy vs the log likelihood. Default 0.

mrloss.function

Choice of pleiotropic loss function from "steve", "variance" (default variance)

mrloss.marginal.by

Marginal associations between SNPs and outcome for the MR loss function model.

mrloss.marginal.sy

Standard errors of marginal associations between SNPs and outcome for the MR loss function model (not required for mrloss.function "variance")

mafs.if.independent

If the SNPs are independent then a reference genotype matrix is not required. However, it is still necessary to provide SNP MAFs here as a named vector. Doing so will lead to X.ref being ignored and the SNPs to be modelled as if they are independent. Note that this option does not work with enumeration.

extra.java.arguments

A character string to be passed through to the java command line. E.g. to specify a different temporary directory by passing "-Djava.io.tmpdir=/Temp".

Value

An R2BGLiMS_Results class object is returned. See the slot 'posterior.summary.table' for a posterior summary of all parameters. See slot 'mcmc.output' for a matrix containing the raw MCMC output from the saved posterior samples (0 indicates a covariate is excluded from the model in a particular sample. Some functions for summarising results are listed under "see also".

Author(s)

Paul Newcombe

See Also

Summary results are stored in the slot posterior.summary.table. See ManhattanPlot for a visual summary of covariate selection probabilities. For posterior model space summaries see TopModels. For convergence checks see ChainPlots and AutocorrelationPlot.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
library(R2BGLiMS)

######################################################################################
# --- Example 1) One region with RJMCMC providing reference individual genotypes --- #
######################################################################################

# By default 1 million iterations are run
library(R2BGLiMS) # Load package
data(JAM_Example) # Load example data
jam.results <- JAM(
  marginal.betas=marginal.betas[snps.region1],
  X.ref=X.ref.region1,
  model.space.prior = list("a"=1, "b"=length(snps.region1), "Variables"=snps.region1),
  n=n,
  trait.variance=var(ipd.continuous[,"y"])) # Optional - but improves performance if a good estimate is available
jam.results@posterior.summary.table # SNP 5 was simulated as causal
ManhattanPlot(jam.results)

##################################################################################
# --- Example 2) One region with RJMCMC using a reference correlation matrix --- #
##################################################################################

# By default 1 million iterations are run
library(R2BGLiMS) # Load package
data(JAM_Example) # Load example data
cor.ref.region1 <- cor(X.ref.region1)
mafs.ref.region1 <- apply(X.ref.region1, MAR=2, mean)/2
jam.results <- JAM(
  marginal.betas=marginal.betas[snps.region1],
  cor.ref=cor.ref.region1,
  mafs.ref=mafs.ref.region1, # NB: MAFs must also be provided if passing JAM cor.ref
  model.space.prior = list("a"=1, "b"=length(snps.region1), "Variables"=snps.region1),
  n=n,
  trait.variance=var(ipd.continuous[,"y"])) # Optional - but improves performance if a good estimate is available
jam.results@posterior.summary.table # SNP 5 was simulated as causal
ManhattanPlot(jam.results)

###############################################################
# --- Example 3) Two regions simultaneously (with RJMCMC) --- #
###############################################################

two.regions.results <- JAM(
  marginal.betas=marginal.betas,
  X.ref=list(X.ref.region1, X.ref.region2),
  model.space.prior = list(
    "a"=1, "b"=length(snps.region1)+length(snps.region2), 
    "Variables" = c(snps.region1, snps.region2)),
  n.mil=1,
  n=n)
two.regions.results@posterior.summary.table # SNPs 5 and 16 were simulated as causal
ManhattanPlot(two.regions.results)

pjnewcombe/R2BGLiMS documentation built on Feb. 10, 2020, 8:52 p.m.