JAMPred: JAMPred

Description Usage Arguments Value Author(s) See Also Examples

View source: R/JAMPred.R

Description

This function runs JAMPred by analysing many SNP blocks independently in parallel.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
JAMPred(
  marginal.betas = NULL,
  n.training = NULL,
  marginal.logor.ses = NULL,
  p.cases.training = NULL,
  ref.geno = NULL,
  total.snps.genome.wide = NULL,
  n.cores = NULL,
  n.mil = 0.2,
  beta.binom.b.lambda = 1,
  beta.binom.a = 1,
  snps.blocks = NULL,
  initial.block.size = 100,
  initial.snps.blocks = NULL,
  seed = NULL,
  thinning.factor = round(1000/n.mil),
  effect.sd.uniform.prior = c(0.05, 2),
  residual.var.invgamma.prior = c(0.01, 0.01),
  save.path = NULL,
  debug = FALSE
)

Arguments

marginal.betas

Vector of named one-at-a-time SNP effects. NB: This must be a named vector; it is where. the SNP names are derived from. These could be from linear regressions, or log-Odds Ratios. NB: If providing log-ORs, you must also provide marginal.logor.ses and p.cases.training so that a linear transformaiton can be applied.

n.training

The sample size in which the marginal.betas were calculated.

marginal.logor.ses

IF marginal log-ORs were provided, please provide a named vector of standard errors here. These are necessary to convert them to a linear scale. Do not set if providing linear regression estimates.

p.cases.training

IF marginal log-ORs were provided, please provide the proportion of cases in the sample in which they were calculated

ref.geno

Reference genotype matrix which will be used to calculate SNP-SNP correlations. Individual's genotype must be coded as a numeric risk allele count 0/1/2. Non-integer values reflecting imputation uncertaimnty may be given. NB: The risk allele coding MUST correspond to that used in marginal.betas and column names must match the names of marginal.betas elements.

total.snps.genome.wide

Total number of SNPs being analysed across the genome. Used to determine the level of prior sparsity.

n.cores

Number of CPU cores to parralelise the SNP blocks over. If unspecified the number available will be detected. NB: On Windows systems this will be forced to 1, since mclapply does not yet support forking.

n.mil

Number of million iterations to run. In the paper we found 0.2 was sufficient to reach adequeate convergence.

beta.binom.b.lambda

Spartisty tuning parameter. Suggest a range of values is tried, e.g. 0.01, 0.1, 1, 10. See the paper for more detail. Default is 1.

beta.binom.a

First hyper-parameter of the beta-binomial prior on model sparsity. See the paper for more detail. Default 1 can usually be used.

snps.blocks

OPTIONAL: A partitioning of the SNPs into blocks, created using the function JAMPred_SplitIntoPositiveDefiniteBlocks. This is automatically generated if not provided.

initial.block.size

Block size for the initial partitioning. Default is 100. This is ignored when initial.snps.blocks is given.

initial.snps.blocks

A user specficied initial partitioning of the SNPs. This must be a list, each element of which is a vector of SNP names that correspond to the columns of ref.geno. This could be defined, for example, according to LD information.

seed

An integer specifying the RJMCMC seed. If not set a random number will be used every time this is run.

thinning.factor

Determines every ith iteration to use from the RJMCMC sample. The more thinning that is applied the quicker the analysis will be, for the same number of iterations. Leaving at the default should be fine.

effect.sd.uniform.prior

Upper and lower uniform hyper-parameters for the prior on the standard deviation of SNP effects. Default is what we used in the paper, a Uniform(0.05, 2) distribution.

residual.var.invgamma.prior

Hyper-parameters for the inversegamma prior on the residual variance. Default is what we used in the paper, a Inverse-Gamma(0.01, 0.01) distribution.

save.path

Optional path for JAM to store and save its temporary files in. Can help with debugging (default NULL).

debug

An option passed to JAM requesting more verbose output (default FALSE).

Value

A JAMPred results object, which is a list including as elements step1.posterior.mean.snp.weights (which do not adjust for long range LD) and step2.posterior.mean.snp.weights (which do adjust for long range LD). These SNP weights can be used to generate predictions from individual level genotype data.

Author(s)

Paul Newcombe

See Also

See JAMPred_SplitIntoPositiveDefiniteBlocks; this is the function used to create an initial partitioning of SNPs into blocksm which satisfy the property of all being positvie definite with respect to the reference genotype matrix (also required by JAMPred). By default this function is used to generate the snps.blocks list input, if not provided by the user.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
# --- Load libraries
library(R2BGLiMS)
data("JAMPred_Example")

###########################################################
# --- Simple demonstation of syntax for binary traits --- #
###########################################################
# NB: See third example for modelling different chromosomes and sparsities

# Run JAMPred
snps <- chromosome.snps[[1]] # Only use chromosome 1 data
jampred.res.bin <- JAMPred(
  marginal.betas = marginal.logors[snps],
  n.training = n.training,
  marginal.logor.ses = marginal.logor.ses, # Only necessary for a binary trait
  p.cases.training = n.cases.training/n.training, # Only necessary for a binary trait
  ref.geno = data.validation[,snps],
  total.snps.genome.wide = 500000, # Total SNPs across all chromosomes
  n.mil = 0.2,
  seed = 1, # For re-producibility. If not set a random seed is used
  debug=T
)

# Generate predictions
out.of.sample.predictions <- 
  data.validation[,jampred.res.bin$snps] %*% 
  jampred.res.bin$step2.posterior.mean.snp.weights

# Predictive r2
cor(out.of.sample.predictions, data.validation[,"d"])^2 # Should be 0.18

###########################################################
# --- Simple demonstation of syntax for binary traits --- #
###########################################################
# NB: See third example for modelling different chromosomes and sparsities

# Run JAMPred
snps <- chromosome.snps[[1]] # Only use chromosome 1 data
jampred.res.lin <- JAMPred(
  marginal.betas = marginal.ctsbetas[snps],
  n.training = n.training,
  ref.geno = data.validation.cts[,snps],
  total.snps.genome.wide = 500000, # Total SNPs across all chromosomes
  n.mil = 0.2,
  seed = 1 # For re-producibility. If not set a random seed is used
)

# Generate predictions
out.of.sample.predictions <- 
  0 + # NB: A trait mean could be set here (otherwise it is assumed the outcome is mean-centred)
  data.validation.cts[,jampred.res.lin$snps] %*% 
  jampred.res.lin$step2.posterior.mean.snp.weights

# Predictive r2
cor(out.of.sample.predictions, data.validation.cts[,"y"])^2 # Should be 0.15

##################################################################################
# --- Using JAMPred to model multiple chromosomes under different sparsities --- #
##################################################################################

# Run JAMPred, looping over sparsities and chromosomes
jampred.res.loop <- list() # List to hold all the results
for (lambda in c(0.01, 0.1, 1, 10)) { # Loop over lambda choices (see the paper)
  jampred.res.loop[[paste(lambda)]] <- list()
  for (chr in c(1:2)) { # Loop over chromosomes
    jampred.res.loop[[paste(lambda)]][[chr]] <- JAMPred(
      marginal.betas = marginal.logors[chromosome.snps[[chr]]],
      n.training = n.training,
      marginal.logor.ses = marginal.logor.ses,
      p.cases.training = n.cases.training/n.training,
      ref.geno = data.validation[,chromosome.snps[[chr]]],
      total.snps.genome.wide = 500000,
      n.mil = 0.2,
      beta.binom.b.lambda = lambda,
      initial.block.size = 100, # Default size of SNP blocks
      seed = 1 # For re-producibility. If not set a random seed is used
    )
  }
}

# Generate predictions for each lambda, summing predictive scores over chromosoms
out.of.sample.predictions <- list()
for (lambda in c(0.01, 0.1, 1, 10)) {
  out.of.sample.predictions[[paste(lambda)]] <- 0
  for (chr in c(1:2)) {
    out.of.sample.predictions[[paste(lambda)]] <-
      out.of.sample.predictions[[paste(lambda)]] +
      data.validation[,jampred.res.loop[[paste(lambda)]][[chr]]$snps] %*% 
      jampred.res.loop[[paste(lambda)]][[chr]]$step2.posterior.mean.snp.weights
  }
}

# Predictive r2
sapply(out.of.sample.predictions, function(preds) cor(preds,data.validation[,"d"])^2)
# Predictive correlation 0.37 and highest at lambda = 0.01

pjnewcombe/R2BGLiMS documentation built on Feb. 10, 2020, 8:52 p.m.