ssCTPR.pipeline: Run ssCTPR with standard pipeline


View source: R/ssCTPR.pipeline.R

Description

The easy way to run ssCTPR

Usage

ssCTPR.pipeline(
  cor,
  traits,
  adj = NULL,
  chr = NULL,
  pos = NULL,
  snp = NULL,
  A1 = NULL,
  A2 = NULL,
  ref.bfile = NULL,
  test.bfile = NULL,
  LDblocks = NULL,
  lambda = exp(seq(log(0.001), log(0.1), length.out = 20)),
  s = c(0.2, 0.5, 0.9, 1),
  lambda_ct = c(0, 0.06109, 0.1392, 0.24257),
  destandardize = F,
  trace = 1,
  exclude.ambiguous = TRUE,
  keep.ref = NULL,
  remove.ref = NULL,
  keep.test = NULL,
  remove.test = NULL,
  sample = NULL,
  cluster = NULL,
  max.ref.bfile.n = 20000,
  nomatch = FALSE,
  ...
)
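The default lambda in the usage above is a grid of 20 values equally spaced on the log scale between 0.001 and 0.1. The short base R sketch below only reproduces that expression and checks its shape; it does not call ssCTPR.

```r
# Reproduce the default lambda grid from the usage signature.
lambda <- exp(seq(log(0.001), log(0.1), length.out = 20))

# Consecutive values differ by a constant multiplicative factor,
# because the grid is evenly spaced on the log scale.
ratios <- lambda[-1] / lambda[-length(lambda)]

length(lambda)   # number of grid points
range(lambda)    # smallest and largest lambda
isTRUE(all.equal(ratios, rep(ratios[1], length(ratios))))
```

A custom grid built the same way (different endpoints or length.out) can be supplied through the lambda argument.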

Arguments

cor

A matrix of SNP-wise correlations with the primary trait, derived from p2cor, followed by the effect sizes (beta) of the secondary traits, if any

traits

The number of traits

adj

A matrix of adjacency coefficients. NROW is the number of variants; NCOL is the number of secondary traits.

chr

Together with pos, chromosome and position for cor

pos

Together with chr, chromosome and position for cor

A1

Alternative allele (effect allele) for cor

A2

Reference allele for cor (One of A1 or A2 must be specified)

ref.bfile

bfile (PLINK binary format, without .bed) for reference panel

test.bfile

bfile for test dataset

LDblocks

Either (1) one of "EUR.hg19", "AFR.hg19", "ASN.hg19", "EUR.hg38", "AFR.hg38", "ASN.hg38", to use blocks defined by Berisa and Pickrell (2015) based on the 1000 Genome data, or (2) a vector to define LD blocks, or (3) a data.frame of regions in bed format

lambda

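Values of lambda to pass on to ssCTPR

s

A vector of s values to pass on to ssCTPR; s = 1 corresponds to soft-thresholding (see Details)

lambda_ct

Values of lambda_ct to pass on to ssCTPR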

destandardize

Should coefficients from ssCTPR be destandardized using test data standard deviations before being returned?

trace

Controls the amount of output.

exclude.ambiguous

Should ambiguous SNPs (C/G, A/T) be excluded?

keep.ref

Participants to keep from the reference panel (see parseselect)

remove.ref

Participants to remove from the reference panel (see parseselect)

keep.test

Participants to keep from the testing dataset (see parseselect)

remove.test

Participants to remove from the testing dataset (see parseselect)

sample

Sample size of the random sample taken of ref.bfile

cluster

A cluster object from the parallel package for parallel computing

max.ref.bfile.n

The maximum sample size allowed in the reference panel

...

parameters to pass to ssCTPR
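To make the exclude.ambiguous argument concrete, the sketch below flags strand-ambiguous SNPs (C/G and A/T allele pairs) in plain base R. The helper is_ambiguous is hypothetical and written only for illustration; it is not a function of the package and may differ from how the pipeline performs this check internally.

```r
# Hypothetical helper (not part of ssCTPR): TRUE where the two alleles
# are complementary bases, so the strand cannot be inferred from them.
is_ambiguous <- function(a1, a2) {
  pair <- paste0(toupper(a1), toupper(a2))
  pair %in% c("AT", "TA", "CG", "GC")
}

# Toy allele vectors, made up for this example.
a1 <- c("A", "C", "G", "T", "A")
a2 <- c("T", "G", "A", "C", "G")

# SNPs that would be retained when ambiguous SNPs are excluded.
keep <- !is_ambiguous(a1, a2)
keep
```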

Details

To run ssCTPR, you need at a minimum: a vector of summary statistics expressed as SNP-wise correlations with the primary trait; vector(s) of effect sizes for the secondary traits; the SNP positions (chr, pos); one of A1 or A2; and a reference panel, specified in either ref.bfile or test.bfile. If only test.bfile is specified, it is also used as the ref.bfile. If only ref.bfile is specified, only ssCTPR coefficients are returned and polygenic scores are not calculated.
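The minimum inputs described above can be assembled and sanity-checked before the pipeline is called. The following is a minimal sketch in base R: the data frame, its column names, and the checks are all made up for illustration and are not part of the package.

```r
# Toy summary statistics; every column name and value here is made up.
ss <- data.frame(
  Chr      = c(1, 1, 2),
  Position = c(1000, 2000, 1500),
  A1       = c("A", "C", "G"),
  cor.Y1   = c(0.02, -0.01, 0.03),  # correlations with the primary trait
  BETA.Y2  = c(0.10, -0.05, 0.00),  # effect sizes for one secondary trait
  stringsAsFactors = FALSE
)

# First column: primary-trait correlations; further columns: secondary-trait betas.
cor <- cbind(ss$cor.Y1, ss$BETA.Y2)
traits <- ncol(cor)

# Hypothetical pre-flight checks: one row per SNP, matching position and
# allele vectors, valid correlations, and no missing values.
inputs_ok <- nrow(cor) == length(ss$Chr) &&
  nrow(cor) == length(ss$Position) &&
  nrow(cor) == length(ss$A1) &&
  all(abs(cor[, 1]) <= 1) &&
  !anyNA(cor)
inputs_ok
```

The objects cor and traits built this way correspond to the cor and traits arguments; chr, pos and A1 would be taken from the same rows so that everything stays aligned.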

If SNP-wise correlations are not available, they can be converted from p-values using the function p2cor.
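As an illustration of what such a conversion involves, the sketch below maps a two-sided p-value back to a correlation through the t statistic with n - 2 degrees of freedom. The function p_to_cor is a hypothetical stand-in written for this example; it is an assumption that p2cor follows the same identity, so treat it as a sketch rather than the package's implementation.

```r
# Hypothetical helper (not the package's p2cor): two-sided p-value -> correlation,
# using r = t / sqrt(n - 2 + t^2) with t recovered from the p-value.
p_to_cor <- function(p, n, direction = rep(1, length(p))) {
  stopifnot(all(p > 0 & p <= 1), all(n > 2))
  t_stat <- qt(p / 2, df = n - 2, lower.tail = FALSE) * sign(direction)
  t_stat / sqrt(n - 2 + t_stat^2)
}

# Round trip on simulated data: recover the Pearson correlation
# from the p-value reported by cor.test.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)
fit <- cor.test(x, y)

r_original  <- unname(fit$estimate)
r_recovered <- unname(p_to_cor(p = fit$p.value, n = n, direction = fit$estimate))
isTRUE(all.equal(r_original, r_recovered, tolerance = 1e-6))
```

The direction argument plays the role of the sign argument in the Examples section: a p-value carries no sign, so the direction of effect has to come from elsewhere, for example from the reported beta.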

For estimation, ssCTPR.pipeline uses only those SNPs that are consistently defined by chr, pos, A1 and A2 and by the PLINK .bim files specified with ref.bfile and test.bfile. This is achieved with matchpos, which allows for flipping of SNP alleles in their definitions. The beta matrix in the output contains all SNPs that are common to the summary statistics and test.bfile. However, ssCTPR with s < 1 is run only on SNPs that are common to all of ref.bfile, test.bfile and the summary statistics. For SNPs that are common to test.bfile and the summary statistics but not to ref.bfile, the coefficients for s < 1 are imputed with results from ssCTPR with s = 1 (soft-thresholding). To select only SNPs that are common to all three datasets, use the also.in.refpanel logical vector in the output.
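The overlap logic described above can be pictured with toy SNP sets. The sketch below uses made-up SNP identifiers and plain set operations; the real pipeline matches on chromosome, position and alleles via matchpos, so this is only an analogy for which SNPs land in which group.

```r
# Toy SNP identifiers, made up for this example.
sumstats_snps <- c("rs1", "rs2", "rs3", "rs4", "rs5")
test_snps     <- c("rs2", "rs3", "rs4", "rs5", "rs6")
ref_snps      <- c("rs3", "rs4", "rs7")

# SNPs that would appear in the beta matrix:
# common to the summary statistics and the test dataset.
in_beta <- intersect(sumstats_snps, test_snps)

# Analogue of also.in.refpanel: which of those are also in the reference panel.
also_in_ref <- in_beta %in% ref_snps

# s < 1 is run only on the three-way overlap ...
three_way <- in_beta[also_in_ref]
# ... and the remaining SNPs get their s < 1 coefficients from the s = 1 run.
filled_from_s1 <- in_beta[!also_in_ref]

three_way
filled_from_s1
```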

For keep.ref, remove.ref, keep.test, and remove.test, see the documentation for keep and remove in ssCTPR for details.

Value

An ssCTPR.pipeline object with the following elements

beta

A list of ssCTPR coefficients: one list element for each s

test.extract

A logical vector for the SNPs in test.bfile that are used in estimation.

also.in.refpanel

A logical vector for the SNPs in test.bfile that are used in ssCTPR.

sumstats

A data.frame of summary statistics used in estimation.

sd

The standard deviation for the testing dataset

test.bfile

The testing dataset

keep.test

Sample to keep in the testing dataset

ref.bfile

The reference panel dataset

keep.ref

Sample to keep in the reference panel dataset

lambda, s, lambda_ct, keep.test, destandardized

Information to pass on to validate.ssCTPR.pipeline

pgs

A matrix of polygenic scores

destandardized

Are the coefficients destandardized?

exclude.ambiguous

Were ambiguous SNPs excluded?

Note

Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283-285 (2015).

Examples

## Not run: 
 library(data.table)  # provides fread
 library(ssCTPR)

 ### Read summary statistics file ###
 ss <- fread("./data/summarystats.txt")
 head(ss)
 
 ### Convert p-values to correlations, assuming a sample size of 60000 for the p-values ###
 cor <- p2cor(p = ss$P_val.Y1, n = 60000, sign = ss$BETA.Y1)
 cor <- cbind(cor, ss$BETA.Y2)  # append summary statistics of secondary traits
 
 ### Run ssCTPR using standard pipeline ### 
 # ref.bfile and test.bfile must be set beforehand to PLINK bfile stems (without .bed)
 lambda_ct <- c(0, 0.06109, 0.1392, 0.24257)  # the default grid shown in Usage
 out <- ssCTPR.pipeline(cor = cor, traits = ncol(cor), lambda_ct = lambda_ct, 
                        chr = ss$Chr, pos = ss$Position, 
                        A1 = ss$A1, A2 = ss$A2,
                        ref.bfile = ref.bfile, test.bfile = test.bfile, 
                        LDblocks = "EUR.hg19")

## End(Not run)

yingxi-kaylee/ssCTPR documentation built on Nov. 14, 2021, 5:24 a.m.