ssCTPR.pipeline: Run ssCTPR with standard pipeline
In yingxi-kaylee/ssCTPR: Cross-Trait Penalized Regression using Summary Statistics


The easy way to run ssCTPR

ssCTPR.pipeline(
  cor,
  traits,
  adj = NULL,
  chr = NULL,
  pos = NULL,
  snp = NULL,
  A1 = NULL,
  A2 = NULL,
  ref.bfile = NULL,
  test.bfile = NULL,
  LDblocks = NULL,
  lambda = exp(seq(log(0.001), log(0.1), length.out = 20)),
  s = c(0.2, 0.5, 0.9, 1),
  lambda_ct = c(0, 0.06109, 0.1392, 0.24257),
  destandardize = F,
  trace = 1,
  exclude.ambiguous = TRUE,
  keep.ref = NULL,
  remove.ref = NULL,
  keep.test = NULL,
  remove.test = NULL,
  sample = NULL,
  cluster = NULL,
  max.ref.bfile.n = 20000,
  nomatch = FALSE,
  ...
)
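The default `lambda` is a grid that is equally spaced on the log scale. The base-R snippet below uses no ssCTPR functions. It simply evaluates the default expression from the usage above so you can see the range and the constant ratio between consecutive values.

```r
# Evaluate the default `lambda` grid from the usage above:
# 20 values equally spaced on the log scale between 0.001 and 0.1.
lambda <- exp(seq(log(0.001), log(0.1), length.out = 20))

# Consecutive values differ by a constant ratio, (0.1 / 0.001)^(1 / 19).
ratio <- lambda[-1] / lambda[-length(lambda)]

print(signif(range(lambda), 3))
print(signif(ratio[1], 4))
```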

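`cor`	A matrix whose first column contains the SNP-wise correlations with the primary trait (e.g. derived from `p2cor`) and whose remaining columns, if any, contain the effect sizes (betas) of the secondary traits
`traits`	The number of traits, i.e. the number of columns of `cor` (primary plus secondary traits)
`adj`	A matrix of adjacency coefficients with one row per variant and one column per secondary trait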
`chr`	Together with `pos`, chromosome and position for `cor`
`pos`	Together with `chr`, chromosome and position for `cor`
`A1`	Alternative allele (effect allele) for `cor`
`A2`	Reference allele for `cor` (one of `A1` or `A2` must be specified)
`ref.bfile`	PLINK binary fileset prefix (`bfile`, without the .bed extension) for the reference panel
`test.bfile`	PLINK binary fileset prefix (`bfile`) for the test dataset
`LDblocks`	Either (1) one of "EUR.hg19", "AFR.hg19", "ASN.hg19", "EUR.hg38", "AFR.hg38", "ASN.hg38", to use the blocks defined by Berisa and Pickrell (2015) from 1000 Genomes data; (2) a vector defining LD blocks; or (3) a data.frame of regions in BED format
`lambda`	A vector of `lambda` values to pass on to `ssCTPR`
`s`	A vector of `s` values to pass on to `ssCTPR`; `s = 1` corresponds to soft-thresholding
`lambda_ct`	A vector of `lambda_ct` values to pass on to `ssCTPR`
`destandardize`	Should coefficients from `ssCTPR` be destandardized using test data standard deviations before being returned?
`trace`	Controls the amount of output.
`exclude.ambiguous`	Should ambiguous SNPs (C/G, A/T) be excluded?
`keep.ref`	Participants to keep from the reference panel (see `parseselect`)
`remove.ref`	Participants to remove from the reference panel (see `parseselect`)
`keep.test`	Participants to keep from the testing dataset (see `parseselect`)
`remove.test`	Participants to remove from the testing dataset (see `parseselect`)
`sample`	Size of the random sample drawn from `ref.bfile`
`cluster`	A `cluster` object from the `parallel` package for parallel computing
`max.ref.bfile.n`	The maximum sample size allowed in the reference panel
`...`	Further arguments to pass on to `ssCTPR`
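SNPs with alleles A/T or C/G are called ambiguous because each allele is the complement of the other, so a strand flip cannot be distinguished from an allele swap. The sketch below shows how such SNPs can be flagged; `is_ambiguous` is a hypothetical helper written for illustration, not a function exported by ssCTPR.

```r
# Hypothetical helper (not part of ssCTPR): flag strand-ambiguous SNPs,
# i.e. A/T and C/G pairs, whose two alleles are each other's complement.
is_ambiguous <- function(a1, a2) {
  pair <- paste0(toupper(a1), toupper(a2))
  pair %in% c("AT", "TA", "CG", "GC")
}

a1 <- c("A", "C", "A", "g")
a2 <- c("T", "G", "G", "c")
amb <- is_ambiguous(a1, a2)
print(amb)

# Keep only unambiguous SNPs, which is what exclude.ambiguous = TRUE is documented to do.
keep <- !amb
```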

At a minimum, running ssCTPR requires SNP-wise correlations with the primary trait, vector(s) of effect sizes for the secondary traits, their positions (chr, pos), one of A1 or A2, and a reference panel, specified in either ref.bfile or test.bfile. If only test.bfile is specified, it is also used as ref.bfile. If only ref.bfile is specified, only the ssCTPR coefficients are returned and polygenic scores are not calculated.
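The rule for which PLINK fileset plays which role can be summarised in a few lines. This is a hypothetical sketch of the behaviour described in the paragraph above, not the package's own code; `resolve_bfiles` and its `compute.pgs` field are names invented for illustration.

```r
# Hypothetical sketch of the documented rule (not ssCTPR's implementation).
resolve_bfiles <- function(ref.bfile = NULL, test.bfile = NULL) {
  if (is.null(ref.bfile) && is.null(test.bfile))
    stop("Specify at least one of ref.bfile or test.bfile")
  # With only test data, the test fileset doubles as the reference panel.
  if (is.null(ref.bfile)) ref.bfile <- test.bfile
  list(ref.bfile = ref.bfile,
       test.bfile = test.bfile,
       # Without test data, only coefficients can be returned (no polygenic scores).
       compute.pgs = !is.null(test.bfile))
}

r1 <- resolve_bfiles(test.bfile = "test")  # test data only
r2 <- resolve_bfiles(ref.bfile = "ref")    # reference panel only
```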

If SNP-wise correlations are not available, they can be converted from p-values using the function p2cor.
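To show what such a conversion involves, here is a hypothetical stand-alone sketch. It assumes the standard conversion from a two-sided t-test p-value with n - 2 degrees of freedom to a correlation, with the sign taken from the effect direction. The package's own `p2cor` may differ in details, so use `p2cor` itself in practice.

```r
# Hypothetical re-implementation for illustration only; not ssCTPR's p2cor().
# Assumes each p-value comes from a two-sided test of a correlation with
# n - 2 degrees of freedom, and that `sign` carries the direction of effect.
p_to_cor_sketch <- function(p, n, sign = rep(1, length(p))) {
  t <- sign(sign) * qt(p / 2, df = n - 2, lower.tail = FALSE)  # signed t from two-sided p
  t / sqrt(n - 2 + t^2)  # invert t = r * sqrt(n - 2) / sqrt(1 - r^2)
}

# Round trip: start from known correlations, compute p-values, convert back.
n <- 60000
r_true <- c(0.02, -0.01)
t_stat <- r_true * sqrt(n - 2) / sqrt(1 - r_true^2)
p <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
r_back <- p_to_cor_sketch(p, n, sign = r_true)
print(r_back)
```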

For estimation, ssCTPR.pipeline uses only those SNPs that are defined consistently by chr, pos, A1 and A2 and by the PLINK .bim files of ref.bfile and test.bfile. The matching is done with matchpos, which allows SNP alleles to be flipped. The coefficients in beta cover all SNPs common to the summary statistics and test.bfile. However, ssCTPR with s < 1 is run only on SNPs common to all of ref.bfile, test.bfile and the summary statistics. For SNPs common to test.bfile and the summary statistics but absent from ref.bfile, the s < 1 coefficients are imputed with the results of ssCTPR with s = 1 (soft-thresholding). To select only SNPs common to all three datasets, use the also.in.refpanel logical vector in the output.
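The allele flipping mentioned above can be illustrated with a simplified, hypothetical helper. `align_alleles` is not `matchpos`. It handles only allele swaps, where A1 and A2 are exchanged and the effect changes sign. It assumes both alleles are available and leaves strand-flipped SNPs unmatched, which the real function may treat differently.

```r
# Hypothetical sketch of allele alignment (not the package's matchpos()).
# Returns +1 when the summary-statistic alleles match the .bim alleles,
# -1 when A1 and A2 are swapped (the effect must change sign), NA otherwise.
align_alleles <- function(ss_a1, ss_a2, bim_a1, bim_a2) {
  ss_a1 <- toupper(ss_a1); ss_a2 <- toupper(ss_a2)
  bim_a1 <- toupper(bim_a1); bim_a2 <- toupper(bim_a2)
  out <- rep(NA_real_, length(ss_a1))
  out[ss_a1 == bim_a1 & ss_a2 == bim_a2] <- 1
  out[ss_a1 == bim_a2 & ss_a2 == bim_a1] <- -1
  out
}

ss  <- data.frame(a1 = c("A", "G", "C"), a2 = c("G", "A", "T"),
                  r = c(0.02, 0.03, -0.01))
bim <- data.frame(a1 = c("A", "A", "G"), a2 = c("G", "G", "A"))

flip <- align_alleles(ss$a1, ss$a2, bim$a1, bim$a2)
aligned_r <- ss$r * flip   # swapped SNPs change sign
used <- !is.na(flip)       # the C/T vs G/A strand flip stays unmatched here
```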

For keep.ref, remove.ref, keep.test, and remove.test, see the documentation for keep and remove in ssCTPR for details.

An ssCTPR.pipeline object with the following elements:

`beta`	A list of ssCTPR coefficients: one list element for each `s`
`test.extract`	A logical vector for the SNPs in `test.bfile` that are used in estimation.
`also.in.refpanel`	A logical vector for the SNPs in `test.bfile` that are used in `ssCTPR`.
`sumstats`	A `data.frame` of summary statistics used in estimation.
`sd`	The standard deviation for the testing dataset
`test.bfile`	The testing dataset
`keep.test`	Sample to keep in the testing dataset
`ref.bfile`	The reference panel dataset
`keep.ref`	Sample to keep in the reference panel dataset
`lambda, s, lambda_ct, keep.test, destandardized`	Information to pass on to `validate.ssCTPR.pipeline`
`pgs`	A matrix of polygenic scores
`destandardized`	Are the coefficients destandardized?
`exclude.ambiguous`	Were ambiguous SNPs excluded?
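A sketch of restricting the coefficients to SNPs present in all three datasets, using a mock object in place of a real `ssCTPR.pipeline` result. The values are invented. The sketch assumes each element of `beta` is a matrix with one row per SNP and that `also.in.refpanel` has one entry per row. Check these assumptions against a real result before relying on them.

```r
# Mock object standing in for a real ssCTPR.pipeline result (hypothetical values).
# Assumes each element of `beta` is a matrix with one row per SNP, and that
# `also.in.refpanel` has one logical entry per row.
out <- list(
  beta = list(matrix(c(0.1, 0.0, 0.3, 0.2, 0.0, 0.1), nrow = 3),   # e.g. s = 0.2
              matrix(c(0.2, 0.1, 0.0, 0.0, 0.4, 0.1), nrow = 3)),  # e.g. s = 1
  also.in.refpanel = c(TRUE, FALSE, TRUE)
)

# Keep only rows for SNPs that are also in the reference panel, for every s.
common_beta <- lapply(out$beta, function(b) b[out$also.in.refpanel, , drop = FALSE])
print(common_beta[[1]])
```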

Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283-285 (2015).

## Not run: 
 library(ssCTPR)
 library(data.table)  # for fread()

 ### Read summary statistics file ###
 ss <- fread("./data/summarystats.txt")
 head(ss)
 
 ### Convert p-values to correlations, assuming a sample size of 60000 for the p-values ###
 cor <- p2cor(p = ss$P_val.Y1, n = 60000, sign = ss$BETA.Y1)
 cor <- cbind(cor, ss$BETA.Y2)  # summary statistics of secondary traits
 
 ### PLINK filesets (placeholder prefixes, without the .bed extension) ###
 ref.bfile <- "path/to/refpanel"
 test.bfile <- "path/to/testdata"
 
 ### Run ssCTPR using standard pipeline ###
 out <- ssCTPR.pipeline(cor = cor, traits = ncol(cor),
                        lambda_ct = c(0, 0.06109, 0.1392, 0.24257),  # the default grid
                        chr = ss$Chr, pos = ss$Position,
                        A1 = ss$A1, A2 = ss$A2,
                        ref.bfile = ref.bfile, test.bfile = test.bfile,
                        LDblocks = "EUR.hg19")
## End(Not run)