In tshmak/lassosum: LASSO with summary statistics and a reference panel

lassosum (standalone version for Linux)

This page is for the standalone version of lassosum. For full details of lassosum as an R package, please refer to this page.

Follow the instructions here to install lassosum in R. Then add the lassosum package directory to your $PATH variable. The directory can be found by typing the following in R:

> system.file(package="lassosum")

For example, on my computer, I would type

$ PATH=/home/tshmak/WORK/Rpackages2/nonMRAN/lassosum/:$PATH

in my Linux shell.
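The two steps above (asking R for the package directory, then prepending it to $PATH) can be combined into a small shell function. This is a minimal sketch, not something shipped with lassosum: the name add_lassosum_to_path is hypothetical, it assumes Rscript is on your PATH when no directory is passed, and it only changes PATH for the current shell session.

```shell
# Hypothetical helper (not part of lassosum): prepend the lassosum
# package directory to PATH for the current shell session.
add_lassosum_to_path() {
  local dir="${1:-}"
  if [ -z "$dir" ]; then
    # Ask R for the install location; system.file() returns "" if the
    # package is not installed. Assumes Rscript is available.
    dir=$(Rscript -e 'cat(system.file(package="lassosum"))') || return 1
  fi
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "lassosum package directory not found" >&2
    return 1
  fi
  # Only prepend the directory if it is not already on PATH.
  case ":$PATH:" in
    *":$dir:"*) ;;
    *) PATH="$dir:$PATH"; export PATH ;;
  esac
}
```

Run add_lassosum_to_path with no argument to look the directory up through R, or pass it explicitly, e.g. add_lassosum_to_path /home/tshmak/WORK/Rpackages2/nonMRAN/lassosum/. To make the change permanent, put the function and a call to it in your ~/.bashrc.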

The following is a quick example of running lassosum from a Linux shell, assuming the lassosum directory has been added to $PATH.

$ lassosum --data summarystats.txt --chr Chr --pos Position \
        --A1 A1 --A2 A2 --pval P_val --n 50000 \
        --OR OR_A1 --test.bfile testsample \
        --LDblocks EUR.hg19 --pheno testsample.pheno.txt \
        --nthreads 2
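In the example above, --chr, --pos, --A1, --A2, --pval and --OR name columns in the header of summarystats.txt, while --n 50000 is a plain number. Before a long run it can help to confirm that those column names really exist. A minimal sketch, assuming a space- or tab-delimited file whose first line is the header; check_columns is a hypothetical helper, not a lassosum option.

```shell
# Hypothetical helper: report any requested column names that are missing
# from the header (first line) of a space- or tab-delimited file.
check_columns() {
  local file="$1" header="" cols col missing=0
  shift
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  IFS= read -r header < "$file" || true
  # Turn tabs and carriage returns into spaces, squeeze repeats, and pad
  # both ends so that only whole column names can match.
  cols=" $(printf '%s' "$header" | tr -s ' \t\r' '   ') "
  for col in "$@"; do
    case "$cols" in
      *" $col "*) ;;
      *) echo "missing column: $col" >&2; missing=1 ;;
    esac
  done
  return "$missing"
}
```

For the example above: check_columns summarystats.txt Chr Position A1 A2 P_val OR_A1 && lassosum --data summarystats.txt ... (with the remaining options unchanged).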

This will generate the following files:

lassosum.lassosum.pipeline.rds
lassosum.validate.rds
lassosum.validate.results.txt
lassosum.splitvalidate.rds
lassosum.splitvalidate.results.txt

The best PGS selected by validation and by split-validation are given in lassosum.validate.results.txt and lassosum.splitvalidate.results.txt, respectively. The .rds files are for further processing. For example, to apply the best validated PGS to a new dataset (with bfile=refsample), type:
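Judging from the file names listed above and the --out option described below, the output files appear to be named <out>.<suffix>, where <out> defaults to lassosum. The sketch below, a hypothetical helper rather than part of lassosum, reports which of the expected files exist after a run; the suffix list is copied from the list above and assumes both validation and split-validation were run (i.e. --pheno was given).

```shell
# Hypothetical helper: check which expected lassosum outputs exist for a
# given --out stub (default "lassosum"). The suffixes come from the file
# list in this document and assume --pheno was given.
check_outputs() {
  local out="${1:-lassosum}" suffix status=0
  for suffix in lassosum.pipeline.rds validate.rds validate.results.txt \
                splitvalidate.rds splitvalidate.results.txt; do
    if [ -f "$out.$suffix" ]; then
      echo "found:   $out.$suffix"
    else
      echo "missing: $out.$suffix"
      status=1
    fi
  done
  return "$status"
}
```

Run check_outputs in the output directory after the example above, or check_outputs mystub if you ran lassosum with --out mystub.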

$ lassosum --lassosum.pipeline lassosum.lassosum.pipeline.rds \
        --validate.rds lassosum.validate.rds \
        --applyto refsample

This will create a file called:

lassosum.results.txt

containing the best PGS in the new data.

To actually try out the above example, copy the relevant files from the directory given by

> system.file("data", package="lassosum")
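A minimal sketch of that copy step from the shell, assuming Rscript is on your PATH. The name copy_lassosum_data is hypothetical; passing the source directory as a second argument skips the R lookup.

```shell
# Hypothetical helper: copy lassosum's bundled example data into a
# destination directory (default: the current directory).
copy_lassosum_data() {
  local dest="${1:-.}" src="${2:-}"
  if [ -z "$src" ]; then
    # system.file() returns "" if the directory does not exist.
    src=$(Rscript -e 'cat(system.file("data", package="lassosum"))') || return 1
  fi
  if [ -z "$src" ] || [ ! -d "$src" ]; then
    echo "lassosum data directory not found" >&2
    return 1
  fi
  # "src/." copies the directory's contents rather than the directory itself.
  mkdir -p "$dest" && cp -R "$src"/. "$dest"/
}
```

Running copy_lassosum_data in an empty working directory and then the lassosum command above should let you reproduce the example.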

Almost all of the options available in the R version can be passed to standalone lassosum by prefixing the option name with --. For example, type

$ lassosum ... --ref.bfile refsample --lambda 0.001, 0.002 --keep.test keep.txt ...

to use refsample as the reference bfile, use 0.001 and 0.002 as the values of lambda, and use only the samples listed in keep.txt as the testing dataset.

However, some options are specific to the standalone version:

--data (required) The filename of the summary statistics dataset. The file must have a header row. The columns for correlation, chromosome, position, SNP, A1 (alternative allele), A2 (reference allele), etc., are specified by the --cor, --chr, --pos, --snp, --A1 and --A2 options, which correspond to the options of the same names in the lassosum.pipeline() R function. In addition, you can use --pval, --n, --beta, --OR or --LDblocks to specify the columns for p-values, sample size, beta (effect size), odds ratio or LD blocks.
--n The sample size: either the name of a column in the summary statistics file, or a single number.
--LDblocks (required) Either the name of a column in the summary statistics file, a .bed file giving the LD blocks, or one of EUR.hg19, ASN.hg19, AFR.hg19, EUR.hg38, ASN.hg38, AFR.hg38 to use the pre-defined LD blocks of Berisa and Pickrell (2015).
--pval, --beta, --OR, --n These are only used to calculate the correlations using the function p2cor if --cor is not specified.
--pheno A text file with a header row and 3 columns. The first two columns must be headed FID and IID, and the third gives the phenotype. Used for performing validation and split-validation.
--covar A text file with a header row and at least 3 columns. The first two columns must be headed FID and IID, and the remaining columns give the covariates to adjust for. Used for performing validation and split-validation.
--out The prefix used to name the output files. Defaults to lassosum.
--lassosum.pipeline The .rds file generated by lassosum, in which the lassosum.pipeline object is saved. Passing it lets you perform validation, pseudovalidation or split-validation on this object without rerunning lassosum.pipeline.
--validate Perform validation of the lassosum.pipeline results. This is automatically turned on when the --pheno switch is specified.
--splitvalidate Perform split-validation of the lassosum.pipeline results. This is automatically turned on when the --pheno switch is specified.
--pseudovalidate Perform pseudovalidation of the lassosum.pipeline results.
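--applyto, --validate.rds Apply the best validated PGS to a new dataset: --validate.rds gives the .rds file saved by validation, and --applyto gives the bfile of the new dataset. See the example above.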
--nthreads Number of threads to use.
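Since a malformed --pheno or --covar file only shows up once validation starts, a quick header check beforehand can save a run. A minimal sketch, assuming space- or tab-delimited files; check_header is a hypothetical helper, not part of lassosum, and it only inspects the header row against the rules above (FID and IID first, exactly 3 columns for --pheno, at least 3 for --covar).

```shell
# Hypothetical helper: check the header row of a --pheno or --covar file.
# Usage: check_header pheno FILE   (exactly 3 columns)
#        check_header covar FILE   (at least 3 columns)
check_header() {
  local kind="$1" file="$2" nf="" first="" second=""
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
  # awk splits on runs of spaces/tabs; strip a trailing CR (Windows files).
  read -r nf first second <<EOF
$(awk 'NR == 1 { sub(/\r$/, ""); print NF, $1, $2; exit }' "$file")
EOF
  if [ "$first" != "FID" ] || [ "$second" != "IID" ]; then
    echo "$file: first two headers must be FID and IID" >&2
    return 1
  fi
  case "$kind" in
    pheno) [ "$nf" -eq 3 ] || { echo "$file: expected 3 columns, got $nf" >&2; return 1; } ;;
    covar) [ "$nf" -ge 3 ] || { echo "$file: expected at least 3 columns, got $nf" >&2; return 1; } ;;
    *) echo "unknown kind: $kind" >&2; return 2 ;;
  esac
}
```

For the quick example: check_header pheno testsample.pheno.txt && lassosum ... --pheno testsample.pheno.txt ...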