LDadj: LD adjustment.
In GMELab/GraBLD: Gradient Boosted and LD adjusted


The function calculates an LD adjustment for each SNP in a genome-wide dataset, based on a block of a given number of upstream and downstream SNPs.
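The windowing idea can be sketched in base R. This is only an illustration and not the package's implementation: the helper names `window_idx` and `ld_window_sum` are hypothetical, and the statistic used here (the sum of squared correlations between a SNP and the SNPs in its window) is an assumption that this page does not confirm.

```r
# Hypothetical helper: indices of the SNPs within +/- `size` positions of
# SNP j, clipped at both ends of the chromosome.
window_idx <- function(j, m, size) {
  seq(max(1L, j - size), min(m, j + size))
}

# Assumed adjustment (not confirmed by this page): sum of squared
# correlations between SNP j and every SNP in its window, itself included.
ld_window_sum <- function(geno, size) {
  m <- ncol(geno)
  vapply(seq_len(m), function(j) {
    idx <- window_idx(j, m, size)
    r <- cor(geno[, j], geno[, idx, drop = FALSE],
             use = "pairwise.complete.obs")
    sum(r^2, na.rm = TRUE)
  }, numeric(1))
}

# Simulated genotypes: 50 individuals (rows) by 20 SNPs (columns).
set.seed(1)
geno <- matrix(rbinom(50 * 20, size = 2, prob = 0.3), nrow = 50, ncol = 20)
adj <- ld_window_sum(geno, size = 3)
length(adj)  # one value per SNP
```

SNPs near either end of the chromosome simply get a shorter window in this sketch; how the package itself treats the edges is not stated here.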

LDadj(source_data = NULL, geno_raw, size = 300, chr = NULL, max_size = 1e+05, write = FALSE, outname = NULL, NAval = NA)

`source_data`	the name of the file from which the genotype data are to be read. Each row of the matrix appears as one line of the file. This can be an absolute path or a file name relative to the current working directory, `getwd()`.
`geno_raw`	a matrix of the raw genotype data, containing `n` individuals (by row) and `m` SNPs (by column). Either `source_data` or `geno_raw` can be provided.
`size`	an integer for the number of SNPs that should be included in a block. Genome-wide datasets usually contain about 2 million SNPs, so +/- 300 SNPs is roughly equivalent to 1 Mb of physical distance; hence 300 is the default size.
`chr`	an integer giving the chromosome of the genotype data supplied. It is optional and is only used to name the output file. It is recommended to compute LD adjustments by chromosome to save memory.
`max_size`	an integer for the maximum number of SNPs to be standardized at a time; the default is 100000. This option can be ignored for data with fewer than 1 million SNPs.
`write`	a logical value indicating whether the results should be written to a text file. If TRUE, the user should also provide the output file name by specifying `outname`; otherwise the default name 'LDadj_`chr`_size_`size`_SNPs.txt' is used.
`outname`	a character string giving the name of the output file if `write` is TRUE. If it is not supplied, the default name 'LDadj_`chr`_size_`size`_SNPs.txt' is used.
`NAval`	the missing value used in the output data matrix. The default is NA; for use with PLINK, set `NAval` = -9.
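As a rough illustration of the `write`, `outname` and `NAval` arguments, the sketch below builds the documented default file name and recodes missing values. Both helpers, `default_outname` and `recode_missing`, are hypothetical and are not part of GraBLD; the sketch assumes that `chr` and `size` are substituted literally into the name pattern.

```r
# Hypothetical helper mirroring the documented default name pattern
# 'LDadj_<chr>_size_<size>_SNPs.txt'.
default_outname <- function(chr, size) {
  paste0("LDadj_", chr, "_size_", size, "_SNPs.txt")
}

# Hypothetical helper: replace missing adjustments with the chosen NAval.
recode_missing <- function(x, NAval = NA) {
  x[is.na(x)] <- NAval
  x
}

default_outname(chr = 22, size = 300)
# "LDadj_22_size_300_SNPs.txt"

recode_missing(c(1.2, NA, 3.4), NAval = -9)
# 1.2 -9.0 3.4
```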

For large datasets, it is recommended to run the computation from the command line, with one background job per chromosome:

for((i = 1; i <= chr; i++))
do
Rscript PerformLDadj.R size data_name ${i} &
done

where the R script PerformLDadj.R might look something like this (additional options can be added to the argument list):

#!/usr/bin/env Rscript
# Positional arguments: size, data_name, chr
library('GraBLD')
args = commandArgs(TRUE)
size = as.integer(args[1])
source_data = args[2]
chr = as.integer(args[3])
# Load the PLINK genotype data for this chromosome and normalize it
geno_data = load_geno(source_data = source_data, PLINK = TRUE, chr = chr)
geno_norm = full_normal_geno(geno_data)
# Compute the LD adjustments and write them to the default output file
LD_OUT <- LDadj(geno_raw = geno_norm, chr = chr, size = size, write = TRUE)
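Because the script reads its inputs positionally from `commandArgs`, it can help to validate them before loading any data. The following is a minimal sketch in base R; the helper `parse_ldadj_args` is hypothetical and not part of GraBLD, and the checks assume that both `size` and `chr` must be positive integers.

```r
# Hypothetical helper: validate the three positional arguments
# (size, data_name, chr) passed by the shell loop.
parse_ldadj_args <- function(args) {
  if (length(args) < 3) {
    stop("usage: Rscript PerformLDadj.R size data_name chr")
  }
  size <- suppressWarnings(as.integer(args[1]))
  chr <- suppressWarnings(as.integer(args[3]))
  if (is.na(size) || size < 1) stop("size must be a positive integer")
  if (is.na(chr) || chr < 1) stop("chr must be a positive integer")
  list(size = size, source_data = args[2], chr = chr)
}

# Inside the script this would be parse_ldadj_args(commandArgs(TRUE)).
opts <- parse_ldadj_args(c("300", "data_name", "1"))
str(opts)
```

Failing early on a malformed argument avoids starting a long per-chromosome job that would only stop after the genotype data had been loaded.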

A numeric vector of LD adjustments, with length equal to the number of SNPs in the genotype data provided.

Pare, Guillaume, Shihong Mao, and Wei Q. Deng. A machine-learning heuristic to improve gene score prediction of polygenic traits. bioRxiv 107409 (2017). doi: https://doi.org/10.1101/107409; http://www.biorxiv.org/content/early/2017/02/09/107409

Pare, Guillaume, Shihong Mao, and Wei Q. Deng. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics. Scientific Reports 6 (2016): 27644.