LDadj: LD adjustment.


Description

Calculates the LD adjustment for each SNP in a genome-wide dataset, based on a set number of SNPs upstream and downstream of it.

Usage

LDadj(source_data = NULL, geno_raw, size = 300, chr = NULL,
  max_size = 1e+05, write = FALSE, outname = NULL, NAval = NA)

Arguments

source_data

the name of the file from which the genotype data are to be read. Each row of the matrix appears as one line of the file. This can be an absolute path to the file, or a file name relative to the current working directory, getwd().

geno_raw

a matrix of the raw genotype data, containing n individuals (by row) and m SNPs (by column). Either source_data or geno_raw can be provided.

size

an integer for the number of SNPs that should be included in a block. Genome-wide datasets typically contain about 2 million SNPs, so +/- 300 SNPs is roughly equivalent to 1 Mb of physical distance; hence the default size of 300.

chr

an integer for the chromosome of the genotype data supplied. It is not required and is only used to name the output file. It is recommended to compute LD adjustments by chromosome to save memory.

max_size

an integer for the maximum number of SNPs to be standardized at a time; the default is 100000. This option can be ignored for data with fewer than 1 million SNPs.

write

a logical indicating whether the results should be written to a text file. If TRUE, the user should also provide the output file name by specifying outname; otherwise the default name 'LDadj_chr_size_size_SNPs.txt' is used.

outname

a character string giving the name of the output file when write is TRUE. If it is not supplied, the default name 'LDadj_chr_size_size_SNPs.txt' is used.

NAval

the missing value used in the output data matrix. The default is NA; for use with PLINK, set NAval = -9.
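
To make the role of size concrete, the following is a minimal sketch of a windowed LD score in base R. It assumes, for illustration only, that the score for a SNP is the sum of squared correlations with the SNPs up to size positions on either side; the helper window_ld_sketch and the toy matrix geno_toy are hypothetical, are not part of GraBLD, and may differ from what LDadj computes internally.

```r
# Hypothetical helper, not part of GraBLD: for each SNP j, sum the squared
# Pearson correlations (r^2) with the SNPs in columns j - size .. j + size.
window_ld_sketch <- function(geno, size = 300) {
  m <- ncol(geno)
  out <- numeric(m)
  for (j in seq_len(m)) {
    lo <- max(1, j - size)  # window is truncated at the first SNP
    hi <- min(m, j + size)  # and at the last SNP
    r <- cor(geno[, j], geno[, lo:hi], use = "pairwise.complete.obs")
    out[j] <- sum(r^2, na.rm = TRUE)  # includes the SNP's r^2 with itself
  }
  out
}

set.seed(1)
# Toy genotype matrix: 50 individuals (rows) by 20 SNPs (columns), coded 0/1/2
geno_toy <- matrix(rbinom(50 * 20, size = 2, prob = 0.3), nrow = 50, ncol = 20)
ld_toy <- window_ld_sketch(geno_toy, size = 5)
length(ld_toy) == ncol(geno_toy)
```

In this sketch, SNPs near either end of the matrix have fewer neighbours inside the window, so their scores are sums over fewer terms.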

Details

For large datasets, it is recommended to run the computation from the command line, with one background job per chromosome:

for((i = 1; i <= chr; i++))
do
Rscript PerformLDadj.R size data_name ${i} &
done

where the R script PerformLDadj.R might look something like the following; additional options can be added to the argument list:

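#!/usr/bin/env Rscript
# Usage: Rscript PerformLDadj.R size data_name chr
library('GraBLD')
args <- commandArgs(trailingOnly = TRUE)
size <- as.integer(args[1])   # number of SNPs per block
source_data <- args[2]        # name of the genotype file
chr <- as.integer(args[3])    # chromosome number
# Load and normalize the genotype data for this chromosome
geno_data <- load_geno(source_data = source_data, PLINK = TRUE, chr = chr)
geno_norm <- full_normal_geno(geno_data)
# Compute the LD adjustments and write them to a text file
LD_OUT <- LDadj(geno_raw = geno_norm, chr = chr, size = size, write = TRUE)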

Value

a numeric vector of LD adjustments with length matching the number of SNPs in the genotype data provided.
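
As a small illustration of the NAval option described above, the sketch below recodes missing entries of a result vector to -9, the value suggested for use with PLINK. The vector ld_adj is made-up toy data standing in for the value returned by LDadj; this only mimics the recoding and is not the package's own code.

```r
# Toy vector standing in for the value returned by LDadj(); values are made up
ld_adj <- c(12.4, NA, 8.1, 15.0)
NAval <- -9

# Recode missing adjustments to the chosen missing-value code
ld_out <- ifelse(is.na(ld_adj), NAval, ld_adj)
ld_out
```

The length of the vector is unchanged by the recoding, so it still matches the number of SNPs.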

References

Pare, Guillaume, Shihong Mao, and Wei Q. Deng. A machine-learning heuristic to improve gene score prediction of polygenic traits. bioRxiv 107409 (2017). doi: https://doi.org/10.1101/107409; http://www.biorxiv.org/content/early/2017/02/09/107409

Pare, Guillaume, Shihong Mao, and Wei Q. Deng. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics. Scientific reports 6 (2016): 27644.

Examples

data(geno)
LDadj(geno_raw = geno, chr = 1, size = 200)

GMELab/GraBLD documentation built on May 4, 2019, 3:20 p.m.