GraBLD.score: Gradient Boosted and LD adjusted Prediction.


Description

The function computes a polygenic gene score based on the GraBLD heuristic and, when a phenotype is supplied, returns the prediction R-squared on the target population.

Usage

GraBLD.score(source_data = NULL, chr = NULL, geno_raw, PLINK = TRUE,
  SNPnames = NULL, max_size = 1e+05, NAval = NA, Pheno = NULL, LDadjVal,
  gbmVal, trait_name = NULL, WRITE = FALSE)

Arguments

source_data

the name of the file from which the genotype data are to be read. Each row of the matrix appears as one line of the file. This can be an absolute path or a file name relative to the current working directory, getwd().

chr

an integer indicating the maximum number of chromosomes to be read in. This option is used in combination with source_data to perform the analysis chromosome by chromosome. In this case, the file names should follow the pattern “source_data_i.raw” for all i ≤ chr, for example Geno_Matrix_23.raw.

geno_raw

the genotype matrix, with one row per individual and one column per SNP. Can be omitted if source_data is provided.

PLINK

a logical indicating whether the supplied file is in the .raw format from PLINK. If so, the first six columns are removed; if not, all columns are read in.

SNPnames

a character vector of SNP names; these are used only in the output of the GraBLD weights.

max_size

an integer giving the maximum number of SNPs to be standardized at a time; the default is 100000. This option can be ignored for data with fewer than 1 million SNPs.

NAval

the missing value used in the output data matrix. The default is NA; for use with PLINK, set NAval = -9.

Pheno

a numeric vector of quantitative traits with the same length as the number of rows in the genotype matrix.

LDadjVal

a numeric vector of LD adjusted scores with length matching the number of SNPs in the genotype matrix. This can be taken from the output of LDadj().

gbmVal

a numeric vector of gradient boosted weights with length matching the number of SNPs in the genotype matrix. This can be taken from the output of GraB().

trait_name

a character string giving the name of the quantitative trait.

WRITE

a logical indicating whether the GraBLD weights should be written to a file named trait_name_gbm_beta.txt.
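
The input conventions described above (a PLINK .raw file whose first six columns are sample information, and per-chromosome files named “source_data_i.raw”) can be illustrated with a small base R sketch. This is not the package's own reading code: the toy file, its contents, and the variable names raw_file, geno_mat, snp_names and expected_files are assumptions made up for illustration.

```r
# Toy stand-in for a PLINK .raw file (made-up data): six leading sample
# columns followed by one column per SNP.
raw_file <- file.path(tempdir(), "Geno_Matrix_1.raw")
toy <- data.frame(
  FID = 1:4, IID = 1:4, PAT = 0, MAT = 0, SEX = c(1, 2, 1, 2), PHENOTYPE = -9,
  rs1_A = c(0, 1, 2, NA), rs2_G = c(1, 1, 0, 2)
)
write.table(toy, raw_file, row.names = FALSE, quote = FALSE)

# Read the file and drop the six sample columns, leaving an
# individuals-by-SNPs matrix of the shape expected for geno_raw.
raw <- read.table(raw_file, header = TRUE)
geno_mat <- as.matrix(raw[, -(1:6)])
snp_names <- colnames(geno_mat)  # could be passed as SNPnames

# File names implied by the "source_data_i.raw" pattern for i <= chr
# (hypothetical prefix and chromosome count).
source_data <- "Geno_Matrix"
chr <- 3
expected_files <- paste0(source_data, "_", seq_len(chr), ".raw")
```

Checking file.exists(expected_files) before calling GraBLD.score with source_data and chr is a cheap way to catch a misnamed chromosome file early.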

Value

If Pheno is supplied, both the polygenic gene score and the adjusted prediction R-squared are returned; otherwise only the polygenic gene score is returned.
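
As a rough illustration of what an adjusted prediction R-squared measures, the sketch below regresses a simulated trait on a simulated score using base R. It assumes the R-squared comes from a simple linear regression of the phenotype on the score; the package may compute it differently, and score, pheno and the simulated values are stand-ins rather than GraBLD output.

```r
set.seed(1)
n <- 200
score <- rnorm(n)                # stand-in for a polygenic gene score
pheno <- 0.5 * score + rnorm(n)  # simulated quantitative trait

# Adjusted R-squared from a linear regression of the trait on the score
fit <- summary(lm(pheno ~ score))
adj_r2 <- fit$adj.r.squared

# The same quantity from the closed form with a single predictor (p = 1):
# 1 - (1 - R^2) * (n - 1) / (n - p - 1)
r2 <- cor(pheno, score)^2
adj_r2_manual <- 1 - (1 - r2) * (n - 1) / (n - 2)
```

The two values agree up to floating-point error, which can be confirmed with all.equal(adj_r2, adj_r2_manual).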

References

Guillaume Pare, Shihong Mao, Wei Q Deng (2017). A machine-learning heuristic to improve gene score prediction of polygenic traits. bioRxiv 107409; doi: https://doi.org/10.1101/107409; http://www.biorxiv.org/content/early/2017/02/09/107409

Examples

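# Load the example data shipped with the package
data(geno)
data(univariate_beta)
data(annotation_data)
data(BMI)

# LD adjusted scores for the genotype matrix
LD_val <- LDadj(geno_raw = geno, chr = 1, size = 200)

# Gradient boosted weights, one GraB() call per step (1 to 5)
gbm_val <- list()
for (j in 1:5) {
  gbm_val[[j]] <- GraB(betas = univariate_beta, annotations = annotation_data,
                       trait_name = 'BMI', steps = j, validation = 5)
}

# Polygenic gene score and prediction R-squared for BMI
gs <- GraBLD.score(geno_raw = geno, LDadjVal = LD_val, gbmVal = unlist(gbm_val),
                   trait_name = 'BMI', Pheno = BMI[, 1])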

GMELab/GraBLD documentation built on May 4, 2019, 3:20 p.m.