load_database: Load annotations.


Description

The function loads the annotation predictor variables into the workspace.

Usage

load_database(annotation_file, pos = 2)

Arguments

annotation_file

the path to a file containing a data.frame of annotation variables used to update the beta values through gradient boosted regression tree models. The first column must contain the SNP IDs; the remaining columns may hold additional annotation information. The SNP IDs must be in the same order as those in beta.

pos

an integer indicating which column of the annotation data holds the corresponding consortium value and, additionally, which other columns should be included. The default is 2.

Details

The annotation data provides the predictor variables used to update the weights of the polygenic gene score via gradient boosted regression trees. The data.frame should have at least two columns: the first column is SNP_ID, and the rest are the adjusted consortium regression coefficients or summary statistics. It is recommended to adjust the consortium regression coefficient by the minor allele frequency of the SNP:

   SNP_SD = sqrt(2 * as.numeric(MAF[,5]) * (1 - as.numeric(MAF[,5])))
   beta_adj = as.numeric(beta) * SNP_SD
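
As a small worked illustration of the adjustment above, the sketch below applies it to three made-up SNPs. The vectors `maf` and `beta` are hypothetical toy inputs, not data shipped with the package. The reading of `SNP_SD` as the standard deviation of an additively coded (0/1/2) genotype under Hardy-Weinberg equilibrium is the standard interpretation of `sqrt(2 * p * (1 - p))` and is offered here as context, not as a statement from the package authors.

```r
# Toy inputs (hypothetical values, for illustration only)
maf  <- c(0.10, 0.25, 0.50)    # minor allele frequency, one entry per SNP
beta <- c(0.20, -0.05, 0.10)   # consortium regression coefficients

# Standard deviation of a 0/1/2 genotype under Hardy-Weinberg equilibrium
snp_sd <- sqrt(2 * maf * (1 - maf))

# Adjusted coefficient: the raw coefficient scaled by the genotype SD
beta_adj <- beta * snp_sd

print(data.frame(maf, beta, snp_sd, beta_adj))
```

Scaling by the genotype standard deviation puts coefficients for rare and common SNPs on a comparable per-standard-deviation scale.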
   

For any one trait, at least one column holding the corresponding adjusted beta from the consortium is required. For instance, when working on BMI, at least the adjusted regression coefficient for association with BMI in a consortium study should be provided. Additional annotations, such as regression coefficients for related traits or SNP functional annotations, can also be included.
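
The layout described above can be sketched as a small table written to disk. Everything in this example is an assumption made for illustration: the SNP IDs, the column names `BMI_beta_adj` and `height_beta_adj`, the numeric values, and the tab-delimited file format with a header row (the documentation does not state which file format `load_database` expects).

```r
# Hypothetical annotation table: SNP_ID first, then the adjusted beta for
# the target trait (BMI), then one optional additional annotation column.
annot <- data.frame(
  SNP_ID          = c("rs0001", "rs0002", "rs0003"),
  BMI_beta_adj    = c(0.085, -0.031, 0.071),
  height_beta_adj = c(0.012, 0.044, -0.020),
  stringsAsFactors = FALSE
)

# Write to a temporary file; the tab-delimited format is an assumption.
annot_path <- tempfile(fileext = ".txt")
write.table(annot, annot_path, sep = "\t", quote = FALSE, row.names = FALSE)

# With GraBLD installed, the file could then be loaded along these lines,
# where pos = 2 points at the BMI column:
# annotations <- load_database(annot_path, pos = 2)
```

Keeping the target-trait column in position 2 matches the default value of `pos` shown under Usage.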

Value

a data frame of predictor variables that can be used to update SNP weights.


GMELab/GraBLD documentation built on May 4, 2019, 3:20 p.m.