get_data_num: Normalize the regression coefficients and annotation data


Description

The function processes the univariate regression coefficients (betas) and the annotation data matrix, and combines them into the normalized data used for estimating the optimal boosted regression trees model.

Usage

get_data_num(betas, annotations, pos = 2, pos_sign = 3, abs_effect = 2:5,
  normalize = FALSE)

Arguments

betas

a matrix of regression coefficients from the association analysis in the target population. The first column is the chromosome of each SNP, and the column holding the regression coefficient should be specified by setting pos. The default value for pos is 2. SNP IDs or other information may be present as additional columns. Users need to prepare the univariate association beta file without headers. The betas are generated from the model:
coef(summary(lm(pheno_data ~ geno[,j])))[2,1]

Both genotype data and phenotype data over individuals need to be standardized to have mean = 0 and variance = 1.
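
The preparation described above can be sketched in base R as follows. This is only an illustrative sketch on simulated data: the object names (geno, pheno, geno_std, univ_beta), the sample sizes, and the single-chromosome layout are assumptions made for the example and are not part of the package.

```r
set.seed(1)
n <- 200  # number of individuals (illustrative)
m <- 5    # number of SNPs (illustrative)

# Simulated genotypes coded 0/1/2 and a phenotype; hypothetical data only
geno  <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
pheno <- 0.5 * geno[, 1] + rnorm(n)

# Standardize genotypes (column-wise) and phenotype to mean 0, variance 1
geno_std   <- scale(geno)
pheno_data <- as.numeric(scale(pheno))

# Univariate regression coefficient for each SNP, using the model shown above
univ_beta <- sapply(seq_len(m), function(j) {
  coef(summary(lm(pheno_data ~ geno_std[, j])))[2, 1]
})

# Assemble a header-free matrix: chromosome in column 1, beta in column 2
# (all SNPs are placed on chromosome 1 here purely for illustration)
betas <- unname(cbind(rep(1, m), univ_beta))
```

Because both variables are standardized, each coefficient equals the Pearson correlation between the SNP and the phenotype, which gives a quick sanity check on the prepared file.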

annotations

a matrix of annotation variables used to update the beta values through gradient boosted regression tree models. Usually, these can be taken from the summary-level test statistics of matching traits from genome-wide consortia available online. The first column of the matrix must be the SNP IDs, and the remaining columns may hold additional annotation information. The SNP IDs must be in the same order as those in betas.

pos

an integer indicating which column of the data matrix annotations contains the corresponding consortium value, and additionally which columns should also be included.

pos_sign

an integer indicating which column of the data matrix annotations should be used to update the sign of the univariate regression coefficient. Usually, it is set to the column holding the consortium univariate regression coefficient of the same trait.

abs_effect

a vector of integers indicating which columns of the data matrix annotations should be used as absolute effects, that is, by taking their absolute values. This is useful, for example, when only the strength of the effect rather than its direction is informative for improving the polygenic score weights.
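
As a rough illustration of what treating columns as absolute effects means, the sketch below applies abs() to the selected columns of a toy annotation table. The table, its column names, and its values are hypothetical, and this is not necessarily how get_data_num performs the step internally.

```r
# Toy annotation table: SNP IDs in column 1, four annotation columns after it
# (all names and values are made up for illustration)
annotations <- data.frame(
  snp     = c("rs1", "rs2", "rs3"),
  trait_a = c(-0.12, 0.05, -0.30),
  trait_b = c(0.02, -0.08, 0.11),
  trait_c = c(-1.5, 2.1, -0.4),
  trait_d = c(0.7, -0.9, 0.0)
)

# Columns to treat as absolute effects, matching the default abs_effect = 2:5
abs_effect <- 2:5

# Keep only the magnitude of each selected column; the SNP IDs are untouched
abs_annotations <- annotations
abs_annotations[abs_effect] <- lapply(annotations[abs_effect], abs)
```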

normalize

a logical indicating whether the univariate regression coefficients in betas should be normalized with respect to the consortium values in annotations.

Value

a data matrix that can be directly used to estimate the optimal boosted regression trees model.

Examples


GMELab/GraBLD documentation built on May 4, 2019, 3:20 p.m.