Using geneticriskR for standard analyses: A Walk-through example

In this vignette, we will demonstrate how to use geneticriskR to perform basic analyses for polygenic risk score studies.

Step 1. Install geneticriskR

#install package from github using devtools
#library the package
library(geneticriskR)

Step 2. Read in your PRSice-2 output file and your phenotype file

PRSice-2 output files have a standard format that include prs and IID as column names. Read in the file as .csv.

score_table <- read_csv(here::here("output_file_name.csv"))

The phenotype file should have all phenotypes of interest and IID as column names. Read in the files as a .csv.

phenotypes_table <- read_csv(here::here("phenotype_file_name.csv"))

For this vignette, we will be using the mock data included in this package (pheno_data and score_data). It is important to note that the values for score and phenotype were randomly generated, so they may not appear to be very predictive.

Step 3. Select which phenotype you would like to use from the phenotype file using pheno_select(). In this example, we chose hypertension

phenotype_table <- pheno_select(pheno_data, "hypertension")
head(phenotype_table)

Note: pass your phenotype column name (in this case, hypertension) as a character vector

Step 4. Join the newly generated phenotype_table with the score_table you previously read in using join_scores()

full_table <- join_scores(phenotype_table, score_data)
#Examine the new table
head(full_table)

Step 5. Perform analyses on full_table to assess the predictive power of the scores. In this example, we will use a simple logistic regression model using simple_logistic_reg() and examine the area under the curve using proc_analysis()

#simple logistic regression
simple_mod <- simple_logistic_reg(full_table, "hypertension")
summary(simple_mod)
#ROC/AUC analysis using the previously generated, 'simple_mod'
proc_analysis(full_table, "hypertension", simple_mod)

Note: Again, pass your phenotype as a character vector

Important things to consider from these analyses:

Step 6. Examine the distribution of the scores visually. In this example, we will examine side by side boxplots for the scores in cases and controls using compare_boxplots().

compare_boxplots(full_table, "hypertension", "prs")

Note: pass phenotype and score column names as character vectors

Conclusion from our analyses: since we found an AUC of approxmiately 0.5, a p-value > 0.5 for our scores, and our side by side boxplots appear almost identical, we conclude that our scores do not predict hypertension risk effectively. Again, this is expected as our scores were randomly generated and binary values for hypertension were randomly assigned to individuals in our "cohort."

Step 7. Repeat for new scores and phenotypes. Use other functions in this package, such as k_fold() and complex_logistic_reg().

A comprehensive list of functions in this package:



aballou16/geneticriskR documentation built on Dec. 27, 2019, 3:39 p.m.