In privefl/mypack: Analysis of Massive SNP Arrays

options(htmltools.dir.version = FALSE, width = 75, max.print = 30)
knitr::opts_knit$set(global.par = TRUE, root.dir = "..")
knitr::opts_chunk$set(echo = TRUE, fig.align = 'center', dev = 'png', out.width = "90%")

If you have individual-level data (i.e. genotypes and phenotypes), you can use essentially any supervised learning (machine learning) method to train a PGS. However, because of the size of genetic data, these models quickly run into scalability issues. Moreover, it has been shown that effects for most diseases and traits are small and essentially additive, and that more complex methods such as deep learning are not very effective at constructing PGS [@kelemen2025performance].

Therefore, penalized linear/logistic regression (PLR) can be a very efficient and effective method to train PGS. In my R package bigstatsr, I have developed a very fast implementation with automatic choice of the two hyper-parameters [@prive2019efficient]. You can find a tutorial explaining its implementation and use here.
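As a minimal sketch of what training a PLR model with bigstatsr can look like, the chunk below fits `big_spLinReg()` on a small simulated genotype-like matrix and predicts in held-out individuals. The data, the sizes `n` and `m`, the simulated effects and the choices `K = 5` and `alphas = c(1, 0.1)` are illustrative assumptions, not the settings of any real analysis. It also assumes that the two hyper-parameters are the regularization strength and the elastic-net mixing parameter given in `alphas`.

```r
library(bigstatsr)

set.seed(1)
n <- 500  # number of individuals (toy size, assumption)
m <- 200  # number of variants (toy size, assumption)

# Simulated genotype-like matrix coded 0/1/2, stored as a Filebacked Big Matrix
G <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n)
X <- as_FBM(G)

# Simulated additive trait: 10 causal variants plus noise
beta <- rep(0, m)
beta[1:10] <- rnorm(10)
y <- drop(G %*% beta) + rnorm(n)

# Split individuals into training and test sets
ind.train <- sort(sample(n, 400))
ind.test <- setdiff(seq_len(n), ind.train)

# Penalized linear regression; for a binary phenotype, use big_spLogReg() instead
mod <- big_spLinReg(X, y[ind.train], ind.train = ind.train,
                    K = 5, alphas = c(1, 0.1))

# Predict the trait in the held-out individuals and check the correlation
pred <- predict(mod, X, ind.row = ind.test)
cor(pred, y[ind.test])
```

Here `K` sets the number of folds used internally and `alphas` is the grid of mixing values to try; real genotype data would typically be read into an FBM-like object rather than simulated in memory.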

This is an example of using PLR to predict height from genotypes in the UK Biobank:

- training on 350K individuals x 656K variants took less than 24 hours
- within both males and females, the PGS achieved a correlation of 65.5% ($r^2$ of 42.9%) between genetically predicted and true height
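The two accuracy figures above are linked by simple arithmetic: the $r^2$ is the square of the correlation. The base R sketch below checks this for the reported value. It then uses simulated data (the vectors `pred` and `y` are hypothetical, not UK Biobank values) to illustrate that the squared correlation equals the $R^2$ of a simple linear regression of the trait on the prediction.

```r
# Reported correlation between predicted and true height
r <- 0.655
r2 <- r^2
round(100 * r2, 1)  # 42.9

# Illustration on simulated data (hypothetical values)
set.seed(42)
pred <- rnorm(1000)             # stand-in for a genetic prediction
y <- 0.65 * pred + rnorm(1000)  # stand-in for the measured trait

r2_cor <- cor(pred, y)^2
r2_lm <- summary(lm(y ~ pred))$r.squared
all.equal(r2_cor, r2_lm)  # TRUE
```

In other words, a correlation of 65.5% means the PGS explains about 43% of the variance of the trait in that sample.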

knitr::include_graphics("https://privefl.github.io/blog/images/UKB-final-pred.png")

References

privefl/mypack documentation built on July 5, 2025, 12:18 a.m.
