Introduction

This package facilitates

Multifactorial dichotomous disorders, i.e. disorders characterised by two states (affected vs. unaffected), are modelled using the liability threshold model. All the following features can be incorporated in combination:

The package allows application of such models to disease pedigrees. Pedigree related outputs are

The disease model allows evidence from personal attributes and family history to be weighed appropriately when constructing the disease pedigree's liability distribution. Predictions of lifetime and n-year risk are calculated, per pedigree member, from this distribution.

The package is the engine behind a website

The website allows

Example disease models and pedigrees are provided at the website and are used in accompanying tutorials to illustrate the features available.

We focus on dichotomous multifactorial disease. Models for single-gene (Mendelian) diseases can be constructed (using major genetic risk loci as categorical risk factors), but we cannot vouch for the appropriateness of such models, as our expertise is in multifactorial disease. Disorders/traits of a continuous nature (e.g. blood pressure) are not handled.

Example

The package allows the prediction of disease risk for the individuals of a family (pedigree), based on a disease model. This example demonstrates the typical use of the package. We will

  1. create a disease model
  2. look at a disease pedigree
  3. predict disease risk for the pedigree members based on the disease model

Creating a Disease Model

Epidemiology

We consider a model for Major Depression, which illustrates many of the features available. Disease lifetime risk is 15%. Liability follows an AE model (additive genetic effects plus non-shared environment, with no shared-environment component) with a heritability of 66%. The disease risk for females is twice that of males. Of the 15% destined to suffer the disease, 80% will have done so by the age of 44. Each of these features appears as an element of the disease model object reported below.
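
As a rough illustration of how these figures translate into liability threshold terms, the following sketch converts them using base R only. It assumes that liability is standard normal and that males and females are equally common in the population. Neither the variable names nor the calculation are taken from the package.

```r
# Epidemiological inputs quoted above
lifetimeRisk <- 0.15   # population lifetime risk
femaleToMale <- 2      # females have twice the risk of males
propByAge44  <- 0.80   # proportion of eventual cases with onset by age 44

# Liability threshold: the affected are those whose (standard normal)
# liability exceeds the threshold
threshold <- qnorm(1 - lifetimeRisk)

# Sex-specific lifetime risks, ASSUMING a 50:50 sex ratio, so that
# (riskMale + riskFemale) / 2 equals the population lifetime risk
riskMale   <- 2 * lifetimeRisk / (1 + femaleToMale)
riskFemale <- femaleToMale * riskMale

# Cumulative population risk by age 44
riskByAge44 <- lifetimeRisk * propByAge44

round(c(threshold = threshold, male = riskMale,
        female = riskFemale, byAge44 = riskByAge44), 3)
```

Under these assumptions the male and female lifetime risks come out at 10% and 20%, and 12% of the population is affected by age 44.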

Somewhat counter-intuitively, sex is treated as a component of the non-shared environmental effect. This is correct: my sex does not predict that of my sibling. Sex is genetically determined, but whether we inherit the paternal X or Y chromosome is a random event. This is represented in the covMatrixAsq, covMatrixCsq and covMatrixEsq elements of the disease model object.

The following code reports the baseline disease model object we will use.

suppressPackageStartupMessages(library(diseaseRiskPredictor))

# copy depression disease model into current environment
lDis <- diseaseRiskPredictor::lDisDepression

# report the depression disease model
lDis

Building a model

So far the disease model object contains only epidemiological data. The next bit of code uses that data to build a liability threshold disease model. Several items are added to the disease model object and the categoricalRiskFactors table is extended.

# calculate disease model
lDis <- fnCalcDiseaseModel( lDis )
names(lDis)
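
To make the AE structure concrete, here is a minimal base R sketch of the liability correlation matrix it implies for a nuclear family. The relatedness matrix is written out by hand and the variable names are illustrative. This is not a description of what fnCalcDiseaseModel does internally, and it ignores the sex and age effects.

```r
h2 <- 0.66                       # heritability (A share of liability variance)
members <- c("father", "mother", "child1", "child2")

# Coefficient of relationship (twice the kinship coefficient) between members:
# 0 between unrelated parents, 0.5 for parent-child and full-sibling pairs
relatedness <- matrix(c(1,   0,   0.5, 0.5,
                        0,   1,   0.5, 0.5,
                        0.5, 0.5, 1,   0.5,
                        0.5, 0.5, 0.5, 1),
                      nrow = 4, byrow = TRUE,
                      dimnames = list(members, members))

# A is shared in proportion to relatedness; E is unique to each person
covA <- h2 * relatedness
covE <- (1 - h2) * diag(4)
liabilityCov <- covA + covE
liabilityCov
```

Each person's total liability variance is 1, and the liability correlation between first-degree relatives is half the heritability.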

Disease Pedigree

The linkage/ped file format is a well-known format for specifying family trees. Each row pertains to one pedigree member. We represent disease pedigrees in a data.frame with a similar structure. The following code reports a three-generation pedigree with one affected member.

# copy 3 generation pedigree into current environment
dfPed <- diseaseRiskPredictor::dfPed3Gen1Aff
dfPed
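
For readers unfamiliar with the linkage/ped layout, the sketch below builds a small pedigree by hand using the conventional six linkage columns. The column names here are illustrative and are not necessarily those used by dfPed3Gen1Aff; inspect that object for the package's actual layout.

```r
# A hypothetical two-generation family in linkage-style columns.
# Conventions: a parent ID of 0 marks a founder; sex is 1 = male, 2 = female;
# affection status is 0 = unknown, 1 = unaffected, 2 = affected.
dfExamplePed <- data.frame(
  famid    = 1,
  id       = c(11, 12, 21, 22),
  father   = c(0, 0, 11, 11),
  mother   = c(0, 0, 12, 12),
  sex      = c(1, 2, 1, 2),
  affected = c(1, 2, 1, 0)
)

# Founders are the members with no parents recorded in the pedigree
founders <- dfExamplePed$id[dfExamplePed$father == 0 & dfExamplePed$mother == 0]
founders
```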

The ped file format is hard to read, which makes pedigree mis-specification hard to detect. To help guard against that, we provide functionality allowing

The next bit of code performs various checks on the validity of the pedigree.

# validate pedigree (also performs type conversion on some columns, hence the returned object)
dfPed <- fnValidatePedigree( dfPed )
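
The kinds of problem such validation typically catches can be illustrated with a few hand-written checks. The function below is a hypothetical helper, not part of the package, and the column names are assumed. fnValidatePedigree may perform different or additional checks.

```r
# Hypothetical helper: return a character vector of problems found
# in a linkage-style pedigree (empty if none are found)
checkPedBasics <- function(df) {
  problems <- character(0)
  if (anyDuplicated(df$id)) {
    problems <- c(problems, "duplicate member IDs")
  }
  # every non-zero parent ID must refer to a member of the pedigree
  if (!all(df$father[df$father != 0] %in% df$id) ||
      !all(df$mother[df$mother != 0] %in% df$id)) {
    problems <- c(problems, "parent missing from pedigree")
  }
  # fathers must be recorded as male (1), mothers as female (2)
  if (any(df$sex[df$id %in% df$father] != 1) ||
      any(df$sex[df$id %in% df$mother] != 2)) {
    problems <- c(problems, "parent sex inconsistent with role")
  }
  problems
}

# Deliberately mis-specified: member 21's mother (13) is not in the pedigree,
# and member 11 is listed as a father but recorded as female
dfBadPed <- data.frame(id     = c(11, 12, 21),
                       father = c(0, 0, 11),
                       mother = c(0, 0, 13),
                       sex    = c(2, 2, 1))
checkPedBasics(dfBadPed)
```

Returning a list of problems rather than stopping at the first one lets all mis-specifications in a pedigree be corrected in a single pass.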

The next bit of code generates the corresponding pedigree diagram.

# construct pedigree object
oPedigree <- fnConstructPedigree( dfPed )

# plot pedigree diagram
plot(oPedigree)

Calculating the Pedigree Members' Risks

Now let's apply the Major Depression disease model to the 3 generation pedigree. The following code does that.

# combine the pedigree and the disease model, ready for risk prediction
lPedDis <- fnPrepareForRiskPrediction( dfPed, lDis )

# calculate pedigree members' risks of depression
lRes <- fnPredictRisk( lPedDis )

# report contents of the results object
names(lRes)

Risk

The results object returned is a named list. One of its elements (dfPedRisk) contains pedigree members' predicted risk. The following code reports and also plots the pedigree members' risks.

# report pedigree members' risks
lRes$dfPedRisk[,c(2,3,5:7,11:12)]

# plot pedigree members' risks
fnBarPlotRisk( lRes$dfPedRisk )

Some features are apparent from the barplot above. There is a sex difference in risk, with females more at risk. For instance, siblings 21 and 22 are identical in all relevant variables except for sex. The same holds for siblings 31 and 32.

The effect of age on risk prediction is also apparent. Persons 11, 21 and 31 are all unaffected 1st degree male relatives of the affected person; their only (relevant) difference is in age. This dramatically influences their disease risk. The same holds for females 12, 22 and 32.

Person 24's affection status is not known. Her only genetic relations in the pedigree are persons 31 and 32. Both are unaffected, but that status is uninformative due to their age. Consequently, no pedigree information bears on her risk, which should therefore be the population lifetime risk for females. That is indeed the case. The following code reports the categorical risk factors table from the results object, against which this can be checked.

lRes$lDis$categoricalRiskFactors[,1:7]
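
Why family history changes the prediction, and why its absence leaves the population risk, can be illustrated with the simplest possible case: one relative of known affected status. The sketch below assumes a single threshold on standard normal liability and ignores sex and age, so it is a simplification and not the package's algorithm. The function name is illustrative.

```r
# Risk to a person given that one relative is affected, under a liability
# threshold model where the two liabilities are bivariate standard normal
# with correlation r
riskGivenAffectedRelative <- function(lifetimeRisk, r) {
  threshold <- qnorm(1 - lifetimeRisk)
  # P(both affected): for each liability x of the relative above the
  # threshold, weight by the probability that the person's own liability
  # also exceeds the threshold
  integrand <- function(x) {
    dnorm(x) * pnorm((threshold - r * x) / sqrt(1 - r^2), lower.tail = FALSE)
  }
  pBoth <- integrate(integrand, lower = threshold, upper = Inf)$value
  pBoth / lifetimeRisk
}

h2 <- 0.66
# under an AE model, first-degree relatives have liability correlation h2 / 2
riskGivenAffectedRelative(0.15, r = 0.5 * h2)
# with zero correlation the relative is uninformative
riskGivenAffectedRelative(0.15, r = 0)
```

With zero correlation the function returns the population lifetime risk, which mirrors the situation of person 24: with no informative relatives, the prediction falls back to the population figure.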

N Year Risk

The dfPedRisk object also contains the pedigree members' 5-year risks, as reported by the following code.

# report pedigree members' n-year risks
lRes$dfPedRisk[,c(2,3,5:7,13,18)]

The nYearRisk column gives the risk of developing the condition in the next n years; the nofYears column specifies n. Comparing the 2nd and 3rd generations, the 2nd generation has a greater 5-year risk than the 3rd (for lifetime risk this is reversed). This makes sense given the age of onset curve (see the disease epidemiological findings above).
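
The relationship between the age of onset curve and n-year risk can be sketched for a person with no informative family history. The age of onset curve below is hypothetical apart from the 80%-by-age-44 point quoted earlier, the ages are made up, and the function names are illustrative. The package's own calculation also conditions on the pedigree.

```r
lifetimeRisk <- 0.15

# HYPOTHETICAL age of onset curve: proportion of eventual cases with onset
# by a given age, linearly interpolated (only the point at age 44 is from the text)
onsetByAge <- approxfun(x = c(0, 10, 44, 80), y = c(0, 0, 0.8, 1), rule = 2)

# Risk of onset within the next n years for someone unaffected at a given age
nYearRisk <- function(age, n) {
  pAlready <- lifetimeRisk * onsetByAge(age)      # affected by current age
  pByThen  <- lifetimeRisk * onsetByAge(age + n)  # affected by age + n
  (pByThen - pAlready) / (1 - pAlready)
}

# Remaining lifetime risk for someone unaffected at a given age
remainingLifetimeRisk <- function(age) {
  pAlready <- lifetimeRisk * onsetByAge(age)
  (lifetimeRisk - pAlready) / (1 - pAlready)
}

ages <- c(5, 30)   # a hypothetical child and a hypothetical adult
rbind(age = ages,
      fiveYear = nYearRisk(ages, 5),
      lifetime = remainingLifetimeRisk(ages))
```

With this made-up curve the child has the lower 5-year risk but the higher remaining lifetime risk, which is the same reversal described above for the 3rd and 2nd generations.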

See Also

More complex disease models and pedigrees can be specified. Disease models and pedigrees can be saved/loaded to/from human readable and editable text files. A website built around this package allows interactive

It provides example disease models and pedigrees, and tutorials illustrating the features available. It is a lot easier to use than the R command line, so rather than try to explain everything here, I recommend it to interested parties.

Installation

To use the R package diseaseRiskPredictor (as opposed to just using the website) you will have to install it. We make available only the source package, not binary packages, so you will have to build the package from source. Although mostly written in R, the package also contains some C++ functions, which need to be compiled and linked into the package.

Install R package development tools

To compile and link in the C++ code you will need a set of R package development tools

Check tools installation

Enter the following at the R console prompt. The install.packages command will prompt you to choose a repository to install from; it doesn't matter which one you choose.

# install it
install.packages("Rcpp")

# attach it
library(Rcpp)

# test the C++ toolchain by compiling and evaluating 1 + 1
evalCpp("1+1", showOutput = TRUE, rebuild = TRUE)

After about a minute, you should get a few lines of output with the last line giving the answer 2. That proves the tools installation worked.

On Windows you can check tools installation in another way. Enter the following at the R console prompt.

# install it
install.packages("devtools")

# attach it
library(devtools)

# test for Rtools installation
find_rtools()

You should get the answer TRUE. I don't know the equivalent commands for Mac or Linux.

Build and Install

You should now be able to build R packages from source. The R package devtools provides helper functions for doing this, and also for installing from non-CRAN locations. Our package is hosted on GitHub, so we will use devtools to download and install it from there. Enter the following at the R prompt.

install.packages("devtools")

devtools::install_github("DesmondCampbell/diseaseRiskPredictor")

To use diseaseRiskPredictor

To use diseaseRiskPredictor, attach it via the library command. There is currently a bug in the kinship2::plot.pedigree function which prevents it from plotting some valid pedigrees. Run the following command to replace the kinship2 function with a patch function which fixes the bug.

# attach it
library("diseaseRiskPredictor")

assignInNamespace( "plot.pedigree", diseaseRiskPredictor:::plot.pedigree.FIXED, ns="kinship2")

That's it.

Methodology

The methodology used in the construction of the disease models is detailed in our paper, Campbell et al. 2017.

Application of the disease model to pedigrees is based on methodology explained in Campbell et al. 2010.

A lot of the pedigree related functionality leans heavily on the R package kinship2 (Sinnwell et al. 2014).

Contacts

Feel free to contact us via

References

Campbell et al. 2017. Multifactorial Disease Risk Calculator: Web-based risk prediction for multifactorial disease pedigrees. Genetic Epidemiology.

Campbell DD, Sham PC, Knight J, Wickham H, and Landau S. 2010. Software for Generating Liability Distributions for Pedigrees Conditional on Their Observed Disease States and Covariates. Genetic Epidemiology 34 (2): 159–70. doi:10.1002/gepi.20446.

Sinnwell JP, Therneau TM, and Schaid DJ. 2014. The kinship2 R Package for Pedigree Data. Human Heredity 78 (2): 91–93. doi:10.1159/000363105.



DesmondCampbell/diseaseRiskPredictor documentation built on May 14, 2019, 8:40 a.m.