haplin | R Documentation
haplin fits a log-linear model to case-parent triads, case-control data, or combined (hybrid) case-parent and control-parent triads or dyads. It estimates marker or haplotype frequencies, and uses the EM algorithm to reconstruct haplotypes and, if requested, impute missing genotypes. haplin prints and plots estimates of relative risks associated with fetal and maternal haplotypes, and in addition allows splitting fetal haplotype effects into maternally and paternally inherited effects. It allows special models, like X-inactivation, to be fitted on the X chromosome. The result is an object of class haplin, which can be explored with summary, plot, and haptable.
haplin( data, markers = "ALL",
design = "triad", use.missing = FALSE,
xchrom = FALSE, maternal = FALSE, test.maternal = FALSE,
poo = FALSE, scoretest = "no", ccvar = NULL, strata = NULL,
sex = NULL, comb.sex = "double",
reference = "reciprocal", response = "free",
threshold = 0.01, max.haplos = NULL, haplo.file = NULL,
resampling = "no", max.EM.iter = 50, data.out = "no",
verbose = TRUE, printout = TRUE )
data
An R object, the result of running genDataPreprocess. See the web page for a detailed description of how to use this function.
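markers
Default is "ALL", which means that all markers in the data are analyzed. A numeric vector, e.g. c(3,4,8), restricts the analysis to the markers with those numbers.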
design
The value "triad" is used for the standard case triad design, without independent controls. The value "cc.triad" means a combination of case triads and control triads. This requires the argument ccvar to be specified.
use.missing
A logical value used to determine whether triads with missing data should be included in the analysis. When set to TRUE, haplin includes these triads and uses the EM algorithm to impute the missing genotypes.
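xchrom
Logical, defaults to FALSE. If set to TRUE, haplin treats the markers as located on the X chromosome, where special models such as X-inactivation can be fitted.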
maternal
If TRUE, maternal effects are estimated as well as the standard fetal effects.
test.maternal
Not yet implemented.
poo
Parent-of-origin effects. If TRUE, haplin splits the fetal haplotype effects into maternally and paternally inherited effects.
scoretest
Special interest only. If "no", no score test is computed. If "yes", an overall score p-value is included in the output, and the individual score values are returned in the
ccvar
Numeric. The number of the column that contains the case-control indicator in the data file. Needed for the "cc" and "cc.triad" designs. The column should contain two numeric values, the larger of which always denotes cases.
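The coding rule for the case-control column can be illustrated in plain R. This is a minimal sketch that assumes a hypothetical column coded 1 for controls and 2 for cases; it only mimics the rule stated above (the larger value marks cases) and is not Haplin's internal code.

```r
# Hypothetical case-control column with two numeric codes (1 = control, 2 = case)
cc.col <- c(1, 2, 2, 1, 2, 1)
# The column is expected to contain exactly two distinct values
stopifnot(length(unique(cc.col)) == 2)
# The larger of the two values denotes cases
is.case <- cc.col == max(cc.col)
table(is.case)
```

With a 0/1 coding the same rule would mark the rows coded 1 as cases.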
strata
Not yet implemented.
sex
To be used with
comb.sex
To be used with
reference
Decides how
response
The default value "free" means that both single- and double-dose effects are estimated. Choosing "mult" instead specifies a multiplicative dose-response model.
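As a sketch of what the two choices mean, let RR_1 and RR_2 denote the relative risks for carrying one and two copies of a haplotype, compared with the reference (these symbols are introduced here for illustration, following the single-/double-dose parametrization of Gjessing and Lie). Under "free" both are estimated separately; under "mult" the double dose is constrained to act multiplicatively, which removes one parameter per haplotype:

```latex
\text{free: } RR_1,\ RR_2 \text{ estimated separately};
\qquad
\text{mult: } RR_2 = RR_1^{2}
\;\Longleftrightarrow\;
\log RR_2 = 2\,\log RR_1 .
```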
threshold
Sets the (approximate) lower limit for the frequencies of haplotypes that should be retained in the analysis. Less frequent haplotypes are removed, and this is reported in the output. Default is 0.01.
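The filtering rule can be sketched with made-up haplotype frequencies and plain R subsetting. Haplin performs its own filtering inside haplin and, as noted above, the limit is only approximate, so this illustrates the rule rather than the implementation; the haplotype labels and frequencies are hypothetical.

```r
# Hypothetical estimated haplotype frequencies (they sum to 1)
freqs <- c("1-1" = 0.48, "1-2" = 0.30, "2-1" = 0.15,
           "2-2" = 0.045, "1-3" = 0.02, "2-3" = 0.005)
threshold <- 0.01  # same value as the haplin default
# Haplotypes at or above the threshold are retained ...
kept <- freqs[freqs >= threshold]
# ... and rarer ones would be removed from the analysis
removed <- names(freqs)[freqs < threshold]
print(removed)
```

Raising the limit to 0.05, as in one of the examples below, would also drop "2-2" and "1-3".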
max.haplos
Not yet implemented.
haplo.file
Not yet implemented.
resampling
Mostly for testing. Default is "no". When "no", the individual haplotypes reconstructed by the EM algorithm are assumed known when computing CIs and p-values. If set to "jackknife", a jackknife-based resampling procedure is used when computing confidence intervals and p-values for effect estimates. This takes more time, but corrects the CIs and p-values for the uncertainty contained in unphased data. Note: in all recent versions of haplin, the standard errors are already corrected for this uncertainty, so the jackknife is no longer necessary.
max.EM.iter
The maximum number of iterations used by the EM algorithm. This value can be increased if necessary, which is sometimes the case with, e.g., case-control data with a substantial amount of missing data. For triad data with little missing information, however, many iterations are usually not needed.
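To show what an iteration limit like max.EM.iter bounds, here is a toy EM (gene-counting) estimator of two-SNP haplotype frequencies from unphased genotypes of unrelated individuals. It is a textbook sketch, not Haplin's implementation: it ignores the family information haplin uses for triads, and the function em.haplo and its arguments are hypothetical. Only double heterozygotes are phase-ambiguous, so each iteration splits them between AB/ab and Ab/aB in proportion to the current frequencies (E-step) and then re-estimates the frequencies from the expected counts (M-step); max.iter plays the role of max.EM.iter.

```r
# Toy EM for two-SNP haplotype frequencies (hypothetical helper, not part of Haplin).
# g: 3 x 3 matrix of genotype counts for unrelated individuals;
#    row i = (number of A alleles) + 1, column j = (number of B alleles) + 1.
em.haplo <- function(g, max.iter = 50, tol = 1e-8) {
  p <- c(AB = 0.25, Ab = 0.25, aB = 0.25, ab = 0.25)  # uniform start
  dh <- g[2, 2]                                       # double heterozygotes AaBb
  converged <- FALSE
  iter <- 0L
  for (iter in seq_len(max.iter)) {
    # E-step: expected share of double heterozygotes with phase AB/ab
    w <- p[["AB"]] * p[["ab"]] /
      (p[["AB"]] * p[["ab"]] + p[["Ab"]] * p[["aB"]])
    # Expected haplotype counts: unambiguous genotypes plus split double hets
    n <- c(AB = 2 * g[3, 3] + g[3, 2] + g[2, 3] + w * dh,
           Ab = 2 * g[3, 1] + g[3, 2] + g[2, 1] + (1 - w) * dh,
           aB = 2 * g[1, 3] + g[1, 2] + g[2, 3] + (1 - w) * dh,
           ab = 2 * g[1, 1] + g[1, 2] + g[2, 1] + w * dh)
    # M-step: re-estimate frequencies from the expected counts
    p.new <- n / sum(n)
    if (max(abs(p.new - p)) < tol) {
      p <- p.new
      converged <- TRUE
      break
    }
    p <- p.new
  }
  list(freq = p, iter = iter, converged = converged)
}

# Genotype counts for 1000 individuals, generated under Hardy-Weinberg
# proportions from haplotype frequencies AB = 0.4, Ab = 0.1, aB = 0.2, ab = 0.3
g <- matrix(c(90, 120,  40,    # aa: bb, Bb, BB
              60, 280, 160,    # Aa: bb, Bb, BB
              10,  80, 160),   # AA: bb, Bb, BB
            nrow = 3, byrow = TRUE)
res <- em.haplo(g)
round(res$freq, 3)
```

On these counts the estimates settle on the generating frequencies well within the default 50 iterations; with more phase ambiguity or missing data, more iterations may be needed, which is the situation the argument description above refers to.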
data.out
Character. Accepts the values "no", "prelim", "null", or "full", with "no" as default. For values other than the default, haplin returns a data file with the haplotypes identified instead of a haplin object.
verbose
Default is TRUE. During the EM algorithm,
printout
Logical. If TRUE (default),
Input data can be either a haplin format data file or a PED data file. The data must first be loaded into R with genDataRead or genDataLoad, and then preprocessed with genDataPreprocess. If a PED data file is used, the arguments filename, n.vars, sep, allele.sep, na.strings, ccvar, and sex need not be specified.
The output can be examined with print, summary, plot, and haptable.
An object of class haplin is returned. (The only exception is when data.out is set to a value other than "no"; haplin then produces a data file with the haplotypes identified.)
Typically, some of the included haplotypes will be relatively rare, with frequencies of, say, 1% to 5%. For such haplotypes there may be too little data to estimate the double-dose effects properly, so these estimates may be unreliable; this shows up as extremely wide confidence intervals. The rare double-dose estimates should be disregarded, but the remaining single- and double-dose estimates are valid. Alternatively, the problem can be avoided by reducing the model to a purely multiplicative one, setting response = "mult" combined with reference = "ref.cat".
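A back-of-the-envelope calculation shows why double-dose estimates are so unstable for rare haplotypes. Assuming Hardy-Weinberg proportions and ignoring any effect of the haplotype on risk (both assumptions made only for this illustration), a haplotype with frequency p is carried in two copies by a fraction of about p^2 of children, versus 2p(1 - p) for a single copy. The sample size of 1000 triads is hypothetical.

```r
n.triads <- 1000              # hypothetical number of case triads
p <- c(0.01, 0.02, 0.05)      # haplotype frequencies in the 1%-5% range
# Expected number of children carrying a single and a double dose
single.dose <- n.triads * 2 * p * (1 - p)
double.dose <- n.triads * p^2
data.frame(freq = p, single.dose, double.dose)
```

At a frequency of 1%, fewer than one child in a thousand triads is expected to carry two copies, compared with about 20 carrying one copy, so the double-dose relative risk rests on almost no data.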
Further information is found on the web page.
Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@uib.no
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: https://haplin.bitbucket.io
summary.haplin, plot.haplin, pedToHaplin, haptable, haplinSlide, genDataLoad, genDataRead, genDataPreprocess
# locating the example data shipped with the package
dir.in <- system.file( "extdata", package = "Haplin" )
file.in <- file.path( dir.in, "data.dat" )
# reading data in
data.in <- genDataRead( file.in, file.out = "poo_exmpl_data_read", format = "haplin",
dir.out = tempdir( check = TRUE ), n.vars = 1, allele.sep = " ", col.sep = " ",
overwrite = TRUE )
# preprocessing the data
data.preproc <- genDataPreprocess( data.in, design = "triad",
file.out = "poo_exmpl_data_preproc", dir.out = tempdir( check = TRUE ), overwrite = TRUE )
# running haplin, calculating POO
res.POO <- haplin( data.preproc, markers = 2, poo = TRUE, response = "mult",
reference = 2, use.missing = TRUE )
res.POO
## Not run:
# 1. Read the data:
my.haplin.data <- genDataRead( file.in = "HAPLIN.trialdata.txt", file.out =
"trial_data1", dir.out = ".", format = "haplin", n.vars = 0 )
# 2. Run pre-processing:
haplin.data.prep <- genDataPreprocess( data.in = my.haplin.data, format =
"haplin", design = "triad", file.out = "trial_data1_prep", dir.out = "." )
# 3. Analyze:
# Standard run:
haplin( haplin.data.prep )
# Estimate maternal effects:
haplin( haplin.data.prep, maternal = T )
# Use haplotype no. 2 as reference:
haplin( haplin.data.prep, reference = 2 )
# Remove more haplotypes from estimation by increasing the threshold
# to 5%:
haplin( haplin.data.prep, threshold = 0.05 )
# Estimate maternal effects, using the most frequent haplotype as reference.
# Use all data, including triads with missing data. Select
# markers 3, 4 and 8 from the supplied data.
haplin( haplin.data.prep, use.missing = T, maternal = T,
reference = "ref.cat", markers = c(3,4,8) )
# Note: in this version of haplin, the jackknife is
# no longer necessary since the standard errors are already corrected.
# Some examples showing how to save the haplin result and later
# recall plot and summary results:
# Same analysis as above, saving the result in the object "result.1":
result.1 <- haplin( haplin.data.prep, use.missing = T, maternal = T,
reference = "ref.cat", markers = c(3,4,8) )
# Replot the saved result (fetal effects):
plot( result.1 )
# Replot the saved result (maternal effects):
plot( result.1, plot.maternal = T )
# Print a very short summary of saved result:
result.1
# A full summary of saved result, with confidence intervals and
# p-values (the same as haplin prints when running):
summary( result.1 )
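# Some examples when the data file contains two covariates,
# the second being the case-control variable. The file must be read
# with n.vars = 2 and preprocessed before it is passed to haplin:
my.cc.data <- genDataRead( file.in = "data.dat", file.out = "cc_data",
dir.out = ".", format = "haplin", n.vars = 2 )
my.cc.prep <- genDataPreprocess( data.in = my.cc.data, design = "cc.triad",
file.out = "cc_data_prep", dir.out = "." )
# The following standard triad run is INCORRECT since it disregards
# case status:
haplin( my.cc.prep, use.missing = T, design = "triad" )
# Combined run on "hybrid" design, correctly using both case-parent
# triads and control-parent triads:
haplin( my.cc.prep, use.missing = T, ccvar = 2, design = "cc.triad" )
# If parent columns are not in the file, a plain case-control
# run can be used (my.cc.only.prep: such a file, read and
# preprocessed in the same way with design = "cc"):
haplin( my.cc.only.prep, use.missing = T, ccvar = 2,
design = "cc", response = "mult", reference = "ref.cat" )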
# An example of how to produce a data file with all possible haplotypes
# identified for each triad, together with their probability weights:
result.data <- haplin( haplin.data.prep, use.missing = T,
markers = c(3,4,8), data.out = "prelim" )
# result.data will then contain the data file, with a vector of
# probabilities (freq) computed from the preliminary haplotype
# frequencies.
## End(Not run)