Running Haplin analysis

knitr::opts_chunk$set( echo = TRUE )
library( Haplin, quietly = TRUE )

Running the analysis

By default, Haplin estimates the relative risk (RR) of a phenotype associated with a haplotype, based on triad or dyad genotype data. The output of the genDataPreprocess function (or genDataLoad) is used to run the analysis. Haplin analysis is run by the single command:

haplin( my.prepared.gen.data )

CAUTION: this command will try to provide estimates based on all the markers in the data object! Therefore, if you have a large dataset, such as from GWAS analysis, first try running a scan over the markers with a small window size, to determine where to focus your subsequent more detailed analysis:

haplinSlide( my.prepared.gen.data, use.missing = TRUE, winlength = 3 )

This performs haplin analysis on the marker window of length given by the winlegth argument above in order to search for the most significant regions in the dataset.

For more options and examples of how to run Haplin, see below or the haplin help file, obtained by writing in R:

?haplin

and

?haplinSlide

Exemplary Haplin runs

To test that Haplin runs properly, you can use the exemplary data provided with Haplin.

Trial run no.1

Here, the data includes only genotypes and the analysis is performed on all markers:

unlink( c( "*.ffData", "*.RData" ) )
dir.exmpl <- system.file( "extdata", package = "Haplin" )
exemplary.file1 <- paste0( dir.exmpl, "/HAPLIN.trialdata.txt" )

trial.data1 <- genDataRead( file.in = exemplary.file1, file.out = "trial_data1",
    dir.out = ".", format = "haplin", n.vars = 0 )
trial.data1.prep <- genDataPreprocess( data.in = trial.data1, design = "triad",
  file.out = "trial_data1_prep", dir.out = "." )
haplin( trial.data1.prep, use.missing = TRUE, maternal = TRUE )

First, the information about the calculation process is printed:

Next, the default is to print the summary of the results:

As you can see, a lot of information is printed out by default. One can change it by setting options verbose and/or printout to FALSE. Moreover, it's usually quite useful to save the results into an object:

my.results <- haplin( trial.data1.prep, use.missing = TRUE, maternal = TRUE,
  verbose = FALSE, printout = FALSE )
my.results

To access the details of the results, we can use the following functions with the saved object my.results as the argument:


Trial run no.2

This example shows analysis of data where apart from genotypes, there are two covariate columns (n.vars = 2), one coding for the case-control status (ccvar = 2). In the analysis, we take into account all the available data, imputing the parts that are missing (use.missing = TRUE). The estimation will be based on the dose-response model (response = "mult").

exemplary.file2 <- paste0( dir.exmpl, "/HAPLIN.trialdata2.txt" )

trial.data2 <- genDataRead( file.in = exemplary.file2, file.out = "trial_data2",
    dir.out = ".", format = "haplin", allele.sep = "", n.vars = 2 )
trial.data2.prep <- genDataPreprocess( data.in = trial.data2, design = "triad",
  file.out = "trial_data2_prep", dir.out = "." )
haplin( trial.data2.prep, use.missing = TRUE, ccvar = 2, design =
  "cc.triad", reference = "ref.cat", response = "mult" )

Running analysis on GWAS data: HaplinSlide

This is very similar to running haplin, but the results are a list of haptable. This type of analysis is usually performed when we have much bigger data. Let's read a proper .ped file:

exemplary.file3 <- paste0( dir.exmpl, "/exmpl_data.ped" )

hapSlide.data <- genDataRead( exemplary.file3, file.out = "hapSlide_data",
    format = "ped" )
hapSlide.data

The preprocessing of this file would take a while. Let's focus then on first 100 markers:

hapSlide.subset.data <- genDataGetPart( hapSlide.data, design = "cc",
    markers = 1:100, file.out = "hapSlide_subset_data" )
hapSlide.subset.data.prep <- genDataPreprocess( hapSlide.subset.data,
    design = "cc.triad", file.out = "hapSlide_subset_prep" )

We run the haplinSlide analysis:

hapSlide.res1 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE,
    ccvar = 10, design = "cc.triad", reference = "ref.cat", response = "mult" )

Here, the output is stored as a list of tables, where each table contains results of the haplin run on a certain marker or a set of markers, called a window. The length of the window can be controlled by the winlength argument (see the haplinSlide help for details). Here, the default window length was used, 1 marker.

Investigating and plotting results

Due to the size of haplinSlide results, it is best to narrow down the visualization of the results. If you have installed the ggplot2 package, you can use the functions plotPValues and plot.haplinSlide. First, we advise to plot only the p-values of the estimation we're interested in, using the plotPValues function:

all.p.values <- plotPValues( hapSlide.res1 )

By default, all the windows are plotted, but we can choose to plot results for the windows only with p-values below a certain threshold:

all.p.values <- plotPValues( hapSlide.res1, plot.signif.only = TRUE,
    signif.thresh = 0.25 )

The object all.p.values holds the table with p-values for all the windows plotted.

head( all.p.values )

Special effects: maternal or parent-of-origin

If we ran analysis with estimation of the maternal or parent-of-origin (PoO) effects, then we can choose to plot these p-values:

hapSlide.res2 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE,
  ccvar = 10, design = "cc.triad", poo = TRUE, reference = "ref.cat",
  response = "mult" )
all.p.values2 <- plotPValues( hapSlide.res2, which.p.val = "poo",
    plot.signif.only = TRUE, signif.thresh = 0.2 )
head( all.p.values2 )

NOTE: here, the star column marks the significant p-values:

And then, we can plot the relative risks:

plot( hapSlide.res2, plot.signif.only = TRUE )

Here's also how the results look like when we estimate the maternal effects, with the window length set to two:

hapSlide.res3 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE,
  ccvar = 10, design = "cc.triad", maternal = TRUE, reference = "ref.cat",
  response = "mult", winlength = 2 )
plot( hapSlide.res3, plot.signif.only = TRUE, signif.thresh = 0.01 )

Analysis of gene-environment interactions (GxE) with haplinStrat

If our dataset contains a column with environmental exposure of (usually) the mother, we can analyse it with haplinStrat, that calculates relative risks for strata defined by the categories of the environmental exposure variable. To do that, we use the strata argument which points to the column in the dataset that contains the environmental exposure categories.

hapStrat.res <- haplinStrat( data = trial.data2.prep, strata = 1, use.missing = TRUE,
  ccvar = 2, design = "cc.triad", reference = "ref.cat", response = "mult" )

The result of haplinStrat run is a list of haplin objects: one for the haplin run on the entire data, and then one for every stratum.

lapply( hapStrat.res, haptable )

Plotting the results

We can plot the results easily with the plot function (provided that the ggplot2 package is installed on your system!):

plot( hapStrat.res )

Checking for significance of interactions

To check whether there is any significant interaction between the environmental exposure and genotypes, we use the gxe function:

gxe( hapStrat.res )


Try the Haplin package in your browser

Any scripts or data that you put into this service are public.

Haplin documentation built on May 20, 2022, 5:07 p.m.