Running Haplin analysis
In Haplin: Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes

knitr::opts_chunk$set( echo = TRUE )
library( Haplin, quietly = TRUE )

Running the analysis

By default, Haplin estimates the relative risk (RR) of a phenotype associated with a haplotype, based on triad or dyad genotype data. The output of the genDataPreprocess function (or genDataLoad) is used to run the analysis. Haplin analysis is run by the single command:

haplin( my.prepared.gen.data )

CAUTION: this command will try to provide estimates based on all the markers in the data object! Therefore, if you have a large dataset, such as from GWAS analysis, first try running a scan over the markers with a small window size, to determine where to focus your subsequent more detailed analysis:

haplinSlide( my.prepared.gen.data, use.missing = TRUE, winlength = 3 )

This performs haplin analysis on the marker window of length given by the winlegth argument above in order to search for the most significant regions in the dataset.

For more options and examples of how to run Haplin, see below or the haplin help file, obtained by writing in R:

?haplin

and

?haplinSlide

Exemplary Haplin runs

To test that Haplin runs properly, you can use the exemplary data provided with Haplin.

Trial run no.1

Here, the data includes only genotypes and the analysis is performed on all markers:

unlink( c( "*.ffData", "*.RData" ) )

dir.exmpl <- system.file( "extdata", package = "Haplin" )
exemplary.file1 <- paste0( dir.exmpl, "/HAPLIN.trialdata.txt" )

trial.data1 <- genDataRead( file.in = exemplary.file1, file.out = "trial_data1",
    dir.out = ".", format = "haplin", n.vars = 0 )
trial.data1.prep <- genDataPreprocess( data.in = trial.data1, design = "triad",
  file.out = "trial_data1_prep", dir.out = "." )
haplin( trial.data1.prep, use.missing = TRUE, maternal = TRUE )

First, the information about the calculation process is printed:

how many rows contained missing data and how many were dropped (if use.missing = F);
information about any other reasons to drop data, e.g., Mandelian inconsistencies;
process of EM estimation - for each iteration, the deviance and values of the coefficients of the model; NB: there should not be many iterations, which would indicate some problems with the chosen model or with the input data

Next, the default is to print the summary of the results:

the full information about the input data and arguments;
summary of all the data parts dropped due to some inconsistencies;
the frequency of alleles in the markers and the results of testing for Hardy-Weinberg equilibrium (HWE);
summary of the haplotypes found;
and finally, the model estimations with relative risks for each of the haplotypes.

As you can see, a lot of information is printed out by default. One can change it by setting options verbose and/or printout to FALSE. Moreover, it's usually quite useful to save the results into an object:

my.results <- haplin( trial.data1.prep, use.missing = TRUE, maternal = TRUE,
  verbose = FALSE, printout = FALSE )
my.results

To access the details of the results, we can use the following functions with the saved object my.results as the argument:

summary, which outputs all the results in a nicely formatted text (as outputted above) or,
haptable, which gives the same data only in a matrix format,
plot, which replots the figure with the results,
output, which saves the results to text and JPG files.

Trial run no.2

This example shows analysis of data where apart from genotypes, there are two covariate columns (n.vars = 2), one coding for the case-control status (ccvar = 2). In the analysis, we take into account all the available data, imputing the parts that are missing (use.missing = TRUE). The estimation will be based on the dose-response model (response = "mult").

exemplary.file2 <- paste0( dir.exmpl, "/HAPLIN.trialdata2.txt" )

trial.data2 <- genDataRead( file.in = exemplary.file2, file.out = "trial_data2",
    dir.out = ".", format = "haplin", allele.sep = "", n.vars = 2 )
trial.data2.prep <- genDataPreprocess( data.in = trial.data2, design = "triad",
  file.out = "trial_data2_prep", dir.out = "." )
haplin( trial.data2.prep, use.missing = TRUE, ccvar = 2, design =
  "cc.triad", reference = "ref.cat", response = "mult" )

Running analysis on GWAS data: HaplinSlide

This is very similar to running haplin, but the results are a list of haptable. This type of analysis is usually performed when we have much bigger data. Let's read a proper .ped file:

exemplary.file3 <- paste0( dir.exmpl, "/exmpl_data.ped" )

hapSlide.data <- genDataRead( exemplary.file3, file.out = "hapSlide_data",
    format = "ped" )
hapSlide.data

The preprocessing of this file would take a while. Let's focus then on first 100 markers:

hapSlide.subset.data <- genDataGetPart( hapSlide.data, design = "cc",
    markers = 1:100, file.out = "hapSlide_subset_data" )
hapSlide.subset.data.prep <- genDataPreprocess( hapSlide.subset.data,
    design = "cc.triad", file.out = "hapSlide_subset_prep" )

We run the haplinSlide analysis:

hapSlide.res1 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE,
    ccvar = 10, design = "cc.triad", reference = "ref.cat", response = "mult" )

Here, the output is stored as a list of tables, where each table contains results of the haplin run on a certain marker or a set of markers, called a window. The length of the window can be controlled by the winlength argument (see the haplinSlide help for details). Here, the default window length was used, 1 marker.

Investigating and plotting results

Due to the size of haplinSlide results, it is best to narrow down the visualization of the results. If you have installed the ggplot2 package, you can use the functions plotPValues and plot.haplinSlide. First, we advise to plot only the p-values of the estimation we're interested in, using the plotPValues function:

all.p.values <- plotPValues( hapSlide.res1 )

By default, all the windows are plotted, but we can choose to plot results for the windows only with p-values below a certain threshold:

all.p.values <- plotPValues( hapSlide.res1, plot.signif.only = TRUE,
    signif.thresh = 0.25 )

The object all.p.values holds the table with p-values for all the windows plotted.

head( all.p.values )

Special effects: maternal or parent-of-origin

If we ran analysis with estimation of the maternal or parent-of-origin (PoO) effects, then we can choose to plot these p-values:

hapSlide.res2 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE,
  ccvar = 10, design = "cc.triad", poo = TRUE, reference = "ref.cat",
  response = "mult" )
all.p.values2 <- plotPValues( hapSlide.res2, which.p.val = "poo",
    plot.signif.only = TRUE, signif.thresh = 0.2 )
head( all.p.values2 )

NOTE: here, the star column marks the significant p-values:

* marks p-value $< 0.05$,
** marks p-value $< 0.01$,
*** marks p-value $< 0.001$.

And then, we can plot the relative risks:

plot( hapSlide.res2, plot.signif.only = TRUE )

Here's also how the results look like when we estimate the maternal effects, with the window length set to two:

hapSlide.res3 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE,
  ccvar = 10, design = "cc.triad", maternal = TRUE, reference = "ref.cat",
  response = "mult", winlength = 2 )
plot( hapSlide.res3, plot.signif.only = TRUE, signif.thresh = 0.01 )

Analysis of gene-environment interactions (GxE) with haplinStrat

If our dataset contains a column with environmental exposure of (usually) the mother, we can analyse it with haplinStrat, that calculates relative risks for strata defined by the categories of the environmental exposure variable. To do that, we use the strata argument which points to the column in the dataset that contains the environmental exposure categories.

hapStrat.res <- haplinStrat( data = trial.data2.prep, strata = 1, use.missing = TRUE,
  ccvar = 2, design = "cc.triad", reference = "ref.cat", response = "mult" )

The result of haplinStrat run is a list of haplin objects: one for the haplin run on the entire data, and then one for every stratum.

lapply( hapStrat.res, haptable )

Plotting the results

We can plot the results easily with the plot function (provided that the ggplot2 package is installed on your system!):

plot( hapStrat.res )

Checking for significance of interactions

To check whether there is any significant interaction between the environmental exposure and genotypes, we use the gxe function:

gxe( hapStrat.res )

Any scripts or data that you put into this service are public.

Haplin documentation built on Sept. 11, 2024, 7:13 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Haplin
Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes

Running Haplin analysis
In Haplin: Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes

Running the analysis

Exemplary Haplin runs

Trial run no.1

Trial run no.2

Running analysis on GWAS data: HaplinSlide

Investigating and plotting results

Special effects: maternal or parent-of-origin

Analysis of gene-environment interactions (GxE) with haplinStrat

Plotting the results

Checking for significance of interactions

Try the Haplin package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

Haplin Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes

Running Haplin analysis In Haplin: Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes

Running the analysis

Exemplary Haplin runs

Trial run no.1

Trial run no.2

Running analysis on GWAS data: HaplinSlide

Investigating and plotting results

Special effects: maternal or parent-of-origin

Analysis of gene-environment interactions (GxE) with haplinStrat

Plotting the results

Checking for significance of interactions

Try the Haplin package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

Haplin
Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes

Running Haplin analysis
In Haplin: Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes