knitr::opts_chunk$set( echo = TRUE ) library( Haplin, quietly = TRUE )
By default, Haplin estimates the relative risk (RR) of a phenotype associated with a haplotype, based on triad or dyad genotype data. The output of the genDataPreprocess
function (or genDataLoad
) is used to run the analysis. Haplin analysis is run by the single command:
haplin( my.prepared.gen.data )
CAUTION: this command will try to provide estimates based on all the markers in the data object! Therefore, if you have a large dataset, such as from GWAS analysis, first try running a scan over the markers with a small window size, to determine where to focus your subsequent more detailed analysis:
haplinSlide( my.prepared.gen.data, use.missing = TRUE, winlength = 3 )
This performs haplin analysis on the marker window of length given by the winlegth
argument above in order to search for the most significant regions in the dataset.
For more options and examples of how to run Haplin, see below or the haplin help file, obtained by writing in R:
?haplin
and
?haplinSlide
To test that Haplin runs properly, you can use the exemplary data provided with Haplin.
Here, the data includes only genotypes and the analysis is performed on all markers:
unlink( c( "*.ffData", "*.RData" ) )
dir.exmpl <- system.file( "extdata", package = "Haplin" ) exemplary.file1 <- paste0( dir.exmpl, "/HAPLIN.trialdata.txt" ) trial.data1 <- genDataRead( file.in = exemplary.file1, file.out = "trial_data1", dir.out = ".", format = "haplin", n.vars = 0 ) trial.data1.prep <- genDataPreprocess( data.in = trial.data1, design = "triad", file.out = "trial_data1_prep", dir.out = "." ) haplin( trial.data1.prep, use.missing = TRUE, maternal = TRUE )
First, the information about the calculation process is printed:
use.missing = F
);Next, the default is to print the summary of the results:
As you can see, a lot of information is printed out by default. One can change it by setting options verbose
and/or printout
to FALSE
. Moreover, it's usually quite useful to save the results into an object:
my.results <- haplin( trial.data1.prep, use.missing = TRUE, maternal = TRUE, verbose = FALSE, printout = FALSE ) my.results
To access the details of the results, we can use the following functions with the saved object my.results
as the argument:
summary
, which outputs all the results in a nicely formatted text (as outputted above) or,haptable
, which gives the same data only in a matrix format,plot
, which replots the figure with the results,output
, which saves the results to text and JPG files.This example shows analysis of data where apart from genotypes, there are two covariate columns (n.vars = 2
), one coding for the case-control status (ccvar = 2
). In the analysis, we take into account all the available data, imputing the parts that are missing (use.missing = TRUE
). The estimation will be based on the dose-response model (response = "mult"
).
exemplary.file2 <- paste0( dir.exmpl, "/HAPLIN.trialdata2.txt" ) trial.data2 <- genDataRead( file.in = exemplary.file2, file.out = "trial_data2", dir.out = ".", format = "haplin", allele.sep = "", n.vars = 2 ) trial.data2.prep <- genDataPreprocess( data.in = trial.data2, design = "triad", file.out = "trial_data2_prep", dir.out = "." ) haplin( trial.data2.prep, use.missing = TRUE, ccvar = 2, design = "cc.triad", reference = "ref.cat", response = "mult" )
This is very similar to running haplin
, but the results are a list of haptable. This type of analysis is usually performed when we have much bigger data. Let's read a proper .ped file:
exemplary.file3 <- paste0( dir.exmpl, "/exmpl_data.ped" ) hapSlide.data <- genDataRead( exemplary.file3, file.out = "hapSlide_data", format = "ped" ) hapSlide.data
The preprocessing of this file would take a while. Let's focus then on first 100 markers:
hapSlide.subset.data <- genDataGetPart( hapSlide.data, design = "cc", markers = 1:100, file.out = "hapSlide_subset_data" ) hapSlide.subset.data.prep <- genDataPreprocess( hapSlide.subset.data, design = "cc.triad", file.out = "hapSlide_subset_prep" )
We run the haplinSlide analysis:
hapSlide.res1 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE, ccvar = 10, design = "cc.triad", reference = "ref.cat", response = "mult" )
Here, the output is stored as a list of tables, where each table contains results of the haplin
run on a certain marker or a set of markers, called a window. The length of the window can be controlled by the winlength
argument (see the haplinSlide help for details). Here, the default window length was used, 1 marker.
Due to the size of haplinSlide
results, it is best to narrow down the visualization of the results. If you have installed the ggplot2 package, you can use the functions plotPValues
and plot.haplinSlide
. First, we advise to plot only the p-values of the estimation we're interested in, using the plotPValues
function:
all.p.values <- plotPValues( hapSlide.res1 )
By default, all the windows are plotted, but we can choose to plot results for the windows only with p-values below a certain threshold:
all.p.values <- plotPValues( hapSlide.res1, plot.signif.only = TRUE, signif.thresh = 0.25 )
The object all.p.values
holds the table with p-values for all the windows plotted.
head( all.p.values )
If we ran analysis with estimation of the maternal or parent-of-origin (PoO) effects, then we can choose to plot these p-values:
hapSlide.res2 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE, ccvar = 10, design = "cc.triad", poo = TRUE, reference = "ref.cat", response = "mult" ) all.p.values2 <- plotPValues( hapSlide.res2, which.p.val = "poo", plot.signif.only = TRUE, signif.thresh = 0.2 ) head( all.p.values2 )
NOTE: here, the star
column marks the significant p-values:
*
marks p-value $< 0.05$,**
marks p-value $< 0.01$,***
marks p-value $< 0.001$.And then, we can plot the relative risks:
plot( hapSlide.res2, plot.signif.only = TRUE )
Here's also how the results look like when we estimate the maternal effects, with the window length set to two:
hapSlide.res3 <- haplinSlide( hapSlide.subset.data.prep, use.missing = TRUE, ccvar = 10, design = "cc.triad", maternal = TRUE, reference = "ref.cat", response = "mult", winlength = 2 ) plot( hapSlide.res3, plot.signif.only = TRUE, signif.thresh = 0.01 )
If our dataset contains a column with environmental exposure of (usually) the mother, we can analyse it with haplinStrat
, that calculates relative risks for strata defined by the categories of the environmental exposure variable. To do that, we use the strata
argument which points to the column in the dataset that contains the environmental exposure categories.
hapStrat.res <- haplinStrat( data = trial.data2.prep, strata = 1, use.missing = TRUE, ccvar = 2, design = "cc.triad", reference = "ref.cat", response = "mult" )
The result of haplinStrat
run is a list of haplin objects: one for the haplin run on the entire data, and then one for every stratum.
lapply( hapStrat.res, haptable )
We can plot the results easily with the plot
function (provided that the ggplot2 package is installed on your system!):
plot( hapStrat.res )
To check whether there is any significant interaction between the environmental exposure and genotypes, we use the gxe
function:
gxe( hapStrat.res )
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.