GenABEL: an R package for Genome Wide Association Analysis


Genome-wide association (GWA) analysis is a tool of choice for identification of genes for complex traits. Effective storage, handling and analysis of GWA data represent a challenge to modern computational genetics. GWA studies generate large amounts of data: hundreds of thousands of single nucleotide polymorphisms (SNPs) are genotyped in hundreds or thousands of patients and controls. Data on each SNP undergo several types of analysis: characterization of the frequency distribution, testing of Hardy-Weinberg equilibrium, analysis of association of single SNPs and haplotypes with different traits, and so on. Because SNP genotypes in dense marker sets are correlated, significance testing in GWA analysis is preferably performed using computationally intensive permutation test procedures, further increasing the computational burden.

To make GWA analysis possible on standard desktop computers, we developed the GenABEL library, which addresses the following objectives:

(1) Minimization of the amount of random access memory (RAM) used and the time required for data transactions. For this, we developed an effective data storage and manipulation model.

(2) Maximization of the throughput of GWA analysis. For this, we designed optimal fast procedures for specific genetic tests.
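To give a feel for why a compact storage model matters for objective (1), the following base R sketch packs four genotypes into one byte, using two bits per genotype. The coding (0 = missing, 1 = AA, 2 = AB, 3 = BB) and the function names pack_genotypes and unpack_genotypes are assumptions made for this illustration only; the internal layout actually used by GenABEL may differ.

```r
# Illustrative only: a hypothetical 2-bit genotype encoding, not GenABEL's API.
# Assumed codes: 0 = missing, 1 = AA, 2 = AB, 3 = BB.
pack_genotypes <- function(g) {
  stopifnot(all(g %in% 0:3))
  # Pad with "missing" so the length is a multiple of four.
  padded <- c(as.integer(g), integer((4L - length(g) %% 4L) %% 4L))
  m <- matrix(padded, nrow = 4L)
  # Each column becomes one byte; the first genotype occupies the two highest bits.
  as.raw(m[1, ] * 64L + m[2, ] * 16L + m[3, ] * 4L + m[4, ])
}

unpack_genotypes <- function(bytes, n) {
  b <- as.integer(bytes)
  # Recover the four 2-bit fields of every byte, in the order they were packed.
  m <- rbind(b %/% 64L, (b %/% 16L) %% 4L, (b %/% 4L) %% 4L, b %% 4L)
  as.vector(m)[seq_len(n)]
}

g <- c(1L, 3L, 0L, 2L, 2L, 1L)
packed <- pack_genotypes(g)          # six genotypes fit in two bytes
restored <- unpack_genotypes(packed, length(g))
```

Compared with storing each genotype as a 4-byte R integer, such a layout needs roughly one sixteenth of the memory, at the cost of a small decoding step whenever genotypes are read.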

Embedding GenABEL into the R environment allows for easy data characterization, exploration and presentation of the results, and gives access to a wide range of standard and special statistical analysis functions available in base R and in specific R packages, such as "haplo.stats", "genetics", etc.

The most important functions and classes are:

For converting data from other formats, see

convert.snp.illumina (Illumina/Affymetrix-like format). This is our preferred converting function, very extensively tested. Other conversion functions include: convert.snp.text (conversion from human-readable GenABEL format), convert.snp.ped (Linkage, Merlin, Mach, and similar files), convert.snp.mach (Mach-format), convert.snp.tped (from PLINK TPED format), convert.snp.affymetrix (BRML-style files).

For converting GenABEL data to other formats, see export.merlin (MERLIN and MACH formats), export.impute (IMPUTE, SNPTEST and CHIAMO formats), export.plink (PLINK format, also exports phenotypic data).

To load the data, see

For conversion to DatABEL format (used by ProbABEL and some other GenABEL suite packages), see impute2databel, impute2mach, mach2databel.

For data management and manipulations, see snp.names, snp.subset.

For merging extra data to the phenotypic part of object, see add.phdata.

For trait manipulations, see ztransform (transformation to standard Normal), rntransform (rank-transformation to normality), npsubtreated (non-parametric routine to "impute" trait values in those medicated).
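The two trait transformations named above can be sketched in base R as follows. This is a minimal illustration of the general techniques, not the package code: the function names z_transform and rank_normal_transform are hypothetical, the offset of 0.5 in the rank formula is one common convention chosen here as an assumption, and covariate adjustment is left out.

```r
# Illustrative base R versions; names and the 0.5 offset are assumptions.

# Standardize to mean 0 and standard deviation 1, ignoring missing values.
z_transform <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Rank-based transformation to normality: replace each value by the
# standard Normal quantile of its (shifted) rank fraction.
rank_normal_transform <- function(x) {
  ok <- !is.na(x)
  out <- rep(NA_real_, length(x))
  r <- rank(x[ok], ties.method = "average")
  out[ok] <- qnorm((r - 0.5) / sum(ok))
  out
}

trait <- c(5, 1, NA, 3, 100)         # skewed trait with one missing value
z  <- z_transform(trait)
rn <- rank_normal_transform(trait)
```

The rank-based version is insensitive to outliers such as the value 100 above, because only the ordering of the observations enters the result.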

For quality control, see check.trait, check.marker, perid.summary, ibs, hom.
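Testing of Hardy-Weinberg equilibrium, listed in the introduction among the per-SNP analyses, is a typical marker-level quality check. The base R sketch below shows the standard one-degree-of-freedom chi-square version of that test computed from genotype counts. The function name hwe_chisq is hypothetical, and the package's own routines may use a different (for example, exact) test.

```r
# Illustrative chi-square test of Hardy-Weinberg equilibrium from genotype
# counts; not GenABEL code. Undefined for monomorphic markers (division by 0).
hwe_chisq <- function(n_aa, n_ab, n_bb) {
  n <- n_aa + n_ab + n_bb
  p <- (2 * n_aa + n_ab) / (2 * n)               # frequency of allele A
  expected <- n * c(p^2, 2 * p * (1 - p), (1 - p)^2)
  observed <- c(n_aa, n_ab, n_bb)
  stat <- sum((observed - expected)^2 / expected)
  c(chi2 = stat, p_value = pchisq(stat, df = 1, lower.tail = FALSE))
}

in_hwe   <- hwe_chisq(25, 50, 25)    # counts match HWE proportions exactly
off_hwe  <- hwe_chisq(40, 20, 40)    # strong deficit of heterozygotes
```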

For fast analysis functions, see scan.gwaa-class, ccfast, qtscore, mmscore, egscore, ibs, r2fast (estimate linkage disequilibrium using R2), dprfast (estimate linkage disequilibrium using D'), rhofast (estimate linkage disequilibrium using 'rho').
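As a rough illustration of what an R2 measure of linkage disequilibrium captures, the sketch below computes the squared correlation between the genotypes of two SNPs coded as allele counts (0, 1, 2). This genotype-correlation version is a simplification chosen for the example; the hypothetical helper ld_r2 is not part of the package, and r2fast may estimate R2 differently (for instance, from estimated haplotype frequencies).

```r
# Illustrative genotype-correlation r^2 between two SNPs; not GenABEL code.
# g1, g2: numeric vectors of allele counts (0, 1, 2), NA for missing.
ld_r2 <- function(g1, g2) {
  stopifnot(length(g1) == length(g2))
  ok <- !is.na(g1) & !is.na(g2)      # use only people typed at both SNPs
  cor(g1[ok], g2[ok])^2
}

snp_a <- c(0, 1, 2, 1, 0, 2, NA, 1)
snp_b <- c(0, 1, 2, 1, 0, 2, 1, NA)  # identical where both are observed
r2 <- ld_r2(snp_a, snp_b)
```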

For specific tools facilitating analysis of data with stratification (population stratification or (possibly unknown) pedigree structure), see qtscore (implements basic Genomic Control), ibs (computations of IBS / genomic IBD), egscore (stratification adjustment following Price et al.), polygenic (heritability analysis), polygenic_hglm (another function for heritability analysis), mmscore (score test of Chen and Abecasis), grammar (grammar, grammar-gc, and grammar-gamma tests of Aulchenko et al., Amin et al., and Svishcheva et al.).
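The idea behind basic Genomic Control can be sketched in a few lines of base R: the inflation factor lambda is estimated as the ratio of the observed median test statistic to the median of the chi-square distribution with one degree of freedom, and all statistics are then divided by it. The function name genomic_control and the decision to floor lambda at 1 are assumptions of this sketch, not a description of how qtscore is implemented.

```r
# Illustrative Genomic Control correction; not GenABEL code.
# chi2: vector of 1-df chi-square association statistics, one per SNP.
genomic_control <- function(chi2) {
  # Median-based inflation factor.
  lambda <- median(chi2, na.rm = TRUE) / qchisq(0.5, df = 1)
  # Assumption of this sketch: never inflate statistics when lambda < 1.
  lambda <- max(lambda, 1)
  corrected <- chi2 / lambda
  list(lambda = lambda,
       chi2 = corrected,
       p_value = pchisq(corrected, df = 1, lower.tail = FALSE))
}

# Deterministic toy input: null quantiles inflated by a factor of 1.3.
chi2_inflated <- 1.3 * qchisq(ppoints(1001), df = 1)
gc <- genomic_control(chi2_inflated)
```

With this toy input the estimated lambda recovers the inflation factor that was built in, and the corrected statistics again have the median expected under the null.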

For functions facilitating construction of tables for your manuscript, see descriptives.marker, descriptives.trait, descriptives.scan.

For functions reconstructing relationships from genomic data, see findRelatives, reconstructNPs.

For meta-analysis and related, see help on formetascore.

For links to web databases, see show.ncbi.

For interfaces to other packages and standard R functions, also for 2D scans, see scan.glm, scan.glm.2D, scan.haplo, scan.haplo.2D, scan.gwaa-class, scan.gwaa.2D-class.

Yurii Aulchenko et al. (see help pages for specific functions)


