sample: Sample datasets to illustrate data input



The first five files contain data on 20 diallelic loci in 120 subjects. These data are distributed with the Haploview package (Barrett et al., 2005). The sixth file contains an additional dataset of 18 SNPs in 100 subjects, coded in "long" format, and the seventh file duplicates this dataset in an alternative long format. These seven files are used in the data input vignette. The final file is a sample imputed genotype dataset distributed with the MACH imputation package and used in the imputation vignette.

These files are stored in the extdata directory relative to the package base. Full file names can be obtained using the system.file function.
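As a minimal sketch of that lookup, the snippet below resolves the full path of one sample file with system.file. It assumes the package is installed under the name snpStats; when it is not, system.file returns an empty string, which the sketch checks for rather than failing.

```r
# Resolve the installed location of one sample file.
# Assumption: the package is installed under the name "snpStats".
path <- system.file("extdata", "sample.ped.gz", package = "snpStats")

if (nzchar(path)) {
  # Show the full file name and the other files in the same directory.
  cat("Found:", path, "\n")
  print(list.files(dirname(path)))
} else {
  # system.file() returns "" when the package or the file cannot be found.
  message("sample.ped.gz not found; is snpStats installed?")
}
```

Checking for an empty string first avoids passing a non-existent path to a reader function later on.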


The following files are described here:

  • sample.ped.gz: A gzipped pedfile

  • An accompanying locus information file

  • sample.bed: The corresponding PLINK .bed file

  • sample.bim: The PLINK .bim file

  • sample.fam: The PLINK .fam file

  • sample-long.gz: A sample of long-formatted data

  • sample-long-alleles.gz: The same as above, but allele-coded

  • mach1.out.mlprob.gz: An mlprob output file from the MACH genotype imputation program. This file contains, for each imputed genotype call, posterior probabilities for the three possible genotypes
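To illustrate how the listed files might be used, the following sketch locates the three PLINK files and, if they are available, reads them with read.plink from snpStats. It is a hedged example rather than the vignette's own code: the variable names files, paths and plink are introduced here for illustration, and the read is skipped when the package or the files are not found.

```r
# The three PLINK files named in the list above.
files <- c(bed = "sample.bed", bim = "sample.bim", fam = "sample.fam")

# Full paths; each entry is "" if the package or the file is missing.
paths <- vapply(
  files,
  function(f) system.file("extdata", f, package = "snpStats"),
  character(1)
)

if (all(nzchar(paths)) && requireNamespace("snpStats", quietly = TRUE)) {
  # read.plink() returns a list with elements genotypes, fam and map.
  plink <- snpStats::read.plink(paths[["bed"]], paths[["bim"]], paths[["fam"]])
  print(plink$genotypes)
} else {
  message("PLINK sample files not available; skipping the read.")
}
```

The other files can be approached the same way with the corresponding readers in the package (for example read.pedfile, read.long and read.mach); see their help pages for the arguments each one expects.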



Barrett JC, Fry B, Maller J, Daly MJ. (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 2005 Jan 15. [PubMed ID: 15297300]
