read.plink: Read a PLINK binary data file as a SnpMatrix


View source: R/plink.R


Description

PLINK stores genome-wide association data as a set of three files with the extensions .bed, .bim, and .fam. This function reads these files and creates an object of class "SnpMatrix".


Usage

read.plink(bed, bim, fam, na.strings = c("0", "-9"), sep = ".", select.subjects = NULL, select.snps = NULL)



Arguments

bed: The name of the file containing the packed binary SNP genotype data. It should have the extension .bed; if it does not, this extension will be appended.

bim: The file containing the SNP descriptions.

fam: The file containing subject (and, possibly, family) identifiers. This is essentially a tab-delimited "pedfile".

na.strings: Strings in the .bim and .fam files to be recoded as NA.

sep: A separator character for constructing unique subject identifiers.

select.subjects: A numeric vector indicating a subset of subjects to be selected from the input file (see Details).

select.snps: Either a numeric or a character vector indicating a subset of SNPs to be selected from the input file (see Details).


Details

If the bed argument does not end with the file extension .bed, this extension is appended to it. The bim and fam arguments are optional; their default values are obtained by replacing the .bed filename extension with .bim and .fam respectively. See the PLINK documentation for the detailed specification of these files.
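The naming rule above can be sketched in base R. The helper `plink_file_names` is hypothetical and is not part of snpStats; it uses NULL defaults and a case-sensitive match on ".bed", both of which are assumptions rather than a description of how read.plink itself is written.

```r
# Illustrative helper (not part of snpStats): mimic the documented defaults.
plink_file_names <- function(bed, bim = NULL, fam = NULL) {
  # Append ".bed" when the argument does not already end with it
  if (!grepl("\\.bed$", bed)) bed <- paste0(bed, ".bed")
  # Derive the shared stem, then fill in any file name not supplied
  stem <- sub("\\.bed$", "", bed)
  if (is.null(bim)) bim <- paste0(stem, ".bim")
  if (is.null(fam)) fam <- paste0(stem, ".fam")
  c(bed = bed, bim = bim, fam = fam)
}

plink_file_names("study")
plink_file_names("study.bed", fam = "other.fam")
```

Under these assumptions, passing only a file stem is enough when the three files share a name and sit in the same directory.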

The select.subjects or select.snps argument can be used to read a subset of the data. Use of select.snps requires that the .bed file is in SNP-major order (the default in PLINK). Likewise, use of select.subjects requires that the .bed file is in individual-major order. Subjects are selected by their numeric order in the PLINK files, while SNPs are selected either by order or by name. Note that the order of selected SNPs/subjects in the output objects will be the same as their order in the PLINK files.
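To illustrate the last point, here is a minimal base R sketch of how a SNP selection given by name or by position maps onto file order. The function `resolve_snp_selection` and the SNP names are hypothetical; the real function may handle unsorted or duplicated selections differently (for example by rejecting them), so this shows only the documented outcome, not the package's code.

```r
# Illustrative only: resolve a SNP selection to indices in file order.
resolve_snp_selection <- function(snp_names, select) {
  if (is.character(select)) {
    # Selection by name: look each name up in the .bim SNP names
    idx <- match(select, snp_names)
    if (anyNA(idx)) {
      stop("unknown SNP name(s): ", paste(select[is.na(idx)], collapse = ", "))
    }
  } else {
    # Selection by position in the file
    idx <- as.integer(select)
    if (any(idx < 1L | idx > length(snp_names))) stop("index out of range")
  }
  # The output follows file order, whatever order was requested
  sort(unique(idx))
}

snps <- c("rs1", "rs2", "rs3", "rs4")      # hypothetical SNP names
resolve_snp_selection(snps, c("rs4", "rs2"))
resolve_snp_selection(snps, c(3, 1))
```

Requesting "rs4" before "rs2" therefore still yields the second SNP ahead of the fourth in the result.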

Row names for the output SnpMatrix object and for the accompanying subject description dataframe are taken as the pedigree identifiers, when these provide the required unique identifiers. When these are duplicated, an attempt is made to use the pedigree-member identifiers instead but, when these too are duplicated, row names are obtained by concatenating, with a separator character, the pedigree and pedigree-member identifiers.
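The fallback scheme for row names can be written out as a short base R sketch. `make_row_names` is a hypothetical helper that follows the three steps described above; it is not taken from the snpStats source, and the identifiers in the examples are made up.

```r
# Illustrative only: choose unique row names from pedigree and member IDs.
make_row_names <- function(pedigree, member, sep = ".") {
  # 1. Use the pedigree identifiers when they are already unique
  if (!anyDuplicated(pedigree)) return(as.character(pedigree))
  # 2. Otherwise try the pedigree-member identifiers
  if (!anyDuplicated(member)) return(as.character(member))
  # 3. Otherwise join the two with the separator character
  paste(pedigree, member, sep = sep)
}

make_row_names(c("F1", "F2"), c("1", "1"))             # pedigree IDs are unique
make_row_names(c("F1", "F1"), c("1", "2"))             # falls back to member IDs
make_row_names(c("F1", "F1", "F2"), c("1", "2", "1"))  # concatenates both
```

The `sep` parameter of the sketch plays the role of the sep argument of read.plink and only matters in the third case.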


Value

A list with three elements:

genotypes: The output genotype data as an object of class "SnpMatrix".

fam: A dataframe corresponding to the .fam file, containing the first six fields in a standard pedfile. The row names correspond with those of the SnpMatrix.

map: A dataframe corresponding to the .bim file. The row names correspond with the column names of the SnpMatrix.
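The correspondence between the three elements can be checked with a few lines of base R. This sketch assumes the elements are named genotypes, fam, and map, as in snpStats; `check_plink_result` is a hypothetical helper, and the mock object uses a plain matrix and placeholder column names in place of real PLINK data.

```r
# Illustrative only: confirm that row and column names line up across elements.
check_plink_result <- function(x) {
  stopifnot(is.list(x), all(c("genotypes", "fam", "map") %in% names(x)))
  c(
    subjects_match = identical(rownames(x$genotypes), rownames(x$fam)),
    snps_match     = identical(colnames(x$genotypes), rownames(x$map))
  )
}

# Mock result: a plain integer matrix stands in for the SnpMatrix.
mock <- list(
  genotypes = matrix(0L, nrow = 2, ncol = 3,
                     dimnames = list(c("s1", "s2"), c("rs1", "rs2", "rs3"))),
  fam = data.frame(pedigree = c("s1", "s2"), row.names = c("s1", "s2")),
  map = data.frame(snp.name = c("rs1", "rs2", "rs3"),
                   row.names = c("rs1", "rs2", "rs3"))
)

check_plink_result(mock)
```

With snpStats installed, the same helper could be applied to the list returned by read.plink for a set of real files.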


Note

No special provision is made to read XSnpMatrix objects; such data should first be read as a SnpMatrix and then coerced to an XSnpMatrix using new or as.


Author(s)

David Clayton


References

PLINK: Whole genome association analysis toolset.

See Also

write.plink, SnpMatrix-class, XSnpMatrix-class

snpStats documentation built on Nov. 8, 2020, 10:59 p.m.