read_bed | R Documentation |
This function reads genotypes encoded in a Plink-formatted BED (binary) file, returning them in a standard R matrix containing genotypes encoded numerically as dosages (values in c( 0, 1, 2, NA )).
Each genotype per locus (m loci) and individual (n total) counts the number of reference alleles, or NA for missing data.
No *.fam or *.bim files are read by this basic function.
Since BED does not encode the data dimensions internally, these values must be provided by the user.
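Because the dimensions are not stored in the file, one way to sanity-check them is to compare the file size with what they imply. The sketch below assumes the standard locus-major layout (three magic bytes, then a whole number of bytes per locus with four genotypes per byte). The helper expected_bed_size() is hypothetical and is not part of genio.

```r
# Hypothetical helper (not part of genio): expected size in bytes of a
# locus-major BED file with m_loci loci and n_ind individuals.
expected_bed_size <- function(m_loci, n_ind) {
    bytes_per_locus <- ceiling(n_ind / 4)  # four genotypes packed per byte
    3 + m_loci * bytes_per_locus           # 3 magic bytes, then genotype data
}

# Example: 10 loci and 7 individuals need 2 bytes per locus
size <- expected_bed_size(10, 7)
size  # 3 + 10 * 2 = 23 bytes
```

Comparing this value with file.size() on the BED path is a quick way to catch mismatched dimensions before reading.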
read_bed( file, names_loci = NULL, names_ind = NULL, m_loci = NA, n_ind = NA, ext = "bed", verbose = TRUE )
file |
Input file path.
*.bed extension may be omitted (will be added automatically if |
names_loci |
Vector of loci names, to become the row names of the genotype matrix.
If provided, its length sets |
names_ind |
Vector of individual names, to become the column names of the genotype matrix.
If provided, its length sets |
m_loci |
Number of loci in the input genotype table.
Required if |
n_ind |
Number of individuals in the input genotype table.
Required if |
ext |
The desired file extension (default "bed").
Ignored if |
verbose |
If |
The code enforces several checks to validate the data against the requested dimensions. An error is thrown if the file ends too early, or if it does not end once the genotype matrix is filled. In addition, each locus is encoded in an integer number of bytes, and each byte holds up to four individuals, so the last byte of a locus is padded when it holds fewer than four. To agree with other software (plink2, BEDMatrix), padding values are ignored (they may take on any value without causing errors).
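The byte packing and padding described above can be illustrated with a minimal sketch. It assumes the Plink convention that each genotype occupies two bits, with the lowest-order bits holding the first individual, and it stops at the raw 2-bit codes (in the Plink reference, 0 is homozygous for the first allele, 1 is missing, 2 is heterozygous, 3 is homozygous for the second allele). Translating codes into dosages is left out. The helper byte_to_codes() is hypothetical, not a genio function.

```r
# Hypothetical helper (not part of genio): split one BED byte into its four
# 2-bit genotype codes, lowest-order bits first.
byte_to_codes <- function(byte) {
    b <- as.integer(byte)
    # integer-divide by 4^k to shift right by 2k bits, then keep 2 bits
    (b %/% c(1L, 4L, 16L, 64L)) %% 4L
}

# With n_ind = 6, one locus takes 2 bytes; the second byte holds only
# 2 real genotypes, and its last 2 slots are padding.
n_ind <- 6
locus_bytes <- as.raw(c(0xdc, 0x0f))
codes <- unlist(lapply(locus_bytes, byte_to_codes))
genotype_codes <- codes[seq_len(n_ind)]  # drop the padding slots
genotype_codes
```

Since only the first n_ind slots are kept, whatever bits sit in the padding slots never affect the result.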
This function only supports locus-major BED files, which are the standard for modern data. The format is validated via the BED file's magic numbers (the first three bytes of the file). Older BED files can be converted using Plink.
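As an illustration of the magic-number check, the sketch below reads the first three bytes of a file and compares them with the values given in the Plink BED format reference for locus-major files (0x6c, 0x1b, 0x01). The helper is_locus_major_bed() is hypothetical and only approximates the kind of validation described here; it is not genio's own code.

```r
# Hypothetical helper (not part of genio): TRUE if the file starts with the
# three magic bytes of a locus-major Plink BED file.
is_locus_major_bed <- function(file) {
    magic <- readBin(file, what = "raw", n = 3L)
    identical(magic, as.raw(c(0x6c, 0x1b, 0x01)))
}

# Demo on a tiny temporary file: magic bytes plus two genotype bytes
tmp <- tempfile(fileext = ".bed")
writeBin(as.raw(c(0x6c, 0x1b, 0x01, 0xdc, 0x0f)), tmp)
ok <- is_locus_major_bed(tmp)
ok
unlink(tmp)  # clean up
```

A third byte of 0x00 marks the older individual-major layout, which this check rejects.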
The m-by-n genotype matrix.
read_plink() for reading a set of BED/BIM/FAM files.
geno_to_char() for translating numerical genotypes into more human-readable character encodings.
Plink BED format reference: https://www.cog-genomics.org/plink/1.9/formats#bed
# first obtain data dimensions from BIM and FAM files

# all file paths
file_bed <- system.file("extdata", 'sample.bed', package = "genio", mustWork = TRUE)
file_bim <- system.file("extdata", 'sample.bim', package = "genio", mustWork = TRUE)
file_fam <- system.file("extdata", 'sample.fam', package = "genio", mustWork = TRUE)

# read annotation tables
bim <- read_bim(file_bim)
fam <- read_fam(file_fam)

# read an existing Plink *.bed file
# pass locus and individual IDs as vectors, setting data dimensions too
X <- read_bed(file_bed, bim$id, fam$id)
X

# can specify without extension
file_bed <- sub('\\.bed$', '', file_bed) # remove extension from this path on purpose
file_bed # verify .bed is missing
X <- read_bed(file_bed, bim$id, fam$id) # loads too!
X