write_plink    R Documentation

Write genotype and sample data into a Plink BED/BIM/FAM file set.

Description

This function writes a genotype matrix (X) and its associated locus (bim) and individual (fam) data tables into three Plink files in BED, BIM, and FAM formats, respectively. It is a wrapper around the more basic functions write_bed(), write_bim(), and write_fam(), but additionally tests that the data dimensions agree (stopping with an error if they do not). It also checks that the genotype row and column names agree with the bim and fam tables when all of them are present. In addition, if bim = NULL or fam = NULL, the missing table is auto-generated using make_bim() or make_fam(), which is useful for simulated data. Lastly, the phenotype can be provided as a separate argument and is incorporated automatically if fam = NULL (a common scenario for simulated genotypes and traits). Below, suppose there are m loci and n individuals.
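The dimension check described above can be seen with a deliberately mismatched locus table. This is a minimal sketch, assuming the genio package is installed; the object names (bim_bad, res) and the toy data are hypothetical, and the exact wording of the error message is not shown because it is not stated in this documentation.

```r
library(genio)

# toy genotype matrix: 5 loci (rows) by 2 individuals (columns)
X <- matrix(c(0, 1, 2, 1, 0, 2, 2, 0, 1, 1), nrow = 5, ncol = 2)
# deliberately mismatched locus table: 4 rows for 5 loci
bim_bad <- make_bim(n = 4)

file_out <- tempfile('write-plink-check-example')
# capture the error condition instead of halting the script
res <- tryCatch(
  write_plink(file_out, X, bim = bim_bad, verbose = FALSE),
  error = function(e) e
)
# TRUE if the call stopped with an error, as the description says it should
inherits(res, 'error')

# remove any files that may have been created before the check failed
# (unlink is silent if they do not exist)
unlink(paste0(file_out, c('.bed', '.bim', '.fam')))
```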

Usage

write_plink(
  file,
  X,
  bim = NULL,
  fam = NULL,
  pheno = NULL,
  verbose = TRUE,
  append = FALSE,
  write_phen = FALSE
)

Arguments

file

Output file path, without extensions (the .bed, .bim, and .fam extensions are added automatically as needed).

X

The m-by-n genotype matrix.

bim

The tibble or data.frame containing locus information. It must contain m rows and these columns: chr, id, posg, pos, ref, alt. If NULL (default), it will be quietly auto-generated.

fam

The tibble or data.frame containing individual information. It must contain n rows and these columns: fam, id, pat, mat, sex, pheno. If NULL (default), it will be quietly auto-generated.

pheno

The phenotype to write into the FAM file assuming fam = NULL. This must be a length-n vector. This will be ignored (with a warning) if fam is provided.

verbose

If TRUE (default), the function reports the paths of the files being written (after autocompleting the extensions).

append

If TRUE, appends loci onto the BED and BIM files (default FALSE). In this mode, all individuals must be present in each write (only loci are appended); the FAM file is not overwritten if present, but is required at every write for internal validations. If the FAM file already exists, it is not checked for agreement with the FAM table provided. The PHEN file is always left unchanged and ignored if append = TRUE.

write_phen

If TRUE and append = FALSE, writes a .phen file too from the fam data provided or auto-generated (using write_phen()). Default FALSE.
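As an illustration of the append argument, the sketch below writes the loci of one genotype matrix in two chunks and reads the result back. It assumes the genio package is installed and uses its read_plink() function, which is not covered on this page; the toy data and the split into rows 1-3 and 4-6 are hypothetical. The same fam table is passed at every write, since all individuals must be present each time.

```r
library(genio)

# toy genotype matrix: 6 loci (rows) by 2 individuals (columns)
X <- matrix(c(0, 1, 2, 1, 0, 2, 2, 0, 1, 1, 2, 0), nrow = 6, ncol = 2)
# explicit tables, so each chunk gets its own slice of locus information
bim <- make_bim(n = 6)
fam <- make_fam(n = 2)

file_out <- tempfile('write-plink-append-example')
# first chunk: loci 1-3, written normally
write_plink(file_out, X[1:3, , drop = FALSE], bim = bim[1:3, ], fam = fam,
            verbose = FALSE)
# second chunk: loci 4-6, appended onto the existing BED and BIM files
write_plink(file_out, X[4:6, , drop = FALSE], bim = bim[4:6, ], fam = fam,
            append = TRUE, verbose = FALSE)

# read everything back to confirm that both chunks are present
data <- read_plink(file_out, verbose = FALSE)
dim(data$X)

# delete all three outputs when done
delete_files_plink(file_out)
```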

Value

Invisibly, a named list with items in this order: X (genotype matrix), bim (tibble), fam (tibble). This is most useful when either BIM or FAM tables were auto-generated.
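Because the value is returned invisibly, it has to be assigned to be inspected. The following is a minimal sketch, assuming the genio package is installed; the name obj and the toy data are hypothetical.

```r
library(genio)

# toy genotype matrix: 5 loci (rows) by 2 individuals (columns)
X <- matrix(c(0, 1, 2, 1, 0, 2, 2, 0, 1, 1), nrow = 5, ncol = 2)
pheno <- c(0.5, -1.2)

file_out <- tempfile('write-plink-value-example')
# assign the invisible return value to keep the auto-generated tables
obj <- write_plink(file_out, X, pheno = pheno, verbose = FALSE)

names(obj)      # list items, in the order given above
nrow(obj$bim)   # one row per locus
obj$fam$pheno   # phenotype supplied through the `pheno` argument

# delete all three outputs when done
delete_files_plink(file_out)
```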

See Also

write_bed(), write_bim(), write_fam(), make_bim(), make_fam().

Plink BED/BIM/FAM format reference: https://www.cog-genomics.org/plink/1.9/formats

Examples

# to write existing data `X`, `bim`, `fam` into files "data.bed", "data.bim", and "data.fam",
# run like this:
# write_plink("data", X, bim = bim, fam = fam)

# The following example is more detailed but also more awkward
# because (only for these examples) the package must create the file in a *temporary* location

# here is an example for a simulation
# (load the package first if it is not already attached)
library(genio)

# create 10 random genotypes
X <- rbinom(10, 2, 0.5)
# replace 3 random genotypes with missing values
X[sample(10, 3)] <- NA
# turn into 5x2 matrix (5 loci in rows, 2 individuals in columns)
X <- matrix(X, nrow = 5, ncol = 2)

# simulate a trait for two individuals
pheno <- rnorm(2)

# write this data to BED/BIM/FAM files
# output path without extension
file_out <- tempfile('delete-me-example')
# here all of the BIM and FAM columns except `pheno` are autogenerated
write_plink(file_out, X, pheno = pheno)

# delete all three outputs when done
delete_files_plink( file_out )
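A further sketch shows the write_phen argument on the same kind of simulated data. It assumes the genio package is installed and that the extra output sits next to the other three files with a .phen extension, as the argument description suggests; the name phen_written and the toy data are hypothetical.

```r
library(genio)

# toy genotype matrix: 5 loci (rows) by 2 individuals (columns)
X <- matrix(c(0, 1, 2, 1, 0, 2, 2, 0, 1, 1), nrow = 5, ncol = 2)
pheno <- c(0.5, -1.2)

file_out <- tempfile('write-plink-phen-example')
# write BED/BIM/FAM and additionally request the PHEN file
write_plink(file_out, X, pheno = pheno, write_phen = TRUE, verbose = FALSE)

# check that the fourth file exists (assumed path: same base name plus .phen)
phen_written <- file.exists(paste0(file_out, '.phen'))
phen_written

# delete the three Plink outputs, then the PHEN file
delete_files_plink(file_out)
unlink(paste0(file_out, '.phen'))
```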


genio documentation built on Jan. 7, 2023, 1:12 a.m.