write_grm: Write GCTA GRM and related plink2 binary files

Description

This function writes a set of GCTA Genetic Relatedness Matrix (GRM, i.e. kinship) files in their binary format, given a kinship matrix and, if available, the corresponding matrix of pair sample sizes (non-trivial under missingness) and a table of individuals. Setting the options ext, shape, and size_bytes allows writing plink2 binary kinship formats such as "king" (see the examples in read_grm()).

Usage

write_grm(
  name,
  kinship,
  M = NULL,
  fam = NULL,
  verbose = TRUE,
  ext = "grm",
  shape = c("triangle", "strict_triangle", "square"),
  size_bytes = 4
)

Arguments

name

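The base name of the output files. Each file name consists of this base, the shared extension (default "grm"; see ext below), and one of the suffixes .bin, .N.bin, or .id. Which files are created depends on the data provided: .bin holds kinship, .N.bin is written only if M is given, and .id only if fam is given or kinship has row or column names.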

kinship

The symmetric n-by-n kinship matrix to write to the file with extension .<ext>.bin.

M

The optional symmetric n-by-n matrix of pair sample sizes to write to the file with extension .<ext>.N.bin.
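Under missing data, the pair sample size generally differs from pair to pair. As a minimal sketch, assuming M[i, j] counts the loci observed in both individuals i and j, it can be computed from a genotype matrix with loci along rows and individuals along columns. The genotype matrix X below is made up for illustration, and this definition of M is an assumption rather than a statement of how genio or GCTA compute it.

```r
# Made-up genotype matrix: 5 loci (rows) by 3 individuals (columns),
# with NA marking missing genotypes
X <- matrix(
    c(
         0,  1,  2,
         1, NA,  1,
         2,  0, NA,
        NA,  1,  0,
         1,  1,  1
    ),
    nrow = 5,
    byrow = TRUE
)
# 0/1 indicator of observed genotypes
obs <- (!is.na(X)) * 1
# M[i, j] = number of loci observed in both individual i and individual j
# (crossprod(A) computes t(A) %*% A)
M <- crossprod(obs)
M
```

Each diagonal entry is the number of loci observed in that individual, and each off-diagonal entry is at most the smaller of the two corresponding diagonal entries.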

fam

The optional data.frame or tibble of individual annotations (columns named fam and id, a subset of the Plink FAM columns) to write to the file with extension .<ext>.id. If fam is NULL but kinship has non-NULL column or row names, these names are used as the id (second) column of the output table, and the fam (first) column is set to the missing value.
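The fallback described above can be sketched in base R. The helper fam_from_kinship() is hypothetical, written only to illustrate the rule, and it assumes column names take precedence over row names, an order the description does not specify.

```r
# Hypothetical helper illustrating the fallback: when no fam table is given,
# take individual IDs from the kinship matrix names
# (column names first, then row names; this order is an assumption)
fam_from_kinship <- function(kinship) {
    ids <- colnames(kinship)
    if (is.null(ids)) ids <- rownames(kinship)
    # without names there is nothing to build an ID table from
    if (is.null(ids)) return(NULL)
    # fam column is missing (NA), id column holds the names
    data.frame(fam = NA, id = ids, stringsAsFactors = FALSE)
}

kinship <- diag(0.5, 3)
colnames(kinship) <- c("a", "b", "c")
fam_from_kinship(kinship)
```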

verbose

If TRUE (default), the function reports the paths of the files being written.

ext

The shared extension for all three outputs (see name above; default "grm"). Another useful value is "king", which matches the KING-robust format produced by plink2. If ext is NA, no extension is added. If ext is already present at the end of name, it is not added again.
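The naming rules for name and ext can be sketched in base R as follows. The helper grm_file_names() is hypothetical (not part of genio), and it assumes the check on name looks for ext preceded by a dot.

```r
# Hypothetical helper mirroring the naming rules for `name` and `ext`
grm_file_names <- function(name, ext = "grm") {
    if (!is.na(ext)) {
        suffix <- paste0(".", ext)
        # append the extension only if name does not already end with it
        if (!endsWith(name, suffix)) name <- paste0(name, suffix)
    }
    c(
        kinship = paste0(name, ".bin"),
        M = paste0(name, ".N.bin"),
        fam = paste0(name, ".id")
    )
}

grm_file_names("data")               # data.grm.bin, data.grm.N.bin, data.grm.id
grm_file_names("data.grm")           # same as above: "grm" is not repeated
grm_file_names("data", ext = "king") # data.king.bin, data.king.N.bin, data.king.id
grm_file_names("data", ext = NA)     # data.bin, data.N.bin, data.id
```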

shape

The shape of the data to write (may be abbreviated). The default, "triangle", writes the n*(n+1)/2 values of the upper triangle including the diagonal (required for GCTA GRM). "strict_triangle" writes the n*(n-1)/2 values of the upper triangle excluding the diagonal (best for plink2 KING-robust). Finally, "square" writes all n*n values of the matrix, ignoring symmetry.
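The three shapes correspond to the following selections of a symmetric matrix in base R. This sketch assumes the values are stored column by column over the upper triangle, which for a symmetric matrix is the same as the lower triangle read row by row (the GCTA GRM layout); it is not taken from genio's source.

```r
# kinship for 3 individuals (same values as in the Examples)
kinship <- matrix(
    c(
        0.6, 0.2, 0.0,
        0.2, 0.5, 0.1,
        0.0, 0.1, 0.5
    ),
    nrow = 3
)

# upper triangle with diagonal, column by column: (1,1), (1,2), (2,2), (1,3), ...
triangle <- kinship[upper.tri(kinship, diag = TRUE)]
# upper triangle without diagonal: (1,2), (1,3), (2,3)
strict_triangle <- kinship[upper.tri(kinship, diag = FALSE)]
# entire matrix, column by column
square <- as.vector(kinship)

length(triangle)        # n * (n + 1) / 2 = 6
length(strict_triangle) # n * (n - 1) / 2 = 3
length(square)          # n * n = 9
```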

size_bytes

The number of bytes used to encode each number. The default, 4 (single precision), corresponds to GCTA GRM and plink2 "bin4", whereas plink2 "bin" (double precision) requires 8.
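To see what size_bytes changes, the base R sketch below writes the same values with writeBin() at 4 and 8 bytes per number and reads them back. The helper write_values() is hypothetical, and little-endian byte order is an assumption of this sketch.

```r
# e.g. an upper triangle including the diagonal, for 3 individuals
values <- c(0.6, 0.2, 0.5, 0.0, 0.1, 0.5)

# Hypothetical helper: write numbers to a temporary file with a given size
write_values <- function(values, size_bytes) {
    file <- tempfile(fileext = ".bin")
    # little-endian byte order is assumed here
    writeBin(values, file, size = size_bytes, endian = "little")
    file
}

file4 <- write_values(values, 4)
file8 <- write_values(values, 8)
size4 <- file.size(file4) # 6 values * 4 bytes = 24
size8 <- file.size(file8) # 6 values * 8 bytes = 48

# read back with the matching size
back4 <- readBin(file4, what = "double", n = length(values), size = 4, endian = "little")
back8 <- readBin(file8, what = "double", n = length(values), size = 8, endian = "little")
max(abs(back4 - values)) # small but nonzero: single precision rounds 0.6, 0.2, 0.1
identical(back8, values) # TRUE: double precision round-trips exactly

unlink(c(file4, file8))
```

The 4-byte encoding halves the file size at the cost of roughly seven significant digits of precision.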

See Also

read_grm()

Examples

# to write existing data `kinship`, `M`, and `fam` into files "data.grm.bin", etc., run:
# write_grm("data", kinship, M = M, fam = fam)

# The following example is more detailed but also more awkward,
# because (only in these examples) the files must be created in a *temporary* location

# create dummy data to write
# kinship for 3 individuals
kinship <- matrix(
    c(
        0.6, 0.2, 0.0,
        0.2, 0.5, 0.1,
        0.0, 0.1, 0.5
    ),
    nrow = 3
)
# pair sample sizes matrix
M <- matrix(
    c(
        10, 9, 8,
         9, 9, 7,
         8, 7, 8
    ),
    nrow = 3
)
# individual annotations table
library(tibble)
fam <- tibble(
    fam = 1:3,
    id = 1:3
)
# dummy files to write and delete
name <- tempfile('delete-me-example') # no extension
# write the data now!
write_grm( name, kinship, M = M, fam = fam )
# delete outputs when done
delete_files_grm( name )


genio documentation built on Jan. 7, 2023, 1:12 a.m.