read_grm: Read GCTA GRM and related plink2 binary files

View source: R/read_grm.R

read_grmR Documentation

Read GCTA GRM and related plink2 binary files

Description

This function reads a GCTA Genetic Relatedness Matrix (GRM, i.e. kinship) set of files in their binary format, returning the kinship matrix and, if available, the corresponding matrix of pair sample sizes (non-trivial under missingness) and individuals table. Setting some options allows reading plink2 binary kinship formats such as "king" (see examples).

Usage

read_grm(
  name,
  n_ind = NA,
  verbose = TRUE,
  ext = "grm",
  shape = c("triangle", "strict_triangle", "square"),
  size_bytes = 4,
  comment = "#"
)

Arguments

name

The base name of the input files. Files with that base, plus shared extension (default "grm", see ext below), plus extensions .bin, .N.bin, and .id are read if they exist. Only .<ext>.bin is absolutely required; .<ext>.id can be substituted by the number of individuals (see below); .<ext>.N.bin is entirely optional.

n_ind

The number of individuals, required if the file with the extension .<ext>.id is missing. If the file with the .<ext>.id extension is present, then this n_ind is ignored.

verbose

If TRUE (default), function reports the path of the files being loaded.

ext

Shared extension for all three inputs (see name above; default "grm"). Another useful value is "king" for KING-robust estimates produced by plink2. If NA, no extension is added. If given ext is also present at the end of name, then it is not added again.

shape

The shape of the information to read (may be abbreviated). Default "triangle" assumes there are n*(n+1)/2 values to read corresponding to the upper triangle including the diagonal (required for GCTA GRM). "strict_triangle" assumes there are n*(n-1)/2 values to read corresponding to the upper triangle excluding the diagonal (best for plink2 KING-robust). Lastly, "square" assumes there are n*n values to read corresponding to the entire square matrix, ignoring symmetry.

size_bytes

The number of bytes per number encoded. Default 4 corresponds to GCTA GRM and plink2 "bin4", whereas plink2 "bin" requires a value of 8.

comment

Character to start comments in <ext>.id file only. Default "#" helps plink2 .id files (which have a header that starts with "#", which is therefore ignored) be read just like plink1 and GCTA files (which do not have a header).

Value

A list with named elements:

  • kinship: The symmetric n-times-n kinship matrix (GRM). Has IDs as row and column names if the file with extension .<ext>.id exists. If shape='strict_triangle', diagonal will have missing values.

  • M: The symmetric n-times-n matrix of pair sample sizes (number of non-missing loci pairs), if the file with extension .<ext>.N.bin exists. Has IDs as row and column names if the file with extension .<ext>.id was available. If shape='strict_triangle', diagonal will have missing values.

  • fam: A tibble with two columns: fam and id, same as in Plink FAM files. Returned if the file with extension .<ext>.id exists.

See Also

write_grm()

Greatly adapted from sample code from GCTA: https://cnsgenomics.com/software/gcta/#MakingaGRM

Examples

# to read "data.grm.bin" and etc, run like this:
# obj <- read_grm("data")
# obj$kinship # the kinship matrix
# obj$M       # the pair sample sizes matrix
# obj$fam     # the fam and ID tibble

# The following example is more awkward
# because package sample data has to be specified in this weird way:

# read an existing set of GRM files
file <- system.file("extdata", 'sample.grm.bin', package = "genio", mustWork = TRUE)
file <- sub('\\.grm\\.bin$', '', file) # remove extension from this path on purpose
obj <- read_grm(file)
obj$kinship # the kinship matrix
obj$M       # the pair sample sizes matrix
obj$fam     # the fam and ID tibble

# Read sample plink2 KING-robust files (several variants).
# Read both base.king.bin and base.king.id files.
# All generated with "plink2 <input> --make-king <options> --out base"
# (replace "base" with actual base name) with these options:
# #1) "triangle bin"
# data <- read_grm( 'base', ext = 'king', shape = 'strict', size_bytes = 8 )
# #2) "triangle bin4"
# data <- read_grm( 'base', ext = 'king', shape = 'strict' )
# #3) "square bin"
# data <- read_grm( 'base', ext = 'king', shape = 'square', size_bytes = 8 )
# #4) "square bin4"
# data <- read_grm( 'base', ext = 'king', shape = 'square' )


genio documentation built on Jan. 7, 2023, 1:12 a.m.