plink_scoring: PLINK Allelic Scoring


View source: R/plink-scoring.R


Interface to PLINK to perform allelic scoring. This function will save the PLINK outputs to the current working directory, and read plink.profile into R for immediate use as a tibble.


plink_scoring(bfile, scores, sum = TRUE, header = TRUE,
  cols = "1 2 3", range_file = "", data_file = "", ...)



bfile: Path to PLINK files (bed, bim, fam); prefix only.

scores: Path to scoring file; see details.

sum: Logical. Should the sum of allele scores be returned? Average allele scores are returned if set to FALSE.

header: Logical. Should PLINK ignore the first non-empty line (assumed to be a header) in scores and data_file (if specified)?

cols: A string specifying the column positions for the variant names, allele codes, and scores. By default, PLINK assumes the column positions are 1, 2, and 3 in scores, respectively.

range_file: Path to file containing range labels in the first column (i.e. a name identifying each range), lower bounds in the second column, and upper bounds in the third column. See details.

data_file: Path to file containing variant IDs (column 1) and the key quantity (column 2) on each non-empty line. See details.

...: Additional arguments passed to read_delim.
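As an illustration of the cols argument: when the scoring information sits in a wider summary-statistics file, the column positions can be looked up by name and collapsed into the same space-separated string format as the default "1 2 3". This is only a sketch; the column names below (CHR, SNP, A1, BETA, P) are hypothetical and not something the package requires.

```r
# Hypothetical header of a wider summary-statistics file.
header_names <- c("CHR", "SNP", "A1", "BETA", "P")

# Positions of variant name, allele code, and score, in that order.
positions <- match(c("SNP", "A1", "BETA"), header_names)

# Collapse to the space-separated string format used by the default "1 2 3".
cols <- paste(positions, collapse = " ")
print(cols)
```

The resulting string could then be supplied as cols in a call to plink_scoring.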


The scores file is whitespace-delimited and should contain one row for every variant included in the scoring algorithm, with a minimum of 3 columns:

  1. Variant name (usually rsID)

  2. Allele code (the allele that the score refers to)

  3. Score (score associated with the named allele)

In other words, for a given variant (column 1), the score (column 3) may represent the per-allele increase in the log-odds of the phenotype for every additional allele named in column 2 (i.e. additive genetic model).
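The three-column layout and the additive interpretation above can be sketched in base R. The variant IDs, alleles, scores, and allele counts below are made up for illustration. The averaging step rests on an assumption: that the average is taken over the number of non-missing alleles (two per variant), so it should be checked against the PLINK documentation for the version in use.

```r
# Hypothetical scoring data: variant name, scored allele, per-allele score.
scores <- data.frame(
  SNP   = c("rs0001", "rs0002", "rs0003"),
  A1    = c("A", "G", "T"),
  SCORE = c(0.10, 0.25, -0.05),
  stringsAsFactors = FALSE
)

# Write a whitespace-delimited file with a header line (matching header = TRUE).
scores_path <- file.path(tempdir(), "scores.txt")
write.table(scores, scores_path, quote = FALSE, row.names = FALSE, sep = " ")

# One hypothetical individual: copies (0, 1, or 2) of each scored allele.
allele_counts <- c(2, 1, 0)

# Additive model: each copy of the scored allele contributes its score.
score_sum <- sum(allele_counts * scores$SCORE)

# Assumption: the average divides by the number of non-missing alleles
# (two per variant here).
score_avg <- score_sum / (2 * nrow(scores))

print(c(sum = score_sum, average = score_avg))
```

The file at scores_path could then be passed as the scores argument, for example plink_scoring(bfile = "mydata", scores = scores_path), where "mydata" is a placeholder prefix.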

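PLINK can also apply allelic scoring to one or more subsets of the variants in scores, based on the range of some key quantity (e.g. P-value). To do this, you must additionally provide:

  1. range_file: range labels, lower bounds, and upper bounds (columns 1 to 3)

  2. data_file: variant IDs and the key quantity (columns 1 and 2)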

For example, a range of [0, 0.00000005] would perform allelic scoring for all variants in scores and data_file that have a GWAS-significant P-value.
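A minimal sketch of preparing the two extra files in base R follows. The range labels, variant IDs, and P-values are hypothetical. The range file is written without a header line because the header argument is described as applying only to scores and data_file. The preview step treats the bounds as inclusive, which is an assumption.

```r
# Range file: label, lower bound, upper bound (no header line).
ranges <- data.frame(
  label = c("gws", "suggestive"),
  lower = c(0, 0),
  upper = c(5e-8, 1e-5),
  stringsAsFactors = FALSE
)
range_path <- file.path(tempdir(), "ranges.txt")
write.table(ranges, range_path, quote = FALSE, row.names = FALSE,
            col.names = FALSE, sep = " ")

# Data file: variant ID and key quantity (here a P-value), with a header line.
pvals <- data.frame(
  SNP = c("rs0001", "rs0002", "rs0003"),
  P   = c(3e-9, 2e-6, 0.04),
  stringsAsFactors = FALSE
)
data_path <- file.path(tempdir(), "pvalues.txt")
write.table(pvals, data_path, quote = FALSE, row.names = FALSE, sep = " ")

# Preview which variants each range would select (bounds assumed inclusive).
selected <- lapply(seq_len(nrow(ranges)), function(i) {
  pvals$SNP[pvals$P >= ranges$lower[i] & pvals$P <= ranges$upper[i]]
})
names(selected) <- ranges$label
print(selected)
```

The paths range_path and data_path could then be supplied as range_file and data_file alongside bfile and scores.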


A tibble with six columns:

If allelic scoring is done for multiple subsets of the variants, defined by range_file and data_file, then each plink.*.profile file is read, row-bound, and nested by file name to form a single nested tibble. The nested tibble contains a column for the file name and a list-column of tibbles holding the scoring data. Use tidyr::unnest to flatten the data back into a single tibble.
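The nest and unnest round trip can be sketched with a mock object. This assumes tibble and tidyr (version 1.0.0 or later) are installed. The file names, the column names (file, data, IID, SCORE), and the values are illustrative and may not match what plink_scoring actually returns.

```r
library(tibble)
library(tidyr)

# Mock stand-in for stacked profile data; names and values are illustrative.
profiles <- tibble(
  file  = rep(c("plink.S1.profile", "plink.S2.profile"), each = 2),
  IID   = rep(c("id1", "id2"), times = 2),
  SCORE = c(0.10, 0.20, 0.30, 0.40)
)

# Nest by file name: one row per file, with a list-column of tibbles.
nested <- nest(profiles, data = c(IID, SCORE))

# Flatten back to one row per individual per file.
flat <- unnest(nested, cols = data)
print(flat)
```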

The returned tibble has the added attribute log which contains a tibble for the log-file from PLINK. This can be accessed using attr(x, 'log'), where x is the name of your object.

The log attribute is a one-column tibble with one row per line of the log file. This allows relatively easy access to the log while staying in R, and one can use stringr functions to query the log messages.
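Querying the log attribute might look like the sketch below. A plain data frame stands in for the returned tibble, and the log lines are placeholders invented for illustration, not real PLINK output. Base R grepl is used to keep the example dependency-free; stringr::str_detect(log_lines, "^Warning") would serve the same purpose.

```r
# Mock result object; a data frame stands in for the returned tibble.
x <- data.frame(IID = c("id1", "id2"), SCORE = c(0.1, 0.2))

# Placeholder log lines, invented for illustration (not real PLINK output).
attr(x, "log") <- data.frame(
  line = c("placeholder: run started",
           "Warning: placeholder message",
           "placeholder: run finished"),
  stringsAsFactors = FALSE
)

# Pull the single log column by position, then filter lines by pattern.
log_lines <- attr(x, "log")[[1]]
warnings_found <- log_lines[grepl("^Warning", log_lines)]
print(warnings_found)
```

Selecting the column by position avoids assuming what the log column is called.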

mattwarkentin/genetools documentation built on Nov. 4, 2019, 6:19 p.m.