plink_scoring: PLINK Allelic Scoring


View source: R/plink-scoring.R

Description

Interface to PLINK to perform allelic scoring. This function will save the PLINK outputs to the current working directory, and read plink.profile into R for immediate use as a tibble.

Usage

plink_scoring(bfile, scores, sum = TRUE, header = TRUE,
  cols = "1 2 3", range_file = "", data_file = "", ...)

Arguments

bfile

Path to PLINK files (bed, bim, fam); prefix only.

scores

Path to scoring file; see details.

sum

Logical. Should the sum of allele scores be returned? If FALSE, average allele scores are returned instead.

header

Logical. Should PLINK ignore the first non-empty line (assumed to be a header) in scores and data_file (if specified)?

cols

A string specifying the column positions for the variant names, allele codes, and scores. By default, PLINK assumes the column positions are 1, 2, and 3 in scores, respectively.

range_file

Path to file containing range labels in the first column (i.e. named ID for a given range), lower bounds in the second column, and upper bounds in the third column. See details.

data_file

Path to file containing variant IDs (column 1) and the key quantity (column 2) on each non-empty line. See details.

...

Additional arguments passed to read_delim.

Details

The scores file is white-space delimited. It should contain one row for every variant included in the scoring algorithm and have a minimum of 3 columns:

  1. Variant name (usually rsID)

  2. Allele code (the allele that the score is in reference to)

  3. Score (score associated with the named allele)

In other words, for a given variant (column 1), the score (column 3) may represent the per-allele increase in the log-odds of the phenotype for every additional allele named in column 2 (i.e. additive genetic model).
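The additive model described above can be illustrated with a minimal base-R sketch. Everything in it is made up for illustration: the variant names, scores, dosage matrix, and the file name toy_scores.txt are hypothetical, and the calculation only mimics what PLINK computes, assuming no missing genotypes.

```r
# Toy scoring file: variant name, scored allele, per-allele score (log-odds).
# All values are hypothetical.
scores <- data.frame(
  SNP   = c("rs1", "rs2", "rs3"),
  A1    = c("A", "G", "T"),
  SCORE = c(0.10, -0.20, 0.05),
  stringsAsFactors = FALSE
)

# Write it white-space delimited with a header line, which matches
# header = TRUE and cols = "1 2 3".
scores_path <- file.path(tempdir(), "toy_scores.txt")
write.table(scores, scores_path, quote = FALSE, row.names = FALSE, sep = " ")

# Hypothetical dosages: copies (0, 1, or 2) of the scored allele per sample.
dosage <- matrix(
  c(0, 1, 2,
    2, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample1", "sample2"), scores$SNP)
)

# Additive model: sum over variants of (copies of scored allele) x (score).
score_sum <- as.vector(dosage %*% scores$SCORE)
names(score_sum) <- rownames(dosage)

# Average per allele: divide by the number of alleles scored
# (2 per variant here, because no genotype is missing).
score_avg <- score_sum / (2 * nrow(scores))
```

Here score_sum corresponds to what sum = TRUE is described as returning, and score_avg to the average returned when sum = FALSE, under the stated no-missing-data assumption.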

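PLINK can also apply allelic scoring to one or more subsets of the variants in scores, based on the range of some key quantity (e.g. P-value). To do this, you must additionally provide:

  1. range_file (range labels, lower bounds, and upper bounds)

  2. data_file (variant IDs and the key quantity for each variant)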

For example, a range of [0, 0.00000005] would perform allelic scoring for all variants in scores and data_file that have a GWAS-significant P-value.
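As a rough sketch of how range_file and data_file fit together, the base-R snippet below writes two toy files in the layouts described under Arguments and then mimics the subsetting in R. The labels, P-values, and file names are hypothetical, and treating both bounds as inclusive is an assumption of this sketch rather than a statement about PLINK's behavior.

```r
# Hypothetical range file: label, lower bound, upper bound (no header line).
ranges <- data.frame(
  label = c("gws", "suggestive"),
  lower = c(0, 0),
  upper = c(5e-8, 1e-5),
  stringsAsFactors = FALSE
)

# Hypothetical data file: variant ID and key quantity (here a P-value).
pvals <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  P   = c(1e-9, 3e-6, 0.2),
  stringsAsFactors = FALSE
)

range_path <- file.path(tempdir(), "toy_ranges.txt")
data_path  <- file.path(tempdir(), "toy_pvalues.txt")

# The header argument applies to scores and data_file, so only the data file
# gets a header line here.
write.table(ranges, range_path, quote = FALSE, row.names = FALSE,
            col.names = FALSE)
write.table(pvals, data_path, quote = FALSE, row.names = FALSE,
            col.names = TRUE)

# Mimic the subsetting: for each labelled range, keep the variants whose key
# quantity falls inside it (bounds treated as inclusive in this sketch).
in_range <- lapply(
  split(ranges, ranges$label),
  function(r) pvals$SNP[pvals$P >= r$lower & pvals$P <= r$upper]
)
```

With these toy values, in_range$gws holds only rs1, while in_range$suggestive holds rs1 and rs2. The two paths could then be passed as range_file and data_file.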

Value

A tibble with six columns, read from plink.profile.

If allelic scoring is done for multiple subsets of the variants, defined by range_file and data_file, then each plink.*.profile file is read, row-bound, and nested by file-name, to form a single nested tibble. The nested-tibble contains a column for the file name and a list-column of tibbles for the nested scoring data. Use unnest to unnest the data back into a single tibble.

The returned tibble has the added attribute log which contains a tibble for the log-file from PLINK. This can be accessed using attr(x, 'log'), where x is the name of your object.

The log attribute contains a one-column tibble with one row per line of the log file. This gives relatively easy access to the log without leaving R, and stringr functions can be used to query the log messages.
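A minimal sketch of querying the log attribute, using a stand-in object so that it runs without PLINK. The column name log, the object x, and the log lines themselves are placeholders, not real PLINK output. Base R's grepl is used here; stringr::str_detect would serve the same purpose.

```r
# Stand-in for the log: one column, one row per log line.
# The column name `log` and every line below are placeholders.
log_tbl <- data.frame(
  log = c(
    "placeholder line 1",
    "Warning: placeholder warning text",
    "placeholder line 3"
  ),
  stringsAsFactors = FALSE
)

# Stand-in for the object returned by plink_scoring().
x <- data.frame(id = "sample1", stringsAsFactors = FALSE)
attr(x, "log") <- log_tbl

# Retrieve the log and keep only the lines that start with "Warning".
plink_log <- attr(x, "log")
warning_lines <- plink_log$log[grepl("^Warning", plink_log$log)]
```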


mattwarkentin/genetools documentation built on Nov. 4, 2019, 6:19 p.m.