plink_scoring: PLINK Allelic Scoring
In mattwarkentin/genetools: What the Package Does (One Line, Title Case)


View source: R/plink-scoring.R

Interface to PLINK to perform allelic scoring. This function will save the PLINK outputs to the current working directory, and read plink.profile into R for immediate use as a tibble.

plink_scoring(bfile, scores, sum = TRUE, header = TRUE, cols = "1 2 3",
  range_file = "", data_file = "", ...)

`bfile`	Path to `PLINK` files (bed, bim, fam); prefix only.
`scores`	Path to scoring file; see details.
`sum`	Logical. Should the sum of allele scores be returned? If `FALSE`, the average allele score is returned instead.
`header`	Logical. Should `PLINK` ignore the first non-empty line (assumed to be a header) in `scores` and `data_file` (if specified)?
`cols`	A string specifying the column positions for the variant names, allele codes, and scores. By default, `PLINK` assumes the column positions are 1, 2, and 3 in `scores`, respectively.
`range_file`	Path to file containing range labels in the first column (i.e. named ID for a given range), lower bounds in the second column, and upper bounds in the third column. See details.
`data_file`	Path to file containing variant IDs (column 1) and the key quantity (column 2) on each non-empty line. See details.
`...`	Additional arguments passed to `read_delim`.

The scores file is white-space delimited. It should contain one row for every variant included in the scoring algorithm and have at least three columns:

Variant name (usually rsID)
Allele code (the allele that the score refers to)
Score (score associated with the named allele)

In other words, for a given variant (column 1), the score (column 3) may represent the per-allele increase in the log-odds of the phenotype for every additional allele named in column 2 (i.e. additive genetic model).
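
As a minimal sketch of the layout described above, the following base R code writes a three-column, white-space delimited scores file. The variant names, alleles, score values, and file name are hypothetical and chosen only for illustration.

```r
# Hypothetical variants, alleles, and per-allele scores (illustration only)
scores <- data.frame(
  variant = c("rs0001", "rs0002", "rs0003"),
  allele  = c("A", "G", "T"),
  score   = c(0.12, -0.05, 0.30)
)

# Write a space-delimited file with one header line and one row per variant
scores_path <- file.path(tempdir(), "scores.txt")
write.table(scores, scores_path, quote = FALSE, row.names = FALSE, sep = " ")

# Inspect the raw lines that PLINK would read
readLines(scores_path)
```

With this layout, the defaults `header = TRUE` and `cols = "1 2 3"` should match the file, since the first line is a header and the columns are in the default order.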

PLINK can also apply allelic scoring to one or more subsets of the variants in scores, based on the range of some key quantity (e.g. P-value). To do this, you must additionally provide:

range_file: A white-space delimited text file with three columns (header optional; by default, PLINK ignores non-numeric values in columns 2 and 3). The first column is a unique ID for the range given by column 2 (lower bound) and column 3 (upper bound), inclusive. This ID is used to name the output from PLINK (e.g. plink.ID.profile).
data_file: A white-space delimited text file with two columns. Whether a header line is skipped is controlled by the header argument, which also applies to the scores file, so either both files have a header or neither does. Column 1 is the variant ID and column 2 is the key quantity that the range is applied to (e.g. P-value).

For example, a range of [0, 0.00000005] would perform allelic scoring for all variants in scores and data_file that have a GWAS-significant P-value.
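
The two extra files can be sketched in base R as follows. The variant IDs, P-values, range label (`gws`), and file names are hypothetical. The final step only mimics in R which variants fall inside the range; it is not PLINK's own implementation.

```r
# Hypothetical key quantities (P-values) for three variants
pvals <- data.frame(
  variant = c("rs0001", "rs0002", "rs0003"),
  p       = c(3e-9, 0.02, 4e-8)
)

# One hypothetical range labelled "gws": [0, 5e-8], bounds inclusive
ranges <- data.frame(id = "gws", lower = 0, upper = 5e-8)

data_path  <- file.path(tempdir(), "pvals.txt")
range_path <- file.path(tempdir(), "ranges.txt")

# data_file keeps a header line, to be consistent with a scores file that has one
write.table(pvals, data_path, quote = FALSE, row.names = FALSE, sep = " ")
# range_file is written without a header (the header is optional)
write.table(ranges, range_path, quote = FALSE, row.names = FALSE,
            col.names = FALSE, sep = " ")

# Variants whose key quantity lies inside the inclusive range
selected <- pvals$variant[pvals$p >= ranges$lower & pvals$p <= ranges$upper]
selected
```

Under these assumptions, the paths would be supplied through the `range_file` and `data_file` arguments, and only the selected variants would contribute to the profile for the `gws` range.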

A tibble with six columns:

FID Family ID (from PLINK .fam file)
IID Individual ID (from PLINK .fam file)
PHENO Phenotype (from PLINK .fam file)
CNT Number of non-missing alleles used for scoring
CNT2 Sum of named allele counts
SCORE/SCORESUM Either the average of all allele scores, or the sum of the allele scores, depending on the sum argument
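
To make the relationship between these columns concrete, here is a simplified sketch for one individual. It assumes a diploid sample with no missing genotypes and assumes the average is the sum divided by the number of alleles counted; PLINK's handling of missing genotypes is not modelled, and all values are hypothetical.

```r
# Hypothetical per-allele scores for three variants
weights <- c(0.12, -0.05, 0.30)
# Hypothetical counts (0, 1, or 2) of the named allele for one individual
allele_counts <- c(2, 0, 1)

cnt  <- 2 * length(allele_counts)  # analogous to CNT: two alleles per variant
cnt2 <- sum(allele_counts)         # analogous to CNT2: sum of named allele counts

score_sum <- sum(weights * allele_counts)  # analogous to SCORESUM (sum = TRUE)
score_avg <- score_sum / cnt               # analogous to SCORE (sum = FALSE)
```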

If allelic scoring is done for multiple subsets of the variants, as defined by range_file and data_file, then each plink.*.profile file is read, and the results are row-bound and nested by file name to form a single nested tibble. The nested tibble contains a column for the file name and a list-column of tibbles holding the scoring data. Use unnest to flatten the data back into a single tibble.
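
The nesting and unnesting can be illustrated with a hand-built stand-in for the return value. This sketch assumes the tibble and tidyr packages are installed; the column name `file`, the list-column name `data`, and all values are hypothetical, since the actual names used by plink_scoring are not stated here.

```r
library(tibble)
library(tidyr)

# Hypothetical row-bound profiles from two ranges, S1 and S2
profiles <- tibble(
  file     = rep(c("plink.S1.profile", "plink.S2.profile"), each = 2),
  FID      = c("F1", "F2", "F1", "F2"),
  IID      = c("I1", "I2", "I1", "I2"),
  SCORESUM = c(0.54, 0.12, 0.30, 0.00)
)

# Nest by file name: one row per file, with a list-column of tibbles
nested <- nest(profiles, data = c(FID, IID, SCORESUM))

# Flatten back into a single tibble
flat <- unnest(nested, cols = data)
```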

The returned tibble has the added attribute log which contains a tibble for the log-file from PLINK. This can be accessed using attr(x, 'log'), where x is the name of your object.

The log attribute contains a one-column tibble with one row per line of the log file. This gives relatively easy access to the log without leaving R, and stringr functions can be used to query the log messages.
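
A minimal sketch of querying the log, using a hand-built stand-in object. The column name `log` and the log lines are placeholders, not real PLINK output, and base R's `grepl` is used here so the example has no package dependencies.

```r
# Hypothetical stand-in for the object returned by plink_scoring
x <- data.frame(FID = "F1", IID = "I1", SCORESUM = 0.54)

# Attach a one-column log; these lines are placeholders, not real PLINK output
attr(x, "log") <- data.frame(
  log = c("first placeholder line",
          "Warning: placeholder warning text",
          "last placeholder line"),
  stringsAsFactors = FALSE
)

# Retrieve the log and keep only lines that mention a warning
plink_log <- attr(x, "log")
warnings_found <- plink_log$log[grepl("warning", plink_log$log, ignore.case = TRUE)]
warnings_found
```

The same filter could be written with `stringr::str_detect` on the log column if stringr is preferred.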

mattwarkentin/genetools documentation built on Nov. 4, 2019, 6:19 p.m.