View source: R/plink-scoring.R
Interface to PLINK to perform allelic scoring. This function saves the PLINK outputs to the current working directory and reads plink.profile into R for immediate use as a tibble.
bfile | Path to
scores | Path to scoring file; see details.
sum | Logical. Should sum of allele scores be returned? Average allele scores returned if set to
header | Logical. Should
cols | A string specifying the column positions for the variant names, allele codes, and scores. By default,
range_file | Path to file containing range labels in the first column (i.e. a named ID for a given range), lower bounds in the second column, and upper bounds in the third column. See details.
data_file | Path to file containing variant IDs (column 1) and the key quantity (column 2) on each non-empty line. See details.
... | Additional arguments passed to
The scores file is whitespace-delimited and should contain one row for every variant included in the scoring algorithm, with a minimum of 3 columns:
Variant name (usually rsID)
Allele code (the allele that the score refers to)
Score (score associated with the named allele)
In other words, for a given variant (column 1), the score (column 3) may represent the per-allele increase in the log-odds of the phenotype for every additional copy of the allele named in column 2 (i.e. an additive genetic model).
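The three-column layout described above can be sketched in base R. This is a minimal illustration, not output from the package: the variant IDs, alleles, weights, file name, and genotype dosages are all made up, and the file is written without a header line, so the header setting passed to the function would need to match.

```r
# Hypothetical scoring table: variant name, scored allele, per-allele weight.
scores <- data.frame(
  variant = c("rs0001", "rs0002", "rs0003"),
  allele  = c("A", "G", "T"),
  score   = c(0.12, -0.05, 0.30),
  stringsAsFactors = FALSE
)

# Write a whitespace-delimited file with no header line and no row names.
scores_path <- file.path(tempdir(), "scores.txt")
write.table(scores, scores_path, quote = FALSE, row.names = FALSE, col.names = FALSE)

# Hand calculation for one hypothetical individual under the additive model:
# dosage = number of copies (0, 1 or 2) of the scored allele at each variant.
dosage <- c(rs0001 = 2, rs0002 = 1, rs0003 = 0)
score_sum <- sum(dosage[scores$variant] * scores$score)
score_sum
```

The hand calculation only illustrates what a summed allele score means; PLINK itself also handles missing genotypes and strand or allele matching, which this sketch ignores.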
PLINK also offers the ability to apply allelic scoring to one or more subsets of the variants in scores, based on the range of some key quantity (e.g. P-value). To do this, you must additionally provide:
range_file
A whitespace-delimited text file with three columns (header optional; by default, PLINK ignores non-numeric values in columns 2 and 3). The first column is a unique ID for the range specified in columns 2 (lower bound) and 3 (upper bound), inclusive. This ID is used to name the output from PLINK (e.g. plink.ID.profile).
data_file
A whitespace-delimited text file with two columns. Whether a header line is skipped is controlled by the header = TRUE/FALSE argument, so the presence of a header must be consistent with the scores file. Column 1 is the variant ID and column 2 is the key quantity that the range will be applied to (e.g. P-value).
For example, a range of [0, 0.00000005] would perform allelic scoring for all variants in scores and data_file that have a GWAS-significant P-value.
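As a rough sketch of the two extra files, the base R code below writes a hypothetical data_file and range_file and previews which variants each inclusive range would select. The variant IDs, P-values, range labels, and file names are invented for illustration; only the column layouts come from the description above.

```r
# Hypothetical key quantities: column 1 is the variant ID, column 2 the P-value.
pvals <- data.frame(
  variant = c("rs0001", "rs0002", "rs0003"),
  p       = c(3e-9, 0.004, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical ranges: label, lower bound, upper bound (both bounds inclusive).
ranges <- data.frame(
  id    = c("gws", "p01", "all"),
  lower = c(0, 0, 0),
  upper = c(5e-8, 0.01, 1),
  stringsAsFactors = FALSE
)

# Write both files whitespace-delimited, without header lines.
data_path  <- file.path(tempdir(), "pvalues.txt")
range_path <- file.path(tempdir(), "ranges.txt")
write.table(pvals,  data_path,  quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(ranges, range_path, quote = FALSE, row.names = FALSE, col.names = FALSE)

# Preview, in R, which variants fall inside each range.
selected <- lapply(seq_len(nrow(ranges)), function(i) {
  pvals$variant[pvals$p >= ranges$lower[i] & pvals$p <= ranges$upper[i]]
})
names(selected) <- ranges$id
selected
```

With labels like these, the per-range outputs would be expected to follow the plink.ID.profile pattern, for example plink.gws.profile for the range labelled gws.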
A tibble with six columns:
FID
Family ID (from PLINK .fam file)
IID
Individual ID (from PLINK .fam file)
PHENO
Phenotype (from PLINK .fam file)
CNT
Number of non-missing alleles used for scoring
CNT2
Sum of named allele counts
SCORE/SCORESUM
Either the average of all allele scores or the sum of the allele scores, depending on the sum argument
If allelic scoring is done for multiple subsets of the variants, defined by range_file and data_file, then each plink.*.profile file is read, row-bound, and nested by file name to form a single nested tibble. The nested tibble contains a column for the file name and a list-column of tibbles for the nested scoring data. Use unnest to unnest the data back into a single tibble.
The returned tibble has the added attribute log, which contains a tibble for the log file from PLINK. This can be accessed using attr(x, 'log'), where x is the name of your object.
The log attribute contains a one-column tibble with one row per line of the log file. This allows relatively easy access to the log while staying in R, and one can use stringr functions to query the log messages.
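Querying the log attribute might look like the sketch below. It uses a plain data frame as a stand-in for the returned tibble, and the log lines are placeholders rather than real PLINK output. The column is indexed by position because the name of the log column is not stated here.

```r
# Stand-in for the returned object: a data frame carrying a 'log' attribute.
x <- data.frame(FID = "F1", IID = "I1", SCORE = 0.19, stringsAsFactors = FALSE)
attr(x, "log") <- data.frame(
  log = c(
    "Example log line one.",
    "Warning: example warning message.",
    "Example log line three."
  ),
  stringsAsFactors = FALSE
)

# Pull the log out of the attribute, as described above.
plink_log <- attr(x, "log")

# Keep only lines that mention a warning (case-insensitive match on column 1).
# stringr::str_detect() could be used in place of grepl() if stringr is installed.
is_warning <- grepl("warning", plink_log[[1]], ignore.case = TRUE)
warnings_only <- plink_log[is_warning, , drop = FALSE]
warnings_only
```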