align.snpdata.coding: Update genotype coding when there are coded allele...

View source: R/input.R

Description

The input parameterisation specifies a desired coded allele for each SNP. This function examines the coded and noncoded alleles used in the input genotype data. For each SNP where the input genotype data are encoded as the dose of the opposite allele (the desired noncoded allele), an additional column with the dose of the desired coded allele is added to the output genotype data.

Usage

align.snpdata.coding(params, snpdata, ploidy = 2,
                     missing.snp = "fail")

Arguments

params

a data frame, see gtx.params.

snpdata

a list with snpinfo and data, see snpdata.

ploidy

if dosage for the noncoded allele is x, the dosage for the coded allele is calculated as ploidy-x.

missing.snp

character, either "fail" or "okay".

Details

The PLINK convention of calling the coded allele “0” for monomorphic SNPs is handled transparently, by assuming that the absent allele in the input genotype data matches whatever allele in the desired parameterisation does not match the present allele in the input genotype data. This behaviour should not cause inadvertent strand flips.
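The inference described above can be illustrated with a minimal sketch. The helper `resolve_absent_allele` is hypothetical and is not part of gtx; it only shows the idea that the absent ("0") allele is taken to be whichever desired allele does not match the allele actually present in the data.

```r
# Hypothetical helper, not part of gtx: infer the allele that PLINK reported
# as "0" for a monomorphic SNP from the desired parameterisation.
resolve_absent_allele <- function(present, desired_coded, desired_noncoded) {
  if (present == desired_coded) return(desired_noncoded)
  if (present == desired_noncoded) return(desired_coded)
  # The present allele matches neither desired allele, so nothing is inferred.
  NA_character_
}

# Only "G" is observed in the data; the desired coded allele is "A",
# so the absent allele is inferred to be "A".
inferred <- resolve_absent_allele("G", desired_coded = "A", desired_noncoded = "G")
inferred
```

Because the absent allele is only ever filled in from the desired parameterisation, this kind of rule never swaps an allele for its strand complement, which is consistent with the statement that strand flips should not arise.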

You should not need to call this function, unless you are intending to call grs.onesnp.apply without calling grs.make.scores first. Note that grs.onesnp.apply has no way to check whether columns for desired coded alleles are present and may return NA for codes it cannot find.

The ploidy argument defaults to 2, but should be set to 1 if the input genotype data are haplotypes (either phased or male X or Y chromosome).
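As a small worked illustration of the ploidy-x calculation, the sketch below flips noncoded-allele dosages to coded-allele dosages for diploid and haploid data. The helper `flip_dose` and the dosage values are made up for illustration and are not taken from gtx.

```r
# Hypothetical helper, not part of gtx: dose of the coded allele given the
# dose x of the noncoded allele.
flip_dose <- function(x, ploidy = 2) ploidy - x

# Diploid dosages of the noncoded allele (imputed dosages may be fractional;
# missing values stay missing).
diploid_noncoded <- c(0, 1, 2, NA, 1.3)
diploid_coded <- flip_dose(diploid_noncoded)

# Haplotype data, for example male X chromosome: each dose is 0 or 1.
haploid_noncoded <- c(0, 1, 1, 0)
haploid_coded <- flip_dose(haploid_noncoded, ploidy = 1)
```

Leaving ploidy at 2 for haplotype data would shift every flipped dose up by one, which is why the argument matters for haploid input.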

The missing.snp argument controls how to handle SNPs in the desired parameterisation that are not present in the input genotype data. If "okay", then SNPs listed in the desired parameterisation but not present in the input genotype data are assumed to have dosage zero for all individuals.
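The effect of the "okay" setting can be sketched in base R as follows. This is not the gtx implementation: the data frame, the column names (which follow the SNP_allele pattern seen in the Examples), and the assumption that "fail" stops with an error are all illustrative.

```r
# Illustrative genotype data containing one of two wanted dosage columns.
geno <- data.frame(rs1_A = c(0, 1, 2))
wanted <- c("rs1_A", "rs2_G")   # hypothetical desired coded-allele columns
missing.snp <- "okay"

absent <- setdiff(wanted, names(geno))
if (length(absent) > 0) {
  if (missing.snp == "fail") {
    # Assumed behaviour for "fail": refuse to continue.
    stop("SNPs not found in genotype data: ", paste(absent, collapse = ", "))
  }
  # "okay": treat every absent SNP as dosage zero for all individuals.
  for (col in absent) geno[[col]] <- 0
}
geno
```

A zero-dosage column means such a SNP contributes nothing to any score built from these columns, so "okay" is best reserved for cases where that is the intended treatment.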

This function is one of the main computational bottlenecks and should be aggressively optimised in future releases.

Value

A list with $params and $snpdata slots, containing the input arguments with additional columns. The returned params has an extra column data.coded.freq, and the returned snpdata$data has extra column(s) for doses of the specified coded alleles.

Author(s)

Toby Johnson Toby.x.Johnson@gsk.com

Examples

data(mthfrex)
"rs1537514_G" %in% names(mthfrex$data) # FALSE
mthfrex <- align.snpdata.coding(mthfr.params, mthfrex)$snpdata  
"rs1537514_G" %in% names(mthfrex$data) # TRUE

tobyjohnson/gtx documentation built on Aug. 30, 2019, 8:07 p.m.