snpmclust: Genotype clustering and calling


Description

Genotype clustering and calling for Illumina microarrays.

Usage

snpmclust(indata, p = 1, priorfrac = 0.2, uncertcutoff = 0.01, qcutoff = 0,
          showplots = FALSE, xm1 = NA, xm2 = NA, xm3 = NA, ym1 = NA,
          ym2 = NA, ym3 = NA, ranseed = 1969, R.lowcutoff = 0.05)

Arguments

indata

A list containing input data for one or all SNPs, normally produced by the function prepdata. The components of indata are described in help(prepdata).

p

A positive integer specifying which SNP to cluster. The default is 1.

priorfrac

A non-negative scalar giving the number of pseudodata observations, expressed as a fraction of the number of samples N, to be appended to the heterozygous and homozygous minor genotypes. The default is 0.2.

uncertcutoff

Genotype calls with uncertainty greater than uncertcutoff are set to "NC" (no call). The default is 0.01.

qcutoff

Uncertainty scores lower than the qcutoff-th quantile are reset to that value. When used with R.lowcutoff, this is equivalent to requiring a SNP-specific call rate of qcutoff or higher. The default is 0.

showplots

A logical value. If TRUE, the function will produce a series of plots. The default is FALSE.

xm1, xm2, xm3, ym1, ym2, ym3

Pseudodata cluster means can be user-specified through these parameters. The ordered pair (xm1,ym1) gives the cluster mean for genotype AA; similarly for (xm2,ym2), (xm3,ym3) and AB, BB, respectively. Default values are NA, in which case cluster means are estimated from the data, conditional on the a priori genotypes produced by GenomeStudio.

ranseed

Random seed for generation of pseudodata. The default is 1969.

R.lowcutoff

Genotypes for which R is less than R.lowcutoff are set to "NC" (no call). The default is 0.05.
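The two no-call rules described for uncertcutoff and R.lowcutoff can be illustrated with a minimal base-R sketch. It is not the package's internal code: the vectors calls, uncertainty and R below are invented toy values, and defining the call rate as the fraction of calls that are not "NC" is an assumption made for illustration.

```r
# Toy inputs (hypothetical values, not produced by SNPMClust)
calls       <- c("AA", "AB", "BB", "AB", "AA")
uncertainty <- c(0.001, 0.020, 0.004, 0.009, 0.300)
R           <- c(1.10, 0.95, 0.03, 1.20, 0.80)

# Documented defaults
uncertcutoff <- 0.01
R.lowcutoff  <- 0.05

# Apply the two documented no-call rules
final <- calls
final[uncertainty > uncertcutoff] <- "NC"  # uncertainty above the cutoff
final[R < R.lowcutoff]            <- "NC"  # R below the low cutoff

# Assumed definition: fraction of samples with a genotype call
callrate <- mean(final != "NC")

print(final)
print(callrate)
```

Lowering uncertcutoff makes calling stricter, so fewer samples receive a genotype and the call rate drops.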

Value

A list with the following components:

calls

A data frame with N rows and 4 columns, namely, SNP, SampleID, MClustCalls (the genotype call), and Uncertainty.

snp

The SNP name (i.e. rs-number).

callrate

Call rate for the SNP.

priorfrac

Value of argument in function call.

uncertcutoff

Value of argument in function call.

qcutoff

Value of argument in function call.
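As a rough illustration of how the returned list might be inspected, the sketch below builds a mock object with the components documented above and summarises it with base R. The object res, the SNP name, the sample IDs and all numeric values are invented; a real result would come from a call to snpmclust.

```r
# Mock result mirroring the documented components (all values are invented)
res <- list(
  calls = data.frame(
    SNP         = rep("rs0000001", 4),
    SampleID    = c("S1", "S2", "S3", "S4"),
    MClustCalls = c("AA", "AB", "NC", "BB"),
    Uncertainty = c(0.001, 0.004, 0.250, 0.002),
    stringsAsFactors = FALSE
  ),
  snp          = "rs0000001",
  callrate     = 0.75,
  priorfrac    = 0.2,
  uncertcutoff = 0.01,
  qcutoff      = 0
)

# Count samples per genotype call, including no-calls
genotype_counts <- table(res$calls$MClustCalls)

# Identify samples that were not called
nocall_samples <- res$calls$SampleID[res$calls$MClustCalls == "NC"]

print(genotype_counts)
print(nocall_samples)
```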

Author(s)

Stephen W. Erickson <serickson@rti.org> with Joshua C. Callaway <joshcllw@gmail.com>

References

Stephen W. Erickson, Joshua Callaway (2016). SNPMClust: Bivariate Gaussian Genotype Clustering and Calling for Illumina Microarrays. Journal of Statistical Software, 71(2), 1-9. doi:10.18637/jss.v071.c02

Examples

data(testset)
tmpfile = prepdata(testset)
snpmclust(tmpfile, p=1, showplots=TRUE)

superRhero4/SNPMClust documentation built on May 30, 2019, 8:40 p.m.