snpQC: SNP Quality Control


View source: R/snp.R

Description

Functions for quality control. 'snpQC' may be used to count or remove repeated neighboring SNPs and markers with MAF lower than a given threshold, and to impute missing values. 'cleanREP' identifies and merges duplicate genotypes. The 'reference' function changes the reference genotype. For NAM populations, 'reference' must be used when genotypes are coded according to the reference genome instead of the standard parent.

Usage

snpQC(gen,psy=1,MAF=0.05,misThr=0.8,remove=TRUE,impute=FALSE)
cleanREP(y,gen,fam=NULL,thr=0.95)
reference(gen,ref=NULL)

Arguments

gen

Numeric matrix containing the genotypic data, with n rows of observations and m columns of molecular markers. SNPs must be coded as 0, 1, 2, for founder homozygous, heterozygous and reference homozygous, respectively. NA is allowed.

psy

Tolerance parameter for markers in Perfect SYmmetry (psy). This QC step removes identical markers (i.e., markers in full LD) that carry the same information. Default is 1, which removes only SNPs that are 100% equal to their following neighbor.
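The neighbor comparison described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper neighbor_identity and the toy matrix are hypothetical, and the sketch assumes that "equal" means the proportion of non-missing calls on which a marker matches the marker immediately after it.

```r
# Hypothetical helper (not part of NAM): proportion of identical calls
# between each marker and the marker immediately after it.
neighbor_identity <- function(gen) {
  m <- ncol(gen)
  sapply(seq_len(m - 1), function(j) {
    mean(gen[, j] == gen[, j + 1], na.rm = TRUE)
  })
}

# Toy matrix: 4 individuals x 3 markers; markers 1 and 2 are identical.
gen_toy <- cbind(c(0, 2, 2, 0), c(0, 2, 2, 0), c(2, 2, 0, 0))
id <- neighbor_identity(gen_toy)

# With a tolerance of 1, only markers fully identical to the next one are flagged.
psy <- 1
redundant <- which(id >= psy)
```

Under these assumptions, lowering psy below 1 would also flag markers that are nearly, but not exactly, identical to their neighbor.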

MAF

Minor Allele Frequency threshold. Default is 0.05. Used to report or remove markers below the MAF threshold. Markers with standard deviation below the MAF threshold will also be removed.
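For readers unfamiliar with the quantity, a minimal base-R sketch of a per-marker MAF calculation under the 0, 1, 2 coding follows. The helper maf and the toy data are hypothetical, and the sketch does not claim to reproduce how snpQC computes the frequency internally.

```r
# Hypothetical helper (not part of NAM): minor allele frequency per marker
# for genotypes coded 0, 1, 2 (copies of one allele per individual).
maf <- function(gen) {
  p <- colMeans(gen, na.rm = TRUE) / 2  # frequency of the counted allele
  pmin(p, 1 - p)                        # frequency of the rarer allele
}

# Toy matrix: 5 individuals x 3 markers.
gen_toy <- cbind(
  c(0, 0, 0, 0, 0),   # monomorphic marker
  c(0, 1, 2, 2, 1),   # common variation
  c(2, 2, NA, 2, 1)   # one missing call, ignored in the mean
)
freq <- maf(gen_toy)

# Markers that fall below a 0.05 threshold.
low_maf <- which(freq < 0.05)
```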

misThr

Missing value threshold. Default is 0.8, removing markers with more than 80 percent missing values.
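The missing-value rate that this threshold is compared against can be inspected directly in base R. The snippet below is a sketch on hypothetical toy data, assuming the rate is the proportion of NA calls per marker.

```r
# Toy matrix: 6 individuals x 2 markers (hypothetical data).
gen_toy <- cbind(
  c(0, NA, NA, NA, NA, NA),  # 5 of 6 calls missing
  c(0, 1, NA, 2, 2, 1)       # 1 of 6 calls missing
)

# Proportion of missing calls per marker.
miss <- colMeans(is.na(gen_toy))

# Markers exceeding the threshold would be candidates for removal.
misThr <- 0.8
too_sparse <- which(miss > misThr)
```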

remove

Logical. If TRUE, remove SNPs that fail the PSY or MAF criteria.

impute

If TRUE, impute missing values using the expected value.
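One common reading of "expected value" is the mean of the observed calls for each marker. The sketch below illustrates that reading only; it is an assumption, the helper impute_mean is hypothetical, and the package may compute the expected value differently.

```r
# Hypothetical helper (not part of NAM): replace each missing call with the
# mean of the observed calls for the same marker.
impute_mean <- function(gen) {
  for (j in seq_len(ncol(gen))) {
    miss <- is.na(gen[, j])
    # A marker with no observed calls would receive NaN here.
    gen[miss, j] <- mean(gen[!miss, j])
  }
  gen
}

# Toy matrix: 4 individuals x 2 markers, one missing call per marker.
gen_toy <- cbind(c(0, 2, NA, 2), c(1, 1, 0, NA))
gen_imp <- impute_mean(gen_toy)
```

Note that imputed values under this scheme are generally not integers, so downstream code should not assume strict 0, 1, 2 coding after imputation.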

y

Numeric vector (n) or numeric matrix (n x t) of observations describing the trait to be analyzed. NA is allowed.

fam

Numeric vector of length n indicating which subpopulation (i.e., family) each observation comes from. Default assumes that all observations are from the same population.

thr

Threshold above which genotypes are considered identical. Default is 0.95, merging genotypes >95 percent identical.
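To make the threshold concrete, the following base-R sketch computes the proportion of matching marker calls between two individuals and compares it with thr. The helper pair_identity and the toy data are hypothetical; how cleanREP measures similarity internally is not stated here, so treat the identity measure as an assumption.

```r
# Hypothetical helper (not part of NAM): proportion of markers on which
# individuals i and j carry the same call, ignoring missing values.
pair_identity <- function(gen, i, j) {
  mean(gen[i, ] == gen[j, ], na.rm = TRUE)
}

# Toy matrix: 3 individuals x 10 markers.
gen_toy <- rbind(
  c(0, 2, 2, 0, 1, 2, 0, 0, 2, 2),
  c(0, 2, 2, 0, 1, 2, 0, 0, 2, 2),  # exact copy of individual 1
  c(2, 0, 2, 0, 1, 0, 0, 2, 2, 0)   # matches individual 1 on half the markers
)

thr <- 0.95
same_12 <- pair_identity(gen_toy, 1, 2) > thr
same_13 <- pair_identity(gen_toy, 1, 3) > thr
```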

ref

Numeric vector of length n with elements coded as 0, 1, 2, representing the genotypic information of a new reference genotype. Default assumes that the more frequent allele represents the reference genome.

Value

snpQC - Returns the genomic matrix without missing values, redundancy or low MAF markers.

cleanREP - List containing the inputs without replicates. Groups of replicates are replaced by a single observation with the phenotypic expected value. The algorithm keeps the genotypic information of the first individual (genotypic matrix order).
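The merging rule described for cleanREP can be illustrated with a small base-R sketch. It assumes the "phenotypic expected value" is the mean phenotype within a replicate group; the group vector and toy data are hypothetical, and this is not the package's code.

```r
# Toy data: 3 individuals, where individuals 1 and 2 are replicates.
y <- c(10, 12, 7)
gen_toy <- rbind(c(0, 2, 2), c(0, 2, 2), c(2, 0, 1))

# Hypothetical replicate groups, numbered in order of first appearance.
group <- c(1, 1, 2)

# One phenotype per group: the mean across its replicates.
y_merged <- as.vector(tapply(y, group, mean, na.rm = TRUE))

# One genotype per group: the first individual in matrix order.
gen_merged <- gen_toy[!duplicated(group), , drop = FALSE]
```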

reference - Returns the recoded gen matrix.

Author(s)

Alencar Xavier, Katy Rainey and William Muir

Examples

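## Not run: 
library(NAM)
data(tpod)  # example data providing the gen and y objects used below
gen = reference(gen)  # recode genotypes using the default reference
gen = snpQC(gen=gen, psy=1, MAF=0.05, remove=TRUE, impute=FALSE)
test = cleanREP(y, gen)  # merge duplicate genotypes and their phenotypes
## End(Not run)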

alenxav/NAM documentation built on Jan. 8, 2020, 9:21 p.m.