snpRecode: Recode a matrix of SNP genotypes as 0, 1, and 2
In gdmp: Genomic Data Management

View source: R/dataEdit.R

snpRecode converts SNP genotypes to 0, 1, or 2 according to the number of copies of allele A: 0 and 2 for the two homozygous genotypes, and 1 for the heterozygous genotype.

snpRecode(snpG, designat)

`snpG`	is a column vector in the genotypes array created by `toArray`. The column holds the genotypes of a single SNP for all or a subset of the individuals in the data.
`designat`	is the two-base allele designation for each SNP. This is sometimes called allele report data, where the specific bases of alleles A and B are reported. It is formatted as a data frame with two factors, one for allele A and one for allele B. See ‘Examples’.

SNP genotypes are recoded by counting the number of copies of allele A in each element of snpG, where

snpG is a column vector of the genotypes array ga, and
ga is the genotypes array created by toArray, with elements such as "AA", "AG", "GA", "-A", "- -".

Unknown genotypes, those containing a base other than A, C, G, or T, are coded as 5.

A column vector of the integers 0, 1, and 2 is created based on the number of copies of allele A in each element of the supplied vector of genotypes. A value of 5 is used to indicate an unknown genotype.
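
The counting rule described above can be illustrated with a minimal sketch. This is not the gdmp implementation: `recode_sketch` and its `allele_a` argument are hypothetical names used only for illustration, and the sketch assumes each genotype is a two-character string and that allele A is passed as a single base.

```r
# Hypothetical helper, not part of gdmp: counts copies of allele A in each
# two-character genotype string and codes unknown genotypes as 5.
recode_sketch <- function(genotypes, allele_a) {
  valid_bases <- c("A", "C", "G", "T")
  vapply(genotypes, function(g) {
    bases <- strsplit(g, "")[[1]]
    # Any base outside A/C/G/T makes the whole genotype unknown
    if (length(bases) != 2 || !all(bases %in% valid_bases)) return(5L)
    # Number of copies of allele A: 0, 1, or 2
    sum(bases == allele_a)
  }, integer(1), USE.NAMES = FALSE)
}

recode_sketch(c("AA", "AG", "GA", "GG", "-A", "--"), allele_a = "A")
# [1] 2 1 1 0 5 5
```

Both "AG" and "GA" are counted as one copy of allele A, so the order of the bases within a genotype does not matter in this sketch.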

toArray

## Simulate random allele designations for 100 bi-allelic SNPs
set.seed(2016)
desig <- array(sample(c('A','C','G','T'), size = 200, replace = TRUE), dim=c(100, 2))

## Simulate random SNP genotypes for 20 individuals - put them in array format
## '-' indicates an unknown base
ga <- array(0, dim=c(20, 100))
for(i in 1:20)
  for(j in 1:100)
    ga[i, j] <- paste(sample(c(desig[j,],"-"), 2, prob=c(.46, .46, .08), replace=TRUE), collapse='')

## Recode the matrix, place recoded genotypes in ga.r
desig <- data.frame(AlleleA_Forward = factor(desig[,1]), AlleleB_Forward = factor(desig[,2]))
ga.r <- array(5, dim=c(20, 100))
for(i in 1:100) ga.r[,i] <- snpRecode(ga[,i], desig[i,])

## Tabulate recoded genotypes in the matrix ga.r
table(ga.r)
#   0   1   2   5
# 326 632 701 341