Check SNP genotypes and exclude redundant SNPs
Given individual genotypes of a set of SNPs, the function checks for the existence of redundant SNPs to exclude. A margin of error of 0.5% is allowed by default.
Matrix of genotypes created by snpRecode.
Numeric indices or column names of the SNPs in the genotype matrix to check for redundancy (Exclude).
Allowed margin of error; the default is 0.005.
Test for similar SNP genotypes across a set of individuals. Two SNPs are considered
identical if the proportion of individuals whose genotypes differ remains below
the allowed error margin (0.5% by default). For example,
if Exclude <- 1:100 and SNP #1 is similar to SNP #25,
SNP #25 will be flagged for exclusion, whereas
SNP #1 will not be flagged for exclusion.
In addition to identical SNPs, the function also flags as redundant any SNPs whose genotypes are entirely opposite within the error margin. Thus, SNPs are declared highly correlated if their genotypes are all the same (0-0, 1-1, and 2-2) or all opposite (0-2, 1-1, and 2-0) within the specified error margin.
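The redundancy rule described above can be sketched in base R. This is a minimal illustration and not the package's implementation: `flag_redundant` is a hypothetical helper, and it assumes genotypes coded 0/1/2 with no missing values, treats the margin as the largest tolerated fraction of mismatching individuals, and flags the later SNP of each redundant pair.

```r
# Hypothetical helper for illustration only; not the package's implementation.
# geno:   numeric matrix, individuals in rows, SNPs in columns, coded 0/1/2
#         (assumed to contain no missing values)
# margin: largest tolerated fraction of mismatching individuals (assumption)
flag_redundant <- function(geno, margin = 0.005) {
  n_snp <- ncol(geno)
  flagged <- logical(n_snp)
  for (j in seq_len(n_snp)[-1]) {
    for (k in seq_len(j - 1)) {
      # Fraction of individuals that differ under identical coding (0-0, 1-1, 2-2)
      diff_same <- mean(geno[, j] != geno[, k])
      # Fraction that differ under opposite coding (0-2, 1-1, 2-0)
      diff_opposite <- mean(geno[, j] != 2 - geno[, k])
      if (min(diff_same, diff_opposite) <= margin) {
        flagged[j] <- TRUE  # the later SNP of the pair is the one flagged
        break
      }
    }
  }
  which(flagged)
}

# Toy data: snp3 duplicates snp1, snp4 is the opposite coding of snp1
a <- c(0, 1, 2, 2, 1, 0, 0, 2)
b <- c(1, 1, 0, 2, 0, 1, 2, 0)
geno <- cbind(snp1 = a, snp2 = b, snp3 = a, snp4 = 2 - a)
flag_redundant(geno)
# [1] 3 4
```

With only 8 individuals, a margin of 0.005 tolerates no mismatches at all, so a single differing genotype is enough to keep a SNP unflagged; a larger margin would be needed to tolerate it.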
If Exclude contains SNP names, a character
vector of excluded SNPs is returned; if it contains integer values, a numeric vector
of excluded SNPs is returned.
## Simulate random allele designations for 100 bi-allelic SNPs
set.seed(2016)
desig <- array(sample(c('A','C','G','T'), size = 200, replace = TRUE), dim = c(100, 2))

## Simulate random SNP genotypes for 20 individuals - put them in array format
## '-' indicates an unknown base
ga <- array(0, dim = c(20, 100))
for(i in 1:20)
  for(j in 1:100)
    ga[i, j] <- paste(sample(c(desig[j,], "-"), 2, prob = c(.47, .47, .06), replace = TRUE), collapse = '')

## Recode the matrix, place recoded genotypes in ga.r
desig <- data.frame(AlleleA_Forward = factor(desig[,1]), AlleleB_Forward = factor(desig[,2]))
ga.r <- array(5, dim = c(20, 100))
for(i in 1:100) ga.r[,i] <- snpRecode(ga[,i], desig[i,])

## Check all SNP genotypes in ga.r for similarity across individuals
## Allow for a margin of error of 0.5%
GetHCS(ga.r)
# 42 91
# SNPs 42 & 91 are similar to earlier SNPs in the vector, 'Exclude'

## Check SNP genotypes from 1 to 50 for similarity across individuals
GetHCS(ga.r, Exclude = 1:50)
# 42