knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(gRandma)
This will walk through an example of estimating error rates for assigning parent-offspring pairs and inferring parent-offspring pairs.
First, we make a gmaData object. For details, see the "Load_in_data_and_gmaData_structure" vignette.
# adding a simulated parent-offspring pair for demonstration set.seed(7) tempData <- data_mh_snp[data_mh_snp$Pop %in% c("Pop_2", "Pop_4"),] tempOffspGenos <- tempData[tempData$Pop == "Pop_2",][1,3:ncol(tempData)] for(i in seq(4, ncol(tempData), 2)) tempOffspGenos[1, i - 2] <- sample(tempData[!is.na(tempData[,i]),i], 1) tempOffspGenos[1, which(is.na(tempOffspGenos[1,])) + 1] <- NA tempOffspGenos <- cbind(data.frame(Pop = "Pop_4", Ind = "Ind00"), tempOffspGenos) tempData <- rbind(tempData, tempOffspGenos) # removing loci with no variation in the subset of pops we are going to use tRem <- c() for(i in seq(3, ncol(tempData) - 1, 2)) if(length(na.omit(unique(c(tempData[[i]], tempData[[i+1]])))) < 2) tRem <- c(tRem, i) tempData <- tempData[,-c(tRem, tRem + 1)] # creating a gmaData object gData_1 <- createGmaInput(baseline = tempData[tempData$Pop == "Pop_2",], mixture = tempData[tempData$Pop == "Pop_4",-1], perAlleleError = .005, dropoutProb = .005, markerType = "microhaps", alleleDistFunc = (function(x) return(1)))
We are first going to estimate false negative error rates. This is the probability that a true grandparent-grandchild trio fails to assign. Here we are testing critical values ($c$ in the manuscript) of $0, 2, ..., 24$.
falseNegative <- falseGrandma(gData_1, relationship = "sP", llrToTest = seq(0, 24, 2), N = 10000, seed = 7, errorType = "falseNegative") falseNegative[[1]][1:10,]
We now have an estimate of false negative error rates for each critical value for our one baseline population.
Let's first consider false positive error rates for unrelated trios. We can calculate these error rates by stratified sampling. We will run 1,000 iterations in each stratum. This is too few, but we will use it here because it is quick to build the vignette and because it will allow us to demonstrate the main drawback of the stratified sampling approach.
If you wanted, you could use a different number of iterations in each stratum to target computational effort. The
input, rep(1000, 10)
, tells it to run 1000 iterations in the first 10 strata (number of MIs from 0 - 9). We know from
the false negative output above that it is only going to consider 0 - 6 MIs, so the function will stop after 6. Giving a number of
iterations for extra strata is ok, but it will throw an error if you give too few.
falsePos <- falseGrandma(gData_1, relationship = "sP", llrToTest = c(8, 12, 16, 20), itersPerMI = rep(1000, 10), seed = 7, errorType = "Unrel", method = "strat")
The output for stratified sampling is a list of 2. The first item has the overall false positive estimates:
falsePos[[1]]
The second item has the estimates for each stratum:
falsePos[[2]][1:10,]
The columns are:
count_MI
: the number of observed Mendelian incompatibilities (MIs) that stratum corresponds toprob_count_MI
: the probability a pair of the defined relationship (Unrelated in this case)
has that number of MIs observedfalsePosUnrel
: the false positive error rate estimated for that stratum. To get the number of
observed false positives during the simulation, multiply this number by the number of iterations
run for that stratum (in this case, it was 1000)falsePosUnrelSD
: the estimated standard deviation for this stratumWe can see right away that even for our lowest critical value of 8, the 4, 5, and 6 MI strata had 0 false
positives observed and the 3 MI stratum only had 1. So, obviously we need to increase the number
of iterations run for these strata to get a more reliable estimate. An alternative, which may be useful when
your panel is powerful enough that the number of iterations required would be unfeasible, is to use importance
sampling to estimate the false positive error rate. For that, check out the R package CKMRsim
.
Now how about pairs of other relationships? The process and outputs are the same, we just change the
errorType
argument to falseGrandma
. For considering parent-offspring pairs, valid options
for errorType
are: "falseNegative", "Unrel", "Aunt", "HalfAunt"
, and "ParCous"
. For example:
# not run falsePos_Aunt <- falseGrandma(gData_1, relationship = "sP", llrToTest = c(8, 12, 16, 20), itersPerMI = rep(100000, 10), seed = 7, errorType = "Aunt", method = "strat")
Now we've estimated the error rates for this genetic panel and population, and for the sake of the example, let's say we've decided on a critical value of 16. Let's now look for parent-offspring pairs. We've already set up the gmaData object to have baseline individuals (potential parents) and mixture individuals (potential offspring).
assignments <- inferGrandma(gData_1, relationship = "sP", minLLR = 16) assignments
And we have one parent-offspring pair assigned.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.