knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(gRandma)

This will walk through an example of estimating error rates for assigning parent-offspring pairs and inferring parent-offspring pairs.

Setup

First, we make a gmaData object. For details, see the "Load_in_data_and_gmaData_structure" vignette.

# adding a simulated parent-offspring pair for demonstration
set.seed(7)
tempData <- data_mh_snp[data_mh_snp$Pop %in% c("Pop_2", "Pop_4"),]
tempOffspGenos <- tempData[tempData$Pop == "Pop_2",][1,3:ncol(tempData)]
for(i in seq(4, ncol(tempData), 2)) tempOffspGenos[1, i - 2] <- sample(tempData[!is.na(tempData[,i]),i], 1)
tempOffspGenos[1, which(is.na(tempOffspGenos[1,])) + 1] <- NA
tempOffspGenos <- cbind(data.frame(Pop = "Pop_4", Ind = "Ind00"), tempOffspGenos)
tempData <- rbind(tempData, tempOffspGenos)

# removing loci with no variation in the subset of pops we are going to use
tRem <- c()
for(i in seq(3, ncol(tempData) - 1, 2)) if(length(na.omit(unique(c(tempData[[i]], tempData[[i+1]])))) < 2) tRem <- c(tRem, i)
tempData <- tempData[,-c(tRem, tRem + 1)]
# creating a gmaData object
gData_1 <- createGmaInput(baseline = tempData[tempData$Pop == "Pop_2",], 
                                  mixture = tempData[tempData$Pop == "Pop_4",-1], 
                                  perAlleleError = .005, dropoutProb = .005,
                                          markerType = "microhaps", alleleDistFunc = (function(x) return(1)))

Error rate estimation

False negative error rates

We are first going to estimate false negative error rates. This is the probability that a true grandparent-grandchild trio fails to assign. Here we are testing critical values ($c$ in the manuscript) of $0, 2, ..., 24$.

falseNegative <- falseGrandma(gData_1, relationship = "sP", 
                                        llrToTest = seq(0, 24, 2), N = 10000,
                                        seed = 7, errorType = "falseNegative")
falseNegative[[1]][1:10,]

We now have an estimate of false negative error rates for each critical value for our one baseline population.

False positive error rates

Let's first consider false positive error rates for unrelated trios. We can calculate these error rates by stratified sampling. We will run 1,000 iterations in each stratum. This is too few, but we will use it here because it is quick to build the vignette and because it will allow us to demonstrate the main drawback of the stratified sampling approach. If you wanted, you could use a different number of iterations in each stratum to target computational effort. The input, rep(1000, 10), tells it to run 1000 iterations in the first 10 strata (number of MIs from 0 - 9). We know from the false negative output above that it is only going to consider 0 - 6 MIs, so the function will stop after 6. Giving a number of iterations for extra strata is ok, but it will throw an error if you give too few.

falsePos <- falseGrandma(gData_1, relationship = "sP", 
                                        llrToTest = c(8, 12, 16, 20), itersPerMI = rep(1000, 10),
                                        seed = 7, errorType = "Unrel", method = "strat")

The output for stratified sampling is a list of 2. The first item has the overall false positive estimates:

falsePos[[1]]

The second item has the estimates for each stratum:

falsePos[[2]][1:10,]

The columns are:

We can see right away that even for our lowest critical value of 8, the 4, 5, and 6 MI strata had 0 false positives observed and the 3 MI stratum only had 1. So, obviously we need to increase the number of iterations run for these strata to get a more reliable estimate. An alternative, which may be useful when your panel is powerful enough that the number of iterations required would be unfeasible, is to use importance sampling to estimate the false positive error rate. For that, check out the R package CKMRsim.

Now how about pairs of other relationships? The process and outputs are the same, we just change the errorType argument to falseGrandma. For considering parent-offspring pairs, valid options for errorType are: "falseNegative", "Unrel", "Aunt", "HalfAunt", and "ParCous". For example:

# not run
falsePos_Aunt <- falseGrandma(gData_1, relationship = "sP", 
             llrToTest = c(8, 12, 16, 20), itersPerMI = rep(100000, 10),
             seed = 7, errorType = "Aunt", method = "strat")

Assigning parents

Now we've estimated the error rates for this genetic panel and population, and for the sake of the example, let's say we've decided on a critical value of 16. Let's now look for parent-offspring pairs. We've already set up the gmaData object to have baseline individuals (potential parents) and mixture individuals (potential offspring).

assignments <- inferGrandma(gData_1, relationship = "sP", minLLR = 16)
assignments

And we have one parent-offspring pair assigned.



delomast/gRandma documentation built on March 8, 2024, 2:26 a.m.