#' Segregate segments down genomic simulation pedigrees
#'
#' Given a collection of genomic simulation pedigrees and requests
#' for how many simulations should be done (in the `request` input),
#' as well as recombination rates, this simulates the segregation
#' of segments down through the pedigrees
#' @param request a tibble with list columns "gpp" and "reppop".
#' Each element of the gpp column is a tibble giving a genomic simulation
#' pedigree as documented as the input for `prep_gsp_for_hap_dropping()`.
#' Each element of the "reppop" column is a tibble with columns
#' `index`, `pop`, `group`, to indicate which of the founding
#' populations ("A", "B", etc.) correspond to the different groups
#' (from the `group` column in, for example, the meta data for individuals
#' in your genotype data set, like the data object `I_meta`).
#' Because it is quite likely that you might wish to iterate
#' the segregation procedure multiple
#' times in a single simulation, you can specify that by doing multiple
#' "reps" (replicates) of the procedure. **BIG NOTE**: The values in the index column that you
#' choose must start at 1 and should be dense within. In other words, if
#' the max value in the index column is N, then every integer from 1 to N
#' must be in there.
#' @param RR the recombination rates in the format of the package data
#' @param MM the marker meta data tibble (like M_meta). If this is NULL (the default) that
#' is fine. If not, then it uses the order of the markers in MM to define
#' the levels of a chrom_f column so that we can sort the rows of the output
#' correctly, with respect to markers in the Genotype data frame. This will
#' let us more efficiently subscript the markers out of the matrix. If MM is
#' not present, then the function will create `chrom_f` by using the
#' order of the chromosomes from RR. If MM is not NULL, then the function
#' will also check to make sure that the markers are within the extent of the
#' recombination rate bins, giving an error otherwise.
#' @return The output from this function is a tibble. Each row represents one segment of genetic
#' material amongst the sampled individuals from the genomic permutation pedigrees. The columns give
#' information about the provenance and destination of that segment as follows.
#' Each segment exists in one of the samples (`samp_index`) from a sampled individual
#' with a `ped_sample_id` on a given `gpp` (the index giving the row of the request input tibble)
#' in a given `index` within the individual. Further, it is on one of two gametes
#' (`gamete_index`) that segregated into the individual, and it came from a certain founding
#' population (`pop_origin`) that corresponds to the named groups in the genotype file (`group_origin`).
#' And, of course, the segment occupies the space from `start` to `end` on a chromosome `chrom`.
#' Finally, the index of the founder haplotype on the given gpp that this segement descended from is
#' given in `rs_founder_haplotype` which is short for "rep-specific founder haplotype". This final
#' piece of information is crucial for segregating variation from the individuals in the `Geno` file
#' onto these segments. Finally, the column `sim_level_founder_haplo` assigns a unique index for each founder
#' haplotype. This is necessary because any simulation can involve multiple gpps and/or indexes of gpps,
#' and the founders in each of those must all be unique within a simulation. so that those haplotypes
#' can all, eventually, be accessed easily out of the genotype matrix.
#' @export
#' @examples
#' # We construct an example here where we will request segregation
#' # down a GSP with two F1s and F1B backcrosses between two hypothetical
#' # populations, A and B.
#' gsp_f1f1b <- create_GSP("A", "B", F1 = TRUE, F1B = TRUE)
#'
#' # We will imagine that in our marker data there are three groups
#' # labelled "grp1", "grp2", and "grp3", and we want to create the F1Bs with backcrossing
#' # only to grp3.
#' reppop <- tibble::tibble(
#' index = as.integer(c(1, 1, 2, 2)),
#' pop = c("A", "B", "A", "B"),
#' group = c("grp3", "grp1", "grp3", "grp2")
#' )
#'
#' # combine those into a request
#' request <- tibble::tibble(
#' gpp = list(gsp_f1f1b),
#' reppop = list(reppop)
#' )
#'
#'
#' result1 <- segregate(request, RecRates)
#'
#' # here we pass it some markers, too
#' result2 <- segregate(request, RecRates, M_meta)
#'
#' result1
#'
#' result2
segregate <- function(request, RR, MM = NULL) {
# ERROR CHECKING
# first make sure that RR is formatted properly
check_rec_rates_formatting(RR)
# then, if MM is not NULL check to make sure that no marker
# positions exceed the chrom lengths
if(!is.null(MM)) {
check_chrom_lengths(MM, RR)
}
# just lapply over the rows of request and bind_rows() the outputs together
idx <- 1:nrow(request)
names(idx) <- idx
ret <- lapply(idx, function(i) {
# check that the reppop is correctly formatted
check_reppop(request$reppop[[i]], i)
# number of reps to do is the maximum number listed in the request
Reps <- max(request$reppop[[i]]$index)
tmp <- drop_segs_down_gsp(GSP = request$gpp[[i]],
RR = RR,
Reps = Reps)
# now, we join the group specification on there
left_join(tmp, request$reppop[[i]], by = c("index" = "index", "pop_origin" = "pop")) %>%
rename(group_origin = group)
}) %>%
bind_rows(.id = "gpp") %>%
mutate(gpp = as.integer(gpp))
# ERROR CHECKING on the result
# any missing groups? etc.
ret2 <- ret %>%
ungroup() %>%
sim_level_founder_haplos() # on this line we also compute the simulation-level founder haplos!
if(!is.null(MM)) {
ret_tmp <- ret2 %>%
mutate(chrom_f = factor(chrom, levels = unique(MM$chrom)))
} else {
ret_tmp <- ret2 %>%
mutate(chrom_f = factor(chrom, levels = unique(RR$chrom)))
}
# arrange and return
ret_tmp %>%
select(chrom_f, everything()) %>%
arrange(gpp, index, ped_sample_id, samp_index, gamete_index, chrom_f, start)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.