R/RcppExports.R

# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#' compute the matrix X_l given allele frequencies and kappa
#'
#' Ha! I did this initially using R straight up and even tried to make
#' it reasonably vectorized, but the damn thing took forever and was
#' incredibly space-inefficient. Ergo, we are going after it using
#' Rcpp.  The genotypes are ordered in rows and columns in the way
#' described in the paper.
#' @param p vector of allele frequencies
#' @param kappa a 3-vector of the Cotterman coefficients
make_matrix_X_l <- function(p, kappa) {
    .Call('_CKMRsim_make_matrix_X_l', PACKAGE = 'CKMRsim', p, kappa)
}
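
# Illustrative sketch only, not the compiled routine called above. It assumes
# (none of this is confirmed by this file) that X_l is the joint probability of
# the two genotypes in a pair, that genotypes follow Hardy-Weinberg proportions,
# that kappa is ordered (kappa0, kappa1, kappa2), and that unordered genotypes
# are enumerated as (a, b) with a <= b. The ordering used in the paper may
# differ. The name make_matrix_X_l_sketch is hypothetical.
make_matrix_X_l_sketch <- function(p, kappa) {
  A <- length(p)
  g <- expand.grid(a = seq_len(A), b = seq_len(A))
  g <- g[g$a <= g$b, ]                      # unordered genotypes (assumed order)
  G <- nrow(g)
  # marginal genotype probabilities under Hardy-Weinberg proportions
  pg <- ifelse(g$a == g$b, p[g$a]^2, 2 * p[g$a] * p[g$b])
  # probability that shared allele s plus one random allele forms genotype {c, d}
  q <- function(s, c, d) {
    if (c == d) (s == c) * p[c] else (s == c) * p[d] + (s == d) * p[c]
  }
  # transition probabilities given exactly one allele shared identical by descent
  T1 <- matrix(0, G, G)
  for (i in seq_len(G)) for (j in seq_len(G)) {
    T1[i, j] <- 0.5 * q(g$a[i], g$a[j], g$b[j]) + 0.5 * q(g$b[i], g$a[j], g$b[j])
  }
  # mix the three IBD states; row i is scaled by the marginal of genotype i
  pg * (kappa[1] * matrix(pg, G, G, byrow = TRUE) + kappa[2] * T1 + kappa[3] * diag(G))
}
# Possible usage, with full-sibling coefficients:
#   X <- make_matrix_X_l_sketch(c(0.2, 0.3, 0.5), c(0.25, 0.5, 0.25))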

#' Compute pairwise relationship measures between all individuals in source and one individual in target
#'
#' More explanation later.
#'
#' @param S "source", a matrix whose rows are integers, with NumInd-source rows and NumLoci columns, with each
#' entry being a base-0 representation of the genotype of the c-th locus at the r-th individual.
#' These are the individuals you can think of as parents if there is directionality to the
#' comparisons.
#' @param T "target",  a matrix whose rows are integers, with NumInd-target rows and NumLoci columns, with each
#' entry being a base-0 representation of the genotype of the c-th locus at the r-th individual.
#' These are the individuals you can think of as offspring if there is directionality to the
#' comparisons.
#' @param t the index (base-1) of the individual in T that you want to compare against
#' everyone in S.
#' @param values the vector of genotype specific values.  See the probs field of \code{\link{flatten_ckmr}}.
#' @param nGenos a vector of the number of genotypes at each locus
#' @param Starts the base-0 indexes of the starting positions of each locus in probs.
#' @return a data frame with columns "ind" (the base-1 index of the individual in S),
#' "value" (the value extracted, typically a log likelihood ratio), and "num_loc" (the
#' number of non-missing loci in the comparison.)
#' @export
comp_ind_pairwise <- function(S, T, t, values, nGenos, Starts) {
    .Call('_CKMRsim_comp_ind_pairwise', PACKAGE = 'CKMRsim', S, T, t, values, nGenos, Starts)
}
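
# Illustrative pure-R sketch of the kind of lookup-and-sum described above; it
# is not the compiled code. Assumptions not confirmed by this file: the values
# for locus l occupy nGenos[l]^2 consecutive entries beginning at Starts[l],
# indexed by source genotype first and then target genotype, and any negative
# genotype code means missing. The name comp_ind_pairwise_sketch is hypothetical.
comp_ind_pairwise_sketch <- function(S, T, t, values, nGenos, Starts) {
  tg <- T[t, ]                              # genotypes of the one target individual
  out <- lapply(seq_len(nrow(S)), function(i) {
    sg <- S[i, ]
    ok <- sg >= 0 & tg >= 0                 # loci observed in both individuals
    # base-0 offset into values, shifted by 1 for R's base-1 indexing
    idx <- Starts[ok] + sg[ok] * nGenos[ok] + tg[ok] + 1
    data.frame(ind = i, value = sum(values[idx]), num_loc = sum(ok))
  })
  do.call(rbind, out)
}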

#' Return locus-specific pairwise relationship measures between desired pairs of individuals
#'
#' The idea here is that you can go back and look more closely at the log-likelihood ratios
#' for pairs that are found to look like PO, etc., to see how much each of the different
#' loci are contributing.  More explanation later.
#'
#' @param S "source", a matrix whose rows are integers, with NumInd-source rows and NumLoci columns, with each
#' entry being a base-0 representation of the genotype of the c-th locus at the r-th individual.
#' These are the individuals you can think of as parents if there is directionality to the
#' comparisons.
#' @param T "target",  a matrix whose rows are integers, with NumInd-target rows and NumLoci columns, with each
#' entry being a base-0 representation of the genotype of the c-th locus at the r-th individual.
#' These are the individuals you can think of as offspring if there is directionality to the
#' comparisons.
#' @param s a vector of base-1 indexes of the source individual in each pair.
#' @param t a vector of base-1 indexes of the target individual in each pair.  This vector is parallel to s.  So,
#' for example `(s[i], t[i])` designates a pair that you wish to investigate (individual `s[i]` in S and `t[i]` in T)
#' @param values the vector of genotype specific values.  See the probs field of \code{\link{flatten_ckmr}}.
#' @param nGenos a vector of the number of genotypes at each locus
#' @param Starts the base-0 indexes of the starting positions of each locus in probs.
#'
#' @return a data frame with columns "indS" (the base-1 index of the individual in S),
#' "indT" (the base-1 index of the individual in T), "locus" (base-1 index of the locus),
#' and "value" (the value extracted, typically a log likelihood ratio).  If the pair is missing that
#' locus it is given as NA_REAL.
#' @export
locus_specific_pairwise <- function(S, T, s, t, values, nGenos, Starts) {
    .Call('_CKMRsim_locus_specific_pairwise', PACKAGE = 'CKMRsim', S, T, s, t, values, nGenos, Starts)
}
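
# Illustrative pure-R sketch of a per-locus breakdown for chosen pairs; it is
# not the compiled code. It assumes (unconfirmed here) that values holds, for
# each locus l, nGenos[l]^2 entries starting at Starts[l] and indexed by source
# genotype then target genotype, and that negative genotype codes mean missing.
# The name locus_specific_pairwise_sketch is hypothetical.
locus_specific_pairwise_sketch <- function(S, T, s, t, values, nGenos, Starts) {
  L <- ncol(S)
  out <- lapply(seq_along(s), function(k) {
    sg <- S[s[k], ]
    tg <- T[t[k], ]
    ok <- sg >= 0 & tg >= 0                 # loci observed in both members of the pair
    val <- rep(NA_real_, L)                 # missing loci stay NA
    val[ok] <- values[Starts[ok] + sg[ok] * nGenos[ok] + tg[ok] + 1]
    data.frame(indS = s[k], indT = t[k], locus = seq_len(L), value = val)
  })
  do.call(rbind, out)                       # one row per pair and locus
}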

#' Return every pair of individuals that mismatch at no more than max_miss loci
#'
#' This is used for identifying duplicate individuals/genotypes in large
#' data sets. I've specified this in terms of the max number of mismatching loci because
#' I think everyone should already have tossed out individuals with a lot of
#' missing data, and then it makes it easy to toss out pairs without even
#' looking at all the loci, so it is faster for all the comparisons.
#'
#' @param S "source", a matrix whose rows are integers, with NumInd-source rows and NumLoci columns, with each
#' entry being a base-0 representation of the genotype of the c-th locus at the r-th individual.
#' These are the individuals you can think of as parents if there is directionality to the
#' comparisons.  Missing data is denoted by -1 (or any integer < 0).
#' @param max_miss maximum allowable number of mismatching genotypes between the two individuals of a pair.
#' @return a data frame with columns:
#' \describe{
#'   \item{ind1}{the base-1 index in S of the first individual of the pair}
#'   \item{ind2}{the base-1 index in S of the second individual of the pair}
#'   \item{num_mismatch}{the number of loci at which the pair have mismatching genotypes}
#'   \item{num_loc}{the total number of loci missing in neither individual}
#' }
#' @export
pairwise_geno_id <- function(S, max_miss) {
    .Call('_CKMRsim_pairwise_geno_id', PACKAGE = 'CKMRsim', S, max_miss)
}
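
# Illustrative pure-R sketch of the early-exit idea described above: stop
# scanning a pair as soon as it exceeds max_miss mismatches. This is not the
# compiled code, and the name pairwise_geno_id_sketch is hypothetical. It
# assumes loci missing in either individual are skipped rather than counted as
# mismatches.
pairwise_geno_id_sketch <- function(S, max_miss) {
  n <- nrow(S)
  res <- list()
  if (n >= 2) for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    mism <- 0L
    nloc <- 0L
    keep <- TRUE
    for (l in seq_len(ncol(S))) {
      a <- S[i, l]
      b <- S[j, l]
      if (a < 0 || b < 0) next              # missing in at least one individual
      nloc <- nloc + 1L
      if (a != b) {
        mism <- mism + 1L
        if (mism > max_miss) { keep <- FALSE; break }  # discard the pair early
      }
    }
    if (keep) {
      res[[length(res) + 1L]] <-
        data.frame(ind1 = i, ind2 = j, num_mismatch = mism, num_loc = nloc)
    }
  }
  if (length(res) == 0) {
    return(data.frame(ind1 = integer(0), ind2 = integer(0),
                      num_mismatch = integer(0), num_loc = integer(0)))
  }
  do.call(rbind, res)
}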

#' pick the genotypes out of the Mendel output pedigree file to use to compute Q
#'
#' Not sure exactly how I am going to do this, as we need to include genotyping error on there as well.
#' Crap.  But at least I have a start on how to parse this monstrous file. This assumes that the focal pair
#' of individuals are labeled 1 and 2.
#' @param Input the path to the Mendel output file to read in.
#' @param NumA the number of alleles at each locus
#' @param verbose integer flag.  1 gives verbose output. 0 does not.
#' @examples
#' \dontrun{
#' read_mendel_outped("/Users/eriq/Desktop/mendel-example-Ped.out")
#' }
read_mendel_outped <- function(Input, NumA, verbose) {
    .Call('_CKMRsim_read_mendel_outped', PACKAGE = 'CKMRsim', Input, NumA, verbose)
}

#' Sample 1 observation from cell probabilities that are columns of a matrix
#'
#' Takes a matrix in which rows sum to one. For each row, performs a
#' single multinomial draw from amongst the columns, weighted by their values in that row.
#'
#' @param M a matrix whose rows are reals, each summing to one
#'
#' @return a vector of length \code{nrow(M)} of indices, with each element being
#' the column that was chosen in that row's sampling
samp_from_mat <- function(M) {
    .Call('_CKMRsim_samp_from_mat', PACKAGE = 'CKMRsim', M)
}
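
# Illustrative pure-R sketch of one categorical draw per row, using base R's
# sample.int(). It is not the compiled code. The name samp_from_mat_sketch is
# hypothetical, and returning base-1 column indices is an assumption; this file
# does not say which base the compiled function uses.
samp_from_mat_sketch <- function(M) {
  vapply(seq_len(nrow(M)), function(r) {
    # one draw among the columns, weighted by the probabilities in row r
    sample.int(ncol(M), size = 1L, prob = M[r, ])
  }, integer(1))
}
# Possible usage:
#   M <- matrix(c(0.1, 0.9, 0.5, 0.5), nrow = 2, byrow = TRUE)
#   samp_from_mat_sketch(M)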

#' return indices of the top n elements of a large vector
#' @keywords internal
top_index <- function(x, n) {
    .Call('_CKMRsim_top_index', PACKAGE = 'CKMRsim', x, n)
}
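
# Illustrative pure-R sketch, not the compiled code. It assumes "top" means the
# largest values and that indices are base-1. A full order() is simple but
# sorts the whole vector, which is presumably what a compiled version avoids
# on large inputs. The name top_index_sketch is hypothetical.
top_index_sketch <- function(x, n) {
  head(order(x, decreasing = TRUE), n)      # positions of the n largest elements
}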
eriqande/CKMRsim documentation built on Aug. 2, 2024, 7:23 a.m.